Welcome to Chapter 6 of our Java project series! In this chapter, we’re diving into the fascinating world of text processing by building a “Word Counter” application. This project will serve as an excellent exercise in mastering Java’s string manipulation capabilities and making effective use of its powerful Collections Framework, particularly Maps and Lists.

The ability to process and analyze text is a fundamental skill in many software development domains, from data science to natural language processing. By building a word counter, you’ll gain practical experience in tokenizing text, normalizing data, and efficiently storing and retrieving frequency counts. We’ll focus on creating clean, robust, and production-ready code that handles various input scenarios and adheres to modern Java best practices.

Before we begin, ensure you have a working Java 25 development environment and a Maven-based project setup from the previous chapters. We’ll be extending that project by adding new classes and tests. By the end of this chapter, you’ll have a fully functional word counter that can process user input and display word frequencies, laying a strong foundation for more complex text-based applications.

1. Planning & Design

Our Word Counter application will take a block of text as input, count the occurrences of each unique word, and then display the words along with their counts, sorted by frequency. To achieve this, we’ll break down the functionality into several logical components:

Component Architecture:

  • WordCounterApp: The main entry point of our application. It will orchestrate the interaction between the user and the core word counting logic.
  • InputReader: A utility class responsible for reading text input, initially from the console, with potential for file input later.
  • WordProcessor: This class will handle the crucial task of cleaning and tokenizing the input text. It will convert text to a consistent format (e.g., lowercase) and split it into individual words, filtering out punctuation and other non-word characters.
  • WordCounterService: The core business logic component. It will receive a list of processed words and compute their frequencies, storing them in a suitable data structure. It will also provide methods for sorting and retrieving the results.

Data Structures:

  • We’ll use a List<String> to temporarily hold the individual words after tokenization.
  • A Map<String, Integer> (specifically, HashMap) will be ideal for storing the word counts, where the String is the word and the Integer is its frequency. HashMap offers efficient average-case performance for insertions and lookups.
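
To make the plan concrete before any application code exists, here is a minimal, self-contained sketch of how the two structures cooperate (the class name and toy data are illustrative only, not part of the project):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataStructureSketch {
    public static void main(String[] args) {
        // Tokenized words arrive as a List<String>...
        List<String> tokens = List.of("the", "cat", "and", "the", "hat");

        // ...and a HashMap<String, Integer> accumulates their frequencies.
        Map<String, Integer> frequencies = new HashMap<>();
        for (String token : tokens) {
            // merge inserts 1 for a new word, or adds 1 to the existing count
            frequencies.merge(token, 1, Integer::sum);
        }

        System.out.println(frequencies.get("the")); // 2
        System.out.println(frequencies.get("cat")); // 1
    }
}
```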

File Structure:

We’ll maintain our standard Maven project structure, adding new classes under a dedicated package for the Word Counter.

├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── com
    │           └── mycompany
    │               └── app
    │                   ├── ... (previous project files)
    │                   └── wordcounter
    │                       ├── InputReader.java
    │                       ├── WordProcessor.java
    │                       ├── WordCounterService.java
    │                       └── WordCounterApp.java
    └── test
        └── java
            └── com
                └── mycompany
                    └── app
                        └── wordcounter
                            ├── WordProcessorTest.java
                            └── WordCounterServiceTest.java

2. Step-by-Step Implementation

Let’s begin building our Word Counter application piece by piece.

2.1 Setup/Configuration

First, let’s create the necessary package and ensure our pom.xml is ready. We’ll assume slf4j and logback for logging, and junit-jupiter for testing are already configured from previous chapters. If not, please refer to Chapter 1 for setting up these essential dependencies.

Create a new directory for our word counter classes: src/main/java/com/mycompany/app/wordcounter.

2.2 Core Implementation

We’ll implement each component incrementally.

2.2.1 InputReader - Reading User Input

This class will handle reading text from the console using Java’s BufferedReader and Scanner. Note one subtlety of console I/O: readers that wrap System.in should not be closed, because closing the wrapper also closes standard input for the rest of the JVM.

File: src/main/java/com/mycompany/app/wordcounter/InputReader.java

package com.mycompany.app.wordcounter;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;

/**
 * Utility class for reading input from various sources.
 * Currently supports console input.
 */
public class InputReader {

    private static final Logger logger = LoggerFactory.getLogger(InputReader.class);

    /**
     * Reads a line of text from the console.
     *
     * @param prompt The message to display to the user before reading input.
     * @return The text entered by the user, or an empty string if an error occurs.
     */
    public String readLine(String prompt) {
        System.out.println(prompt);
        // Deliberately NOT wrapped in try-with-resources: closing the reader
        // would also close System.in, and standard input cannot be reopened
        // for the lifetime of the JVM.
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        try {
            String line = reader.readLine();
            logger.debug("Read line from console: {}", line);
            return line != null ? line : "";
        } catch (IOException e) {
            logger.error("Error reading input from console: {}", e.getMessage(), e);
            System.err.println("An error occurred while reading input. Please try again.");
            return ""; // Return empty string on error to prevent null pointer issues downstream
        }
    }

    /**
     * Reads multiple lines of text from the console until an empty line is entered.
     *
     * @param prompt The message to display to the user.
     * @param terminationMessage The message to display for terminating input.
     * @return The concatenated text entered by the user.
     */
    public String readMultiLineInput(String prompt, String terminationMessage) {
        StringBuilder inputBuilder = new StringBuilder();
        System.out.println(prompt);
        System.out.println(terminationMessage);

        // As in readLine, the Scanner is intentionally left open so that
        // System.in remains usable afterwards.
        Scanner scanner = new Scanner(System.in);
        try {
            String line;
            while (scanner.hasNextLine() && !(line = scanner.nextLine()).isEmpty()) {
                inputBuilder.append(line).append(" "); // Append space to separate words from different lines
            }
            String fullText = inputBuilder.toString().trim();
            logger.debug("Read multi-line input: {}", fullText);
            return fullText;
        } catch (Exception e) { // Catching a general Exception for Scanner issues
            logger.error("Error reading multi-line input from console: {}", e.getMessage(), e);
            System.err.println("An error occurred while reading multi-line input. Please try again.");
            return "";
        }
    }
}

Explanation:

  • We use org.slf4j.Logger for robust logging, allowing us to trace input operations.
  • readLine uses BufferedReader for efficient line-by-line reading, especially for longer inputs.
  • readMultiLineInput uses Scanner, which is convenient for iterating over lines until an empty one is encountered.
  • Neither the BufferedReader nor the Scanner is closed. This is deliberate: closing a reader that wraps System.in also closes standard input itself, and any later console read in the same JVM would then fail. Console readers are the one standard exception to the “always close your resources” rule; try-with-resources remains the right tool for files, sockets, and other streams you own.
  • Error handling (IOException, Exception) is in place to catch potential issues during input operations, logging them and providing user-friendly feedback.
  • We return an empty string on error to ensure downstream components don’t receive null and crash.
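
Reading from the real console is hard to exercise in tests, but the Scanner loop inside readMultiLineInput can be driven from an in-memory stream by temporarily swapping System.in. A sketch of the technique (the class name and sample input are mine, not part of the project code):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ConsoleSimulation {
    public static void main(String[] args) {
        InputStream realStdin = System.in;
        try {
            // Simulate a user typing two lines, then the empty terminator line.
            String typed = "hello world\nhello again\n\n";
            System.setIn(new ByteArrayInputStream(typed.getBytes(StandardCharsets.UTF_8)));

            // The same loop shape used by readMultiLineInput:
            StringBuilder builder = new StringBuilder();
            Scanner scanner = new Scanner(System.in);
            String line;
            while (scanner.hasNextLine() && !(line = scanner.nextLine()).isEmpty()) {
                builder.append(line).append(' ');
            }
            System.out.println(builder.toString().trim()); // hello world hello again
        } finally {
            System.setIn(realStdin); // restore the real console stream
        }
    }
}
```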

2.2.2 WordProcessor - Cleaning and Tokenizing Text

This class will take raw text, clean it by converting to lowercase and removing punctuation, and then split it into individual words.

File: src/main/java/com/mycompany/app/wordcounter/WordProcessor.java

package com.mycompany.app.wordcounter;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Processes raw text to clean and tokenize it into a list of words.
 */
public class WordProcessor {

    private static final Logger logger = LoggerFactory.getLogger(WordProcessor.class);
    // Regex to match anything that is NOT a letter, number, or apostrophe.
    // Apostrophes are kept to handle contractions like "don't", "it's".
    private static final Pattern NON_WORD_CHARACTERS = Pattern.compile("[^\\p{L}\\p{N}']+");

    /**
     * Cleans the input text and tokenizes it into a list of words.
     * Steps include:
     * 1. Converting text to lowercase.
     * 2. Replacing non-word characters (except apostrophes) with spaces.
     * 3. Splitting the text by spaces.
     * 4. Filtering out any empty strings that may result from multiple spaces.
     *
     * @param text The raw input text.
     * @return A list of cleaned words. Returns an empty list if the input is null or empty.
     */
    public List<String> cleanAndTokenize(String text) {
        if (text == null || text.trim().isEmpty()) {
            logger.warn("Input text for cleaning and tokenization is null or empty.");
            return Collections.emptyList();
        }

        logger.debug("Starting text cleaning and tokenization for text (first 50 chars): '{}'", text.substring(0, Math.min(text.length(), 50)));

        String cleanedText = text.toLowerCase(); // Step 1: Convert to lowercase
        logger.trace("Text after toLowerCase: {}", cleanedText);

        // Step 2: Replace non-word characters with spaces. This helps handle punctuation like "word." becoming "word "
        cleanedText = NON_WORD_CHARACTERS.matcher(cleanedText).replaceAll(" ");
        logger.trace("Text after replacing non-word chars: {}", cleanedText);

        // Step 3 & 4: Split by spaces and filter out empty strings
        List<String> words = Arrays.stream(cleanedText.split("\\s+")) // Split by one or more whitespace characters
                                    .filter(word -> !word.isEmpty()) // Filter out empty strings
                                    .collect(Collectors.toList());

        logger.debug("Finished cleaning and tokenizing. Found {} words.", words.size());
        return words;
    }
}

Explanation:

  • NON_WORD_CHARACTERS is a Pattern compiled once for efficiency. [^\\p{L}\\p{N}']+ matches one or more characters that are NOT a letter (\p{L}), a number (\p{N}), or an apostrophe ('). This allows us to keep contractions like “don’t” as single words.
  • cleanAndTokenize performs the following steps:
    1. Lowercase Conversion: Ensures “The” and “the” are counted as the same word.
    2. Punctuation Removal: Uses the compiled regex to replace all non-word characters (except apostrophes) with a single space. This is crucial for separating words like “hello,world!” into “hello” and “world”.
    3. Splitting: Splits the cleaned text by one or more whitespace characters (\\s+) to get individual words.
    4. Filtering: Uses the Stream API to filter out any empty strings that might result from multiple spaces or leading/trailing spaces after replacement.
  • Robust null/empty input checking is performed.
  • Extensive logging (debug and trace levels) helps understand the transformation steps.
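
To trace the pipeline on one concrete input, here is a standalone sketch that applies the same regex and stream steps outside the class (the TokenizeTrace name and sample sentence are hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TokenizeTrace {
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}']+");

    public static void main(String[] args) {
        String text = "Hello, World! Don't panic.";

        String lower = text.toLowerCase();                        // "hello, world! don't panic."
        String cleaned = NON_WORD.matcher(lower).replaceAll(" "); // "hello world don't panic "
        List<String> words = Arrays.stream(cleaned.split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());

        System.out.println(words); // [hello, world, don't, panic]
    }
}
```

Note how the apostrophe in “Don't” survives the cleaning pass while the comma, exclamation mark, and period are all collapsed into spaces.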

2.2.3 WordCounterService - Counting and Sorting Words

This class contains the core logic for counting word frequencies and providing sorted results.

File: src/main/java/com/mycompany/app/wordcounter/WordCounterService.java

package com.mycompany.app.wordcounter;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Service for counting word frequencies and providing sorted results.
 */
public class WordCounterService {

    private static final Logger logger = LoggerFactory.getLogger(WordCounterService.class);

    /**
     * Counts the occurrences of each word in the provided list.
     *
     * @param words A list of cleaned and tokenized words.
     * @return A map where keys are words and values are their counts. Returns an empty map if input is null or empty.
     */
    public Map<String, Integer> countWords(List<String> words) {
        if (words == null || words.isEmpty()) {
            logger.warn("Input word list for counting is null or empty.");
            return Collections.emptyMap();
        }

        logger.debug("Starting word counting for {} words.", words.size());
        Map<String, Integer> wordCounts = new HashMap<>();

        for (String word : words) {
            // Using Map.merge for concise and efficient incrementing
            wordCounts.merge(word, 1, Integer::sum);
            logger.trace("Counted '{}', current count: {}", word, wordCounts.get(word));
        }

        logger.debug("Finished word counting. Found {} unique words.", wordCounts.size());
        return wordCounts;
    }

    /**
     * Sorts the word counts map first by count in descending order,
     * then by word alphabetically in ascending order for ties.
     *
     * @param wordCounts The map of words and their counts.
     * @return A LinkedHashMap containing the sorted word counts. Returns an empty map if input is null or empty.
     */
    public Map<String, Integer> getSortedWordCounts(Map<String, Integer> wordCounts) {
        if (wordCounts == null || wordCounts.isEmpty()) {
            logger.warn("Input word counts map for sorting is null or empty.");
            return Collections.emptyMap();
        }

        logger.debug("Sorting word counts. Total unique words: {}", wordCounts.size());

        // Stream the entry set, sort it, and collect into a LinkedHashMap to preserve order
        Map<String, Integer> sortedMap = wordCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()) // Sort by count (descending)
                        .thenComparing(Map.Entry.comparingByKey())) // Then by word (alphabetical, ascending)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (e1, e2) -> e1, // Merge function for duplicate keys (should not happen with Map.Entry)
                        LinkedHashMap::new // Ensure the map preserves insertion order
                ));

        logger.debug("Finished sorting word counts.");
        return sortedMap;
    }
}

Explanation:

  • countWords:
    • Uses a HashMap for efficient storage and retrieval of word counts.
    • The Map.merge(key, value, remappingFunction) method is a concise and efficient way to increment counts. If the word is already in the map, Integer::sum is used to add 1 to its current count; otherwise, the word is added with a count of 1.
    • Handles null/empty input gracefully.
  • getSortedWordCounts:
    • Leverages the Java Stream API for sorting.
    • Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()) sorts entries primarily by their value (count) in descending order.
    • .thenComparing(Map.Entry.comparingByKey()) acts as a secondary sort key, sorting by the map key (word) alphabetically in ascending order for words with the same count.
    • Collectors.toMap is used to collect the sorted stream back into a Map. We specify LinkedHashMap::new to ensure that the insertion order (which is now our sorted order) is preserved.
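
The sort-then-collect idiom can be tried in isolation with a hand-made input map (hypothetical data):

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SortIdiomDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("pear", 2, "apple", 3, "plum", 2, "fig", 1);

        Map<String, Integer> sorted = counts.entrySet().stream()
                // Count descending, then word ascending for ties:
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                // LinkedHashMap::new preserves the sorted encounter order:
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));

        System.out.println(sorted); // {apple=3, pear=2, plum=2, fig=1}
    }
}
```

“pear” and “plum” tie at 2, so the secondary alphabetical comparator decides their relative order.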

2.2.4 WordCounterApp - Main Application Entry Point

This class will tie all the components together, providing the user interface for our Word Counter.

File: src/main/java/com/mycompany/app/wordcounter/WordCounterApp.java

package com.mycompany.app.wordcounter;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;
import java.util.Map;

/**
 * Main application class for the Word Counter.
 * Orchestrates input reading, word processing, and word counting.
 */
public class WordCounterApp {

    private static final Logger logger = LoggerFactory.getLogger(WordCounterApp.class);

    private final InputReader inputReader;
    private final WordProcessor wordProcessor;
    private final WordCounterService wordCounterService;

    /**
     * Constructor for WordCounterApp.
     * Initializes the dependent services.
     */
    public WordCounterApp() {
        this.inputReader = new InputReader();
        this.wordProcessor = new WordProcessor();
        this.wordCounterService = new WordCounterService();
    }

    /**
     * Runs the word counter application.
     */
    public void run() {
        logger.info("Starting Word Counter Application...");
        System.out.println("--- Word Counter ---");

        String text = inputReader.readMultiLineInput(
                "Please enter the text you want to count words for. Press Enter on an empty line to finish.",
                "Example: Hello world. This is a test. Hello again!"
        );

        if (text.isEmpty()) {
            System.out.println("No text entered. Exiting application.");
            logger.warn("Application exited due to empty input text.");
            return;
        }

        logger.debug("Processing input text: {}", text);
        List<String> words = wordProcessor.cleanAndTokenize(text);

        if (words.isEmpty()) {
            System.out.println("No valid words found after processing. Exiting application.");
            logger.warn("Application exited due to no valid words found after processing.");
            return;
        }

        Map<String, Integer> wordCounts = wordCounterService.countWords(words);
        Map<String, Integer> sortedWordCounts = wordCounterService.getSortedWordCounts(wordCounts);

        displayResults(sortedWordCounts);
        logger.info("Word Counter Application finished successfully.");
    }

    /**
     * Displays the word count results to the console.
     *
     * @param sortedWordCounts A map of words and their counts, sorted by frequency.
     */
    private void displayResults(Map<String, Integer> sortedWordCounts) {
        System.out.println("\n--- Word Count Results ---");
        if (sortedWordCounts.isEmpty()) {
            System.out.println("No words to display.");
            return;
        }
        sortedWordCounts.forEach((word, count) -> System.out.printf("  %-20s : %d%n", word, count));
        System.out.println("--------------------------");
    }

    /**
     * Main method to start the Word Counter application.
     *
     * @param args Command line arguments (not used).
     */
    public static void main(String[] args) {
        try {
            new WordCounterApp().run();
        } catch (Exception e) {
            logger.error("An unexpected error occurred in the Word Counter application: {}", e.getMessage(), e);
            System.err.println("An unexpected error occurred. Please check the logs for details.");
            System.exit(1); // Exit with a non-zero status code to indicate an error
        }
    }
}

Explanation:

  • The WordCounterApp constructor initializes instances of InputReader, WordProcessor, and WordCounterService. This demonstrates a simple form of dependency injection (manual) where the App class depends on these services.
  • The run() method orchestrates the entire flow:
    1. Prompts the user for input using InputReader.
    2. If input is empty, logs a warning and exits.
    3. Cleans and tokenizes the input using WordProcessor.
    4. If no valid words are found, logs a warning and exits.
    5. Counts the words using WordCounterService.
    6. Sorts the counts.
    7. Displays the results using displayResults.
  • displayResults formats the output nicely using printf.
  • The main method creates an instance of WordCounterApp and calls run(). It includes a top-level try-catch block to gracefully handle any uncaught exceptions, logging them and exiting with an error code, which is good practice for production applications.
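
A small aside on the printf format used by displayResults: %-20s left-justifies the word in a 20-character column so the counts line up, and %n emits a platform-appropriate newline. A quick standalone check:

```java
public class PrintfDemo {
    public static void main(String[] args) {
        // %-20s pads each word to 20 characters, aligning the colons and counts.
        System.out.printf("  %-20s : %d%n", "hello", 2);
        System.out.printf("  %-20s : %d%n", "antidisestablishment", 1);
    }
}
```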

2.3 Testing This Component

Writing unit tests for our core logic is crucial to ensure correctness and maintainability. We’ll focus on WordProcessor and WordCounterService as they contain the bulk of the logic.

2.3.1 Testing WordProcessor

File: src/test/java/com/mycompany/app/wordcounter/WordProcessorTest.java

package com.mycompany.app.wordcounter;

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import static org.junit.jupiter.api.Assertions.*;

class WordProcessorTest {

    private final WordProcessor wordProcessor = new WordProcessor();

    @Test
    @DisplayName("Should correctly clean and tokenize a simple sentence")
    void shouldCleanAndTokenizeSimpleSentence() {
        String text = "Hello world. This is a test!";
        List<String> expected = Arrays.asList("hello", "world", "this", "is", "a", "test");
        List<String> actual = wordProcessor.cleanAndTokenize(text);
        assertEquals(expected, actual);
    }

    @Test
    @DisplayName("Should handle punctuation and special characters correctly")
    void shouldHandlePunctuationAndSpecialCharacters() {
        String text = "Java is great! How about \"Spring Boot\" and micro-services? (Amazing!)";
        // The hyphen is not in the kept character class, so "micro-services" splits into two tokens.
        List<String> expected = Arrays.asList("java", "is", "great", "how", "about", "spring", "boot", "and", "micro", "services", "amazing");
        List<String> actual = wordProcessor.cleanAndTokenize(text);
        assertEquals(expected, actual);
    }

    @Test
    @DisplayName("Should handle numbers and contractions")
    void shouldHandleNumbersAndContractions() {
        String text = "I can't believe it's 2025! Don't you agree?";
        List<String> expected = Arrays.asList("i", "can't", "believe", "it's", "2025", "don't", "you", "agree");
        List<String> actual = wordProcessor.cleanAndTokenize(text);
        assertEquals(expected, actual);
    }

    @Test
    @DisplayName("Should return empty list for null input")
    void shouldReturnEmptyListForNullInput() {
        List<String> actual = wordProcessor.cleanAndTokenize(null);
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should return empty list for empty string input")
    void shouldReturnEmptyListForEmptyStringInput() {
        List<String> actual = wordProcessor.cleanAndTokenize("");
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should return empty list for whitespace-only input")
    void shouldReturnEmptyListForWhitespaceOnlyInput() {
        List<String> actual = wordProcessor.cleanAndTokenize("   \t \n ");
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should handle mixed case input and convert to lowercase")
    void shouldHandleMixedCaseInput() {
        String text = "JAVA is Awesome";
        List<String> expected = Arrays.asList("java", "is", "awesome");
        List<String> actual = wordProcessor.cleanAndTokenize(text);
        assertEquals(expected, actual);
    }

    @Test
    @DisplayName("Should handle multiple spaces between words")
    void shouldHandleMultipleSpaces() {
        String text = "  word1   word2  word3 ";
        List<String> expected = Arrays.asList("word1", "word2", "word3");
        List<String> actual = wordProcessor.cleanAndTokenize(text);
        assertEquals(expected, actual);
    }
}

Explanation:

  • We use JUnit 5 annotations like @Test and @DisplayName for clear test descriptions.
  • Assertions like assertEquals and assertTrue are used to verify the output of cleanAndTokenize against expected lists of words.
  • Test cases cover various scenarios: simple sentences, punctuation, numbers, contractions, null/empty/whitespace-only input, and mixed-case input.

2.3.2 Testing WordCounterService

File: src/test/java/com/mycompany/app/wordcounter/WordCounterServiceTest.java

package com.mycompany.app.wordcounter;

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import static org.junit.jupiter.api.Assertions.*;

class WordCounterServiceTest {

    private final WordCounterService wordCounterService = new WordCounterService();

    @Test
    @DisplayName("Should correctly count words in a simple list")
    void shouldCountWordsCorrectly() {
        List<String> words = Arrays.asList("hello", "world", "hello", "java", "world", "java", "java");
        Map<String, Integer> expected = Map.of("hello", 2, "world", 2, "java", 3);
        Map<String, Integer> actual = wordCounterService.countWords(words);
        assertEquals(expected, actual);
    }

    @Test
    @DisplayName("Should return empty map for null word list")
    void shouldReturnEmptyMapForNullWordList() {
        Map<String, Integer> actual = wordCounterService.countWords(null);
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should return empty map for empty word list")
    void shouldReturnEmptyMapForEmptyWordList() {
        Map<String, Integer> actual = wordCounterService.countWords(Collections.emptyList());
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should sort words by count descending, then by word alphabetically ascending")
    void shouldSortWordsCorrectly() {
        Map<String, Integer> unsortedCounts = Map.of(
                "apple", 3,
                "banana", 2,
                "cherry", 3,
                "date", 1,
                "elderberry", 2
        );

        Map<String, Integer> expectedSorted = new LinkedHashMap<>();
        expectedSorted.put("apple", 3);
        expectedSorted.put("cherry", 3);
        expectedSorted.put("banana", 2);
        expectedSorted.put("elderberry", 2);
        expectedSorted.put("date", 1);

        Map<String, Integer> actualSorted = wordCounterService.getSortedWordCounts(unsortedCounts);
        // Map.equals ignores iteration order, so pin down the order via the key lists as well.
        assertEquals(List.copyOf(expectedSorted.keySet()), List.copyOf(actualSorted.keySet()));
        assertEquals(expectedSorted, actualSorted);
    }

    @Test
    @DisplayName("Should return empty map when sorting null word counts")
    void shouldReturnEmptyMapWhenSortingNullCounts() {
        Map<String, Integer> actual = wordCounterService.getSortedWordCounts(null);
        assertTrue(actual.isEmpty());
    }

    @Test
    @DisplayName("Should return empty map when sorting empty word counts")
    void shouldReturnEmptyMapWhenSortingEmptyCounts() {
        Map<String, Integer> actual = wordCounterService.getSortedWordCounts(Collections.emptyMap());
        assertTrue(actual.isEmpty());
    }
}

Explanation:

  • Similar to WordProcessorTest, we use JUnit 5 for testing.
  • shouldCountWordsCorrectly verifies that the countWords method correctly aggregates word frequencies.
  • shouldSortWordsCorrectly is a crucial test, verifying that getSortedWordCounts applies the correct sorting logic: primary sort by count (descending) and secondary sort by word (alphabetical ascending). One caveat: Map.equals compares entries without regard to iteration order, so asserting map equality alone would pass even if the ordering were wrong; to pin down the LinkedHashMap’s iteration order, compare the key sets as ordered lists as well.
  • Edge cases like null and empty inputs are also tested for both methods.

Running the Tests:

To run these tests, navigate to your project’s root directory in the terminal and execute:

mvn test

You should see output indicating that all tests passed. This confirms that our core logic for processing and counting words is functioning as expected.

3. Production Considerations

Building production-ready applications requires more than just functional code. We need to consider how the application behaves under various conditions, how to optimize its performance, ensure its security, and how to monitor it in a live environment.

3.1 Error Handling

  • Graceful Degradation: Our InputReader and WordProcessor are designed to return empty lists/strings on error or invalid input, preventing NullPointerExceptions further down the line. The WordCounterApp then checks for these empty results and provides user-friendly messages.
  • Centralized Error Handling: The main method in WordCounterApp includes a top-level try-catch block for Exception. This acts as a catch-all for any unexpected runtime errors, ensuring the application doesn’t crash abruptly but logs the error and exits gracefully with a non-zero status code.
  • Specific Exceptions: For file-based input (a future enhancement), IOException should be handled specifically, providing more context about the file operation failure.

3.2 Performance Optimization

  • HashMap for Counting: We chose HashMap for wordCounts because it offers average O(1) time complexity for insertion and lookup operations, which is highly efficient for large datasets of words.
  • Pattern Compilation: The NON_WORD_CHARACTERS regex Pattern in WordProcessor is compiled once as a static final field. This avoids recompiling the regex on every call to cleanAndTokenize, significantly improving performance for repeated text processing.
  • Stream API Efficiency: While convenient, overusing Stream operations can sometimes have a slight overhead. For our current scale, it’s perfectly fine and offers better readability. For extremely large texts (gigabytes), one might consider processing in chunks or using more direct loop-based approaches if profiling indicates a bottleneck.
  • StringBuilder for Input: In InputReader, StringBuilder is used for readMultiLineInput. This is more efficient than repeatedly concatenating String objects with + operator, which creates many intermediate string objects.
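
The StringBuilder point is easy to demonstrate: both loops below build the same string, but += copies the entire accumulated string on every iteration (quadratic work overall), while append writes into a single growable buffer:

```java
import java.util.List;

public class ConcatVsBuilder {
    public static void main(String[] args) {
        List<String> lines = List.of("alpha", "beta", "gamma");

        // O(n^2) overall: each += allocates a new String and copies everything so far.
        String slow = "";
        for (String line : lines) {
            slow += line + " ";
        }

        // Amortized O(n): appends go into one growable buffer.
        StringBuilder builder = new StringBuilder();
        for (String line : lines) {
            builder.append(line).append(' ');
        }
        String fast = builder.toString();

        System.out.println(slow.trim().equals(fast.trim())); // true
    }
}
```

For three short lines the difference is invisible; for thousands of lines of pasted text it is not.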

3.3 Security Considerations

For a simple console application, security concerns are minimal, but it’s good to establish habits:

  • Input Validation: While we clean text, we don’t necessarily “validate” it against malicious content. For web applications, input validation is critical to prevent injection attacks (SQL, XSS). For text processing, sanitization (like our cleaning step) is usually sufficient.
  • Resource Management: Streams and readers opened on files, sockets, or other external resources should always be closed via try-with-resources to prevent resource leaks, which could be exploited or lead to stability issues. Console readers wrapping System.in are the deliberate exception, since closing them closes standard input itself.
  • No Sensitive Data: This application does not handle sensitive user data, so data security is not a primary concern here.

3.4 Logging and Monitoring

  • Structured Logging: We’ve consistently used slf4j with logback throughout the application. This allows us to log messages at different levels (TRACE, DEBUG, INFO, WARN, ERROR).
  • Contextual Information: Log messages include contextual information (e.g., word count, input text snippets, error messages) to aid debugging and monitoring.
  • Configuration: logback.xml (from Chapter 1) allows us to configure where logs go (console, file), their format, and their verbosity. In production, logs would typically be sent to a centralized logging system (e.g., ELK stack, Splunk, cloud logging services) for aggregation, analysis, and alerting.
  • Monitoring: For a production system, you’d integrate monitoring tools to track application health, resource usage (CPU, memory), and custom metrics (e.g., number of words processed per minute, average processing time).

4. Code Review Checkpoint

At this point, we have a fully functional and tested Word Counter application.

Summary of what was built:

  • InputReader.java: Handles reading single and multi-line text input from the console.
  • WordProcessor.java: Cleans text (lowercase, punctuation removal, handles contractions) and tokenizes it into a list of words.
  • WordCounterService.java: Counts word frequencies using a HashMap and provides a method to get results sorted by count and then alphabetically.
  • WordCounterApp.java: The main application class that orchestrates the flow, interacts with the user, and displays results.
  • Unit Tests: Comprehensive JUnit 5 tests for WordProcessor and WordCounterService covering various scenarios and edge cases.

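The cleaning-and-tokenizing behavior summarized above can be condensed into a short sketch. The class and method names here are illustrative, not the chapter's actual `WordProcessor` API; the pattern assumes the apostrophe-preserving regex discussed later in this chapter.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CleaningDemo {
    // Keep letters, digits, and apostrophes; every other run of characters
    // is replaced by a single space before splitting.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}']+");

    static List<String> tokenize(String text) {
        String cleaned = NON_WORD.matcher(text.toLowerCase()).replaceAll(" ").trim();
        return cleaned.isEmpty() ? List.of() : List.of(cleaned.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! Isn't this a test?"));
        // prints: [hello, world, isn't, this, a, test]
    }
}
```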
Files Created/Modified:

  • src/main/java/com/mycompany/app/wordcounter/InputReader.java
  • src/main/java/com/mycompany/app/wordcounter/WordProcessor.java
  • src/main/java/com/mycompany/app/wordcounter/WordCounterService.java
  • src/main/java/com/mycompany/app/wordcounter/WordCounterApp.java
  • src/test/java/com/mycompany/app/wordcounter/WordProcessorTest.java
  • src/test/java/com/mycompany/app/wordcounter/WordCounterServiceTest.java

Integration with Existing Code: This chapter’s code is self-contained within the com.mycompany.app.wordcounter package and does not directly modify or depend on the previous projects. It demonstrates how to add a new, independent feature to our existing Maven project structure. The WordCounterApp can be run as a standalone Java application.

5. Common Issues & Solutions

Developers often encounter specific issues when dealing with string manipulation and collections. Here are a few common ones for a word counter and how to address them:

  1. Issue: Incorrect Word Splitting / Punctuation Handling.

    • Problem: Words like “hello,world” are counted as “hello,world” instead of “hello” and “world”. Or “don’t” is split into “don” and “t”.
    • Debugging: Print the cleanedText string in WordProcessor before splitting to see how punctuation is being handled. Inspect the words list after splitting and filtering.
    • Solution: Carefully craft your regex for replacing non-word characters. Our NON_WORD_CHARACTERS pattern [^\\p{L}\\p{N}']+ is designed to keep apostrophes for contractions while removing other non-alphanumeric characters. Replacing each match with a single space (rather than with the empty string) ensures punctuation becomes whitespace, so that split("\\s+") correctly separates the words.
    • Prevention: Always test with diverse text inputs, including those with various punctuation, numbers, and contractions.
  2. Issue: Case Sensitivity.

    • Problem: “The” and “the” are counted as two different words.
    • Debugging: Check the words list generated by WordProcessor. If you see both “The” and “the”, then case conversion isn’t happening.
    • Solution: Ensure you convert the entire input text to a consistent case (e.g., lowercase using text.toLowerCase()) before tokenization. Our WordProcessor does this as the first step.
    • Prevention: Make case normalization a mandatory first step in your text processing pipeline.
  3. Issue: Performance/Memory for Very Large Inputs.

    • Problem: The application slows down significantly or runs out of memory (OutOfMemoryError) when processing extremely large text files (e.g., a full book or a large log file).
    • Debugging: Monitor JVM memory usage (e.g., using jvisualvm or jconsole). Profile the application to identify bottlenecks in processing.
    • Solution:
      • Process in Chunks: Instead of reading the entire file into a single String, read and process it line by line or in fixed-size character buffers. This prevents loading the entire content into memory at once. For InputReader, this would involve methods to read from a File using BufferedReader iteratively.
      • Efficient Collections: HashMap is generally efficient, but for truly massive distinct word counts, memory consumption can still be an issue. Consider specialized data structures for very high-performance scenarios (e.g., Tries, Bloom filters for approximate counts), though typically not needed for a basic word counter.
      • Garbage Collection Tuning: For applications with very large heaps, adjusting JVM memory settings (e.g., -Xmx for the maximum heap size) or choosing a different garbage-collection algorithm might be necessary.
    • Prevention: Design for scalability from the start by considering iterative processing for I/O-bound tasks.
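
The chunked-processing idea from issue 3 can be sketched as a line-by-line counter that never holds the full input in memory; only the counts map grows with the input. Class and method names here are illustrative, and the split pattern mirrors the apostrophe-keeping regex from issue 1.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class StreamingCountDemo {
    // Processes one line at a time: lowercase, split on non-word runs
    // (keeping apostrophes), then increment counts incrementally.
    static Map<String, Long> countWords(BufferedReader reader) {
        Map<String, Long> counts = new HashMap<>();
        reader.lines().forEach(line -> {
            for (String word : line.toLowerCase().split("[^\\p{L}\\p{N}']+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // A StringReader stands in here for a real file source.
        try (BufferedReader r = new BufferedReader(new StringReader("Big big file.\nBig data!"))) {
            System.out.println(countWords(r));
        }
    }
}
```

For real files, swap the StringReader for Files.newBufferedReader(path); because only one line is buffered at a time, the memory profile stays flat regardless of file size.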

6. Testing & Verification

Now that we’ve implemented all the components and written unit tests, let’s verify the entire application.

  1. Compile the Project: Navigate to your project’s root directory in the terminal and compile the code:

    mvn clean install
    

    This command will also run all your unit tests. Ensure they all pass.

  2. Run the Word Counter Application: You can run the application directly from the command line using Maven’s exec plugin. Since WordCounterApp has a main method, we need to specify its fully qualified name.

    mvn exec:java -Dexec.mainClass="com.mycompany.app.wordcounter.WordCounterApp"
    
  3. Verify Functionality with Example Input:

    When prompted, enter the following text (or your own) and press Enter on an empty line to finish:

    Hello, world! This is a test.
    Hello again, world. This is a great test, isn't it?
    

    Expected Output:

    --- Word Counter ---
    Please enter the text you want to count words for. Press Enter on an empty line to finish.
    Example: Hello world. This is a test. Hello again!
    Hello, world! This is a test.
    Hello again, world. This is a great test, isn't it?
    
    --- Word Count Results ---
      a                    : 2
      hello                : 2
      is                   : 2
      test                 : 2
      this                 : 2
      world                : 2
      again                : 1
      great                : 1
      isn't                : 1
      it                   : 1
    --------------------------
    Word Counter Application finished successfully.
    
    • Check Word Counts: Verify that “a”, “hello”, “is”, “test”, “this”, and “world” all have a count of 2. The remaining words should each have a count of 1.
    • Check Case Insensitivity: “Hello” and “hello” should be counted as the same word.
    • Check Punctuation Removal: Punctuation like commas, periods, and exclamation marks should be correctly removed.
    • Check Contractions: “isn’t” should be treated as a single word, not split at the apostrophe.
    • Check Sorting: The words should be sorted primarily by count in descending order, and secondarily alphabetically for words with the same count.

    If your output matches the expected results, congratulations! Your Word Counter application is working correctly and robustly handling various text inputs.
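
The two-level ordering verified above (count descending, ties broken alphabetically) can be expressed as a single stream pipeline. This is a sketch with illustrative names, assuming the counts are held in a Map<String, Long>; it is not the chapter's exact WordCounterService code.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SortDemo {
    // Orders entries by count (descending), breaking ties alphabetically, and
    // collects into a LinkedHashMap so iteration preserves that order.
    static Map<String, Long> sortedByCount(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedByCount(Map.of("world", 2L, "hello", 2L, "again", 1L)));
        // prints: {hello=2, world=2, again=1}
    }
}
```

The LinkedHashMap collector matters here: a plain HashMap would discard the carefully computed ordering the moment the entries are re-inserted.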

7. Summary & Next Steps

In this comprehensive Chapter 6, we successfully built a production-ready Word Counter application using Java 25. We explored:

  • String Manipulation: Techniques for cleaning text, converting case, and removing punctuation using regular expressions.
  • Java Collections Framework: Effective use of List for tokenized words and Map (HashMap and LinkedHashMap) for efficient word counting and preserving sorted order.
  • Stream API: Leveraging Java 8+ Streams for concise and readable data processing, including filtering and custom sorting.
  • Modular Design: Breaking down the application into InputReader, WordProcessor, WordCounterService, and WordCounterApp for better maintainability and testability.
  • Production Best Practices: Implementing robust error handling, considering performance optimizations, and integrating comprehensive logging.
  • Unit Testing: Writing thorough JUnit 5 tests to ensure the correctness of our core logic.

This project has significantly enhanced your skills in text processing, a vital area in modern software development. You’ve learned how to transform raw data into structured information and perform analytical tasks.

In Chapter 7: Tic-Tac-Toe Game: Object-Oriented Design & Game Logic, we will shift gears to building an interactive console-based game. This will challenge your object-oriented design skills, introduce concepts like game state management, user input validation for game moves, and implementing winning conditions. Get ready to design classes for Board, Player, and the Game itself!