Walkthrough: Level 0 Mini Toolkit
This guide walks through the thinking process for building this project. It does NOT give you the complete solution. For that, see SOLUTION.md.
Before reading this
Try the project yourself first. Spend at least 20 minutes. If you have not tried yet, close this file and open the project README.
Understanding the problem
You need to build a multi-tool command-line script that combines three text utilities into one program:
- Word Counter -- count words, lines, and characters
- Duplicate Finder -- find lines that appear more than once
- String Cleaner -- strip, lowercase, and remove non-alphanumeric characters
The user picks a tool with the --tool flag (or runs all three with --tool all). Results are printed and saved to a JSON file.
This is the Level 0 capstone -- it shows how small, focused functions compose into a larger program.
Planning before code
flowchart TD
A[Parse CLI args: --input, --tool, --output] --> B{Which tool?}
B -->|wordcount| C[count_words]
B -->|duplicates| D[find_duplicates]
B -->|clean| E[clean_string]
B -->|all| F[run_all_tools]
F --> C
F --> D
F --> E
C --> G[Print results + save JSON]
D --> G
E --> G
Four pieces to build:
- Three tool functions -- each takes text, returns results
- A dispatcher -- routes the --tool argument to the right function
- An "all" mode -- runs every tool and collects results
- CLI + file I/O -- parse arguments, read input file, write output JSON
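The fourth piece, CLI plus file I/O, might look roughly like the sketch below. It uses only argparse, json, and pathlib from the standard library. The helper names build_parser and save_results, the default of "all" for --tool, and the default output path output.json are illustrative assumptions, not taken from the project.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags named in the plan."""
    parser = argparse.ArgumentParser(description="Level 0 mini toolkit (sketch)")
    parser.add_argument("--input", required=True, help="text file to read")
    # Assumed default: run every tool when --tool is omitted.
    parser.add_argument("--tool", default="all", help="wordcount, duplicates, clean, or all")
    # Assumed default output path; the real project may use a different one.
    parser.add_argument("--output", default="output.json", help="where to save the JSON results")
    return parser


def save_results(results: dict, output_path: str) -> None:
    """Write results as indented JSON so the file stays readable."""
    Path(output_path).write_text(json.dumps(results, indent=2), encoding="utf-8")


# Passing an explicit list lets you try the parser without a real command line.
args = build_parser().parse_args(["--input", "notes.txt", "--tool", "wordcount"])
print(args.input, args.tool, args.output)  # notes.txt wordcount output.json
```

The sketch deliberately does not restrict --tool with argparse's choices option, so an unrecognised name still reaches the dispatcher and can be handled there.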
Step 1: Word Counter
The simplest tool: split the text into words and into lines, then count the words, the lines, and the characters.
def count_words(text: str) -> dict:
    words = text.split()        # split on any run of whitespace
    lines = text.splitlines()   # split on line boundaries
    return {
        "words": len(words),
        "lines": len(lines),
        "characters": len(text),  # includes spaces and newlines
    }
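.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and ignores leading/trailing whitespace, so it never produces empty strings. .splitlines() splits on line boundaries (\n, \r\n, and a few others) and does not add an empty string for a trailing newline.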
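To see the difference between these calls, a small experiment along these lines may help. The sample string is made up for illustration and is deliberately not the one from the prediction exercise.

```python
# Made-up sample: leading spaces, repeated spaces, a tab, and a trailing newline.
sample = "  one   two\tthree\nfour\n"

words = sample.split()            # splits on any run of whitespace
naive_words = sample.split(" ")   # splits on every single space
lines = sample.splitlines()       # splits on line boundaries

print(words)        # ['one', 'two', 'three', 'four']
print(naive_words)  # ['', '', 'one', '', '', 'two\tthree\nfour\n']
print(lines)        # ['  one   two\tthree', 'four']

# Windows-style line endings are handled as well.
print("a\r\nb".splitlines())  # ['a', 'b']
```

Note how split(" ") reports six pieces for a string that contains four words, while split() reports exactly four.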
Predict before you scroll
If the text is "hello world\ngoodbye world", how many words, lines, and characters does this return?
Step 2: Duplicate Finder
This tool finds lines that appear more than once. The strategy: use a dictionary to count how many times each line appears.
def find_duplicates(lines: list[str]) -> list[dict]:
    counts = {}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped in counts:
            counts[stripped] += 1
        else:
            counts[stripped] = 1
    # keep only the lines seen more than once
    return [
        {"text": text, "count": count}
        for text, count in counts.items()
        if count > 1
    ]
The pattern here is count-then-filter:
1. Loop through all lines and count occurrences in a dictionary
2. Build a list of only the entries where count > 1
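The same count-then-filter idea can also be expressed with collections.Counter from the standard library. This is an alternative sketch, not the walkthrough's own approach; the name find_duplicates_counter and the sample lines are hypothetical.

```python
from collections import Counter


def find_duplicates_counter(lines: list[str]) -> list[dict]:
    """Count stripped, non-blank lines, then keep those seen more than once."""
    # Step 1: count occurrences (Counter is a dict subclass built for counting).
    counts = Counter(line.strip() for line in lines if line.strip())
    # Step 2: filter down to the entries where count > 1.
    return [
        {"text": text, "count": count}
        for text, count in counts.items()
        if count > 1
    ]


# Made-up sample: surrounding whitespace and blank lines are ignored.
sample_lines = ["apple", "banana", "  apple  ", "", "cherry", "banana"]
duplicates = find_duplicates_counter(sample_lines)
print(duplicates)
# [{'text': 'apple', 'count': 2}, {'text': 'banana', 'count': 2}]
```

Writing the dictionary loop by hand first is still worthwhile, since Counter hides exactly the step this project is meant to practise.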
Predict before you scroll
If a file has the same line three times, what will the count value be for that line in the result?
Step 3: String Cleaner
This tool normalises text by stripping whitespace, lowercasing, and removing non-alphanumeric characters.
def clean_string(text: str) -> str:
    result = text.strip().lower()
    cleaned = []
    for char in result:
        # keep letters, digits, and spaces; drop everything else
        if char.isalnum() or char == " ":
            cleaned.append(char)
    return "".join(cleaned)
The approach is a character filter: loop through every character, keep it only if it is a letter, a digit, or a space. Then join the kept characters back into a string.
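The character filter described above can be tried on a few inputs with a compact variant like the one below. The name clean_string_compact and the sample strings are hypothetical; the filter condition is meant to match the loop version.

```python
def clean_string_compact(text: str) -> str:
    """Same filter as the loop version, written as a generator expression."""
    return "".join(
        ch for ch in text.strip().lower()
        if ch.isalnum() or ch == " "
    )


print(clean_string_compact("  Hello, World! 123  "))  # hello world 123

# Tabs and dashes are neither alphanumeric nor a space, so they disappear.
print(clean_string_compact("tabs\tand-dashes"))  # tabsanddashes

# str.isalnum() is Unicode-aware, so accented letters are kept.
accented = clean_string_compact("Caf\u00e9 #1")
print(accented == "caf\u00e9 1")  # True
```

One thing to notice: spaces inside the text are kept as they are, so removing punctuation between two spaces leaves a double space behind.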
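Step 4: The dispatcher
The dispatcher routes a tool name to the function that implements it, so the rest of the program has a single entry point and does not need to know which tool runs. The same pattern shows up whenever a program chooses behaviour from a name: CLI subcommands, menu options, message handlers.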
def run_tool(tool_name: str, text: str) -> dict:
    if tool_name == "wordcount":
        return {"tool": "wordcount", "result": count_words(text)}
    elif tool_name == "duplicates":
        lines = text.splitlines()
        return {"tool": "duplicates", "result": find_duplicates(lines)}
    elif tool_name == "clean":
        lines = text.splitlines()
        # clean each non-blank line separately
        cleaned = [clean_string(line) for line in lines if line.strip()]
        return {"tool": "clean", "result": cleaned}
    else:
        return {"tool": tool_name, "error": f"Unknown tool: {tool_name}"}
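A common alternative to an if/elif chain is a lookup table that maps each name to a function. The sketch below shows that idea under some assumptions: the two tools are tiny stand-ins so the block runs on its own, and the names TOOLS, count_lines, and run_tool_table are hypothetical.

```python
from typing import Callable


# Stand-in tools so this sketch is self-contained; the real ones are built in Steps 1-3.
def count_words(text: str) -> dict:
    return {"words": len(text.split())}


def count_lines(text: str) -> dict:
    return {"lines": len(text.splitlines())}


# The dispatch table: adding a tool means adding one entry here.
TOOLS: dict[str, Callable[[str], object]] = {
    "wordcount": count_words,
    "linecount": count_lines,
}


def run_tool_table(tool_name: str, text: str) -> dict:
    handler = TOOLS.get(tool_name)  # None when the name is not registered
    if handler is None:
        return {"tool": tool_name, "error": f"Unknown tool: {tool_name}"}
    return {"tool": tool_name, "result": handler(text)}


print(run_tool_table("wordcount", "a b c"))
# {'tool': 'wordcount', 'result': {'words': 3}}
```

The if/elif version is easier to read with three tools; a table tends to pay off once the list of tools grows or when you want to print the valid names from one place.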
Predict before you scroll
What happens if someone passes --tool unknown_tool? Does the current code crash, or does it handle it?
Common mistakes
| Mistake | Why it happens | How to fix |
|---|---|---|
| text.split(" ") gives wrong word count | Splitting on " " creates empty strings for multiple spaces | Use .split() with no argument |
| Duplicate finder counts blank lines | Blank lines repeat often but are not meaningful duplicates | Skip lines where stripped is empty |
| Clean function removes every space, so words run together | Filtering on isalnum() alone drops spaces along with punctuation | Keep spaces by checking char == " " separately |
| JSON output crashes | Trying to serialise something JSON does not support, such as a set | Make sure all results are dicts, lists, strings, numbers, booleans, or None |
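The JSON row is easier to recognise once you have seen the failure. Below is a minimal sketch, assuming a hypothetical result dictionary that contains a set under the made-up key unique_lines.

```python
import json

# Hypothetical result: the set under "unique_lines" is not JSON-serialisable.
result = {"tool": "duplicates", "unique_lines": {"apple", "banana"}}

try:
    json.dumps(result)
    failed = False
except TypeError as exc:
    failed = True
    print(f"Cannot serialise: {exc}")

# Converting the set to a sorted list gives JSON a type it supports
# and makes the output order predictable.
safe_result = {**result, "unique_lines": sorted(result["unique_lines"])}
encoded = json.dumps(safe_result)
print(encoded)
# {"tool": "duplicates", "unique_lines": ["apple", "banana"]}
```

json.dumps raises TypeError for unsupported types, so if saving fails, the first thing to check is the type of every value in the results.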
Testing your solution
Run the tests from the project directory:
The tests check:
- count_words() returns correct counts
- find_duplicates() identifies repeated lines
- clean_string() normalises text correctly
- run_tool() dispatches to the correct tool
- run_all_tools() runs all three and returns combined results
You can also test manually:
python project.py --input data/sample_input.txt --tool wordcount
python project.py --input data/sample_input.txt --tool duplicates
python project.py --input data/sample_input.txt --tool all
What to explore next
- Add a fourth tool: "reverse" that reverses the order of lines in the file
- Handle the unknown-tool case by raising a ValueError with a message listing valid tool names