Level 0 / Project 08 - String Cleaner Starter
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| Concept | This project | — | Quiz | Flashcards | Diagram | Browser |
Estimated time: 20 minutes
Focus
- trim, lowercase, and replace transformations
Why this project exists
Clean messy strings by stripping whitespace, lowercasing, removing special characters, and collapsing multiple spaces. You will build a multi-step cleaning pipeline and see how order of operations matters.
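The pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual `project.py`: `clean_string()` and `collapse_spaces()` are named in this document, but their bodies here are guesses, and `strip_whitespace()`, `to_lowercase()`, and `remove_special_characters()` are hypothetical helper names.

```python
import re


def strip_whitespace(text: str) -> str:
    """Step 1 (hypothetical name): drop leading and trailing whitespace."""
    return text.strip()


def to_lowercase(text: str) -> str:
    """Step 2 (hypothetical name): normalize case."""
    return text.lower()


def remove_special_characters(text: str) -> str:
    """Step 3 (hypothetical name): keep only letters, digits, and spaces."""
    return "".join(ch for ch in text if ch.isalnum() or ch == " ")


def collapse_spaces(text: str) -> str:
    """Step 4: squeeze runs of spaces into one, then trim the ends."""
    return re.sub(r" +", " ", text).strip()


def clean_string(text: str) -> str:
    """Apply the four steps in a fixed order."""
    steps = (strip_whitespace, to_lowercase, remove_special_characters, collapse_spaces)
    for step in steps:
        text = step(text)
    return text


if __name__ == "__main__":
    for raw in ["  Hello,   World!!!", "***URGENT*** Check this NOW!"]:
        print(f'"{raw}" => "{clean_string(raw)}"')
```

Keeping each step as its own small function makes it easy to reorder the tuple and watch how the result changes.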
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-0/08-string-cleaner-starter
python project.py --input data/sample_input.txt
pytest -q
Expected terminal output
=== String Cleaning Results ===
" Hello, World!!!" => "hello world"
"***URGENT*** Check this NOW!" => "urgent check this now"
" spaces everywhere in this line" => "spaces everywhere in this line"
3 lines cleaned. Output written to data/output.json
5 passed
Expected artifacts
- `data/output.json`
- Passing tests
- Updated `notes.md`
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) - Extension
- Add a `remove_digits()` step that strips all numeric characters from the string.
- Add a `--steps` flag that lets the user choose which cleaning steps to apply (e.g. `--steps strip,lower`).
- Re-run script and tests.
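One possible shape for this extension is sketched below, assuming the script parses arguments with `argparse`. Only `remove_digits()`, `--steps`, and the step names `strip` and `lower` come from this page; the `STEPS` registry, the step names `specials`, `digits`, and `collapse`, and the helpers `parse_steps()`, `apply_steps()`, and `build_parser()` are hypothetical.

```python
import argparse
import re


def remove_digits(text: str) -> str:
    """Strip every numeric character from the string."""
    return "".join(ch for ch in text if not ch.isdigit())


# Hypothetical registry mapping step names to functions.
# Dict order doubles as the default pipeline order.
STEPS = {
    "strip": str.strip,
    "lower": str.lower,
    "specials": lambda t: "".join(ch for ch in t if ch.isalnum() or ch == " "),
    "digits": remove_digits,
    "collapse": lambda t: re.sub(r" +", " ", t).strip(),
}


def parse_steps(value: str) -> list[str]:
    """Turn 'strip,lower' into ['strip', 'lower'], rejecting unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in STEPS]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown step(s): {', '.join(unknown)}")
    return names


def apply_steps(text: str, names: list[str]) -> str:
    """Run the chosen steps in the order the user listed them."""
    for name in names:
        text = STEPS[name](text)
    return text


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="String cleaner (sketch)")
    parser.add_argument(
        "--steps",
        type=parse_steps,
        default=list(STEPS),
        help="comma-separated cleaning steps, e.g. strip,lower",
    )
    return parser
```

In this sketch `digits` runs before `collapse` by default, so that removing a number between two words does not leave a double space behind.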
Break it (required) - Core
- Feed in a string that is already perfectly clean -- does `clean_string()` return it unchanged?
- Feed in a string of only special characters like `@#$%^&*` -- does the cleaner return an empty string?
- Feed in a string with tab characters (`\t`) -- does `collapse_spaces()` handle tabs or only spaces?
Fix it (required) - Core
- Ensure `collapse_spaces()` also collapses tabs and other whitespace, not just spaces.
- Handle the all-special-characters case gracefully (return empty string without error).
- Add a test for the tab-handling edge case.
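A minimal sketch of these fixes follows, assuming the steps are plain functions like the ones named on this page. The function bodies, the helper name `remove_special_characters()`, and the test names are illustrative guesses, not the project's actual code.

```python
import re


def remove_special_characters(text: str) -> str:
    """Keep letters, digits, and any whitespace (hypothetical helper name).

    Tabs must survive this step; otherwise "a\tb" would become "ab"
    before collapse_spaces() ever sees the tab.
    """
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def collapse_spaces(text: str) -> str:
    """Collapse any run of whitespace (spaces, tabs, newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def clean_string(text: str) -> str:
    text = text.strip().lower()
    text = remove_special_characters(text)
    return collapse_spaces(text)


# pytest-style tests for the edge cases above.
def test_collapse_spaces_handles_tabs():
    assert collapse_spaces("a\t\tb \t c") == "a b c"


def test_tabs_between_words_become_one_space():
    assert clean_string("Hello\tWorld") == "hello world"


def test_all_special_characters_returns_empty_string():
    assert clean_string("@#$%^&*") == ""
```

The regex `\s+` matches any run of whitespace; `" ".join(text.split())` is an equivalent alternative that avoids `re` entirely.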
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- Why does the cleaning pipeline apply steps in a specific order (strip, then lowercase, then remove specials, then collapse)?
- What happens if you collapse spaces before removing special characters?
- Why does `isalnum()` keep letters and digits but remove punctuation?
- Where would string cleaning appear in real software (search indexing, data import, form validation)?
Mastery check
You can move on when you can:
- run baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.