Level 4 / Project 08 - Malformed Row Quarantine¶
Home: README
Learn Your Way¶
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 60 minutes
Focus¶
- reject queue and reason tracking
Why this project exists¶
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)¶
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/08-malformed-row-quarantine
python project.py --input data/sample_input.csv --output-dir data/output --required 0,1
pytest -q
Expected terminal output¶
Expected artifacts¶
data/output/valid_rows.txt— clean rows that passed all rulesdata/output/quarantined_rows.json— rejected rows with reasonsdata/output/quarantine_report.json— summary counts- Passing tests
- Updated
notes.md
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) — Extension¶
- Add a new rule:
rule_no_duplicate_valuesthat rejects rows where a key field repeats a value seen in a prior row. - Add a
--delimiterCLI flag to handle TSV or pipe-delimited files. - Re-run script and tests — add a parametrized test for the new rule.
Break it (required) — Core¶
- Feed it a file with only a header and no data rows — observe the counts.
- Create a row with a field containing 10,000+ characters and see if
rule_max_field_lengthcatches it. - Remove the header row entirely and observe what happens to column-count validation.
Fix it (required) — Core¶
- Handle the no-data-rows case gracefully (valid output, zero counts).
- Add a
--has-headerflag for header-less files. - Re-run until all tests pass.
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)¶
- Why are validation rules written as separate functions instead of one big
if/elseblock? - What is the advantage of collecting ALL reasons per row instead of stopping at the first?
- Why does the quarantine file use JSON instead of CSV?
- How would you add custom rules at runtime (without editing the source code)?
Mastery check¶
You can move on when you can: - run baseline without docs, - explain one core function line-by-line, - break and recover in one session, - keep tests passing after your change.
Related Concepts¶
| ← Prev | Home | Next → |
|---|---|---|