Level 4 / Project 08 - Malformed Row Quarantine¶

Learn Your Way¶

Read	Build	Watch	Test	Review	Visualize	Try
—	This project	—	—	Flashcards	—	Browser

Estimated time: 60 minutes

Focus¶

reject queue and reason tracking

Why this project exists¶

This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)¶

Use <repo-root> as the folder containing this repository's README.md.

cd <repo-root>/projects/level-4/08-malformed-row-quarantine
python project.py --input data/sample_input.csv --output-dir data/output --required 0,1
pytest -q

Expected terminal output¶

{
  "total_data_rows": 7,
  "valid": 4,
  "quarantined": 3
}
7 passed

Expected artifacts¶

data/output/valid_rows.txt — clean rows that passed all rules
data/output/quarantined_rows.json — rejected rows with reasons
data/output/quarantine_report.json — summary counts
Passing tests
Updated notes.md

Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.

Alter it (required) — Extension¶

Add a new rule: rule_no_duplicate_values that rejects rows where a key field repeats a value seen in a prior row.
Add a --delimiter CLI flag to handle TSV or pipe-delimited files.
Re-run script and tests — add a parametrized test for the new rule.

Break it (required) — Core¶

Feed it a file with only a header and no data rows — observe the counts.
Create a row with a field containing 10,000+ characters and see if rule_max_field_length catches it.
Remove the header row entirely and observe what happens to column-count validation.

Fix it (required) — Core¶

Handle the no-data-rows case gracefully (valid output, zero counts).
Add a --has-header flag for header-less files.
Re-run until all tests pass.

Checkpoint: All modifications done, tests still pass. Good time to review your changes.

Explain it (teach-back)¶

Why are validation rules written as separate functions instead of one big if/else block?
What is the advantage of collecting ALL reasons per row instead of stopping at the first?
Why does the quarantine file use JSON instead of CSV?
How would you add custom rules at runtime (without editing the source code)?

Mastery check¶

You can move on when you can: - run baseline without docs, - explain one core function line-by-line, - break and recover in one session, - keep tests passing after your change.

← Prev	Home	Next →