Level 4 / Project 15 - Level 4 Mini Capstone¶
Learn Your Way¶
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 75 minutes
Focus¶
- data-quality-first automation workflow
Why this project exists¶
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)¶
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/15-level4-mini-capstone
python project.py --input data/sample_input.csv --output-dir data/output --required name,age --batch-size 3
pytest -q
Expected terminal output¶
Expected artifacts¶
- data/output/valid_data.json: validated and transformed rows
- data/output/quarantined.json: rejected rows with reasons
- data/output/manifest.json: file inventory with checksums
- Passing tests
- Updated notes.md
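To make "file inventory with checksums" concrete, here is a minimal sketch of how such a manifest could be built. The function names (`sha256_of`, `build_manifest`) and the manifest keys (`files`, `name`, `bytes`, `sha256`) are assumptions for illustration; the project's actual `manifest.json` layout may differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory use flat even for large artifacts.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir: Path) -> dict:
    """Inventory every JSON artifact except the manifest itself (hypothetical layout)."""
    files = []
    for path in sorted(output_dir.glob("*.json")):
        if path.name == "manifest.json":
            continue
        files.append(
            {
                "name": path.name,
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return {"files": files}
```

Comparing a stored checksum with a freshly computed one is a cheap way to detect that an artifact was modified or truncated after the run.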
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) — Extension¶
- Add a --schema flag that loads validation rules from a JSON file (like project 01).
- Add a --report flag that generates a human-readable summary alongside the JSON.
- Re-run the script and tests, and verify that schema-based validation works.
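One possible way to wire up the two new flags is sketched below. It assumes the script uses `argparse` and that the schema file is a JSON object with a `required` list; both are assumptions, as are the helper names `build_parser` and `load_rules`. Adapt it to whatever `project.py` already does.

```python
import argparse
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """CLI sketch: the four baseline flags plus the two new ones."""
    parser = argparse.ArgumentParser(description="Level 4 mini capstone (sketch)")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--required", default="")
    parser.add_argument("--batch-size", type=int, default=3)
    # New: optional path to a JSON file holding validation rules.
    parser.add_argument("--schema", type=Path, default=None)
    # New: switch that asks for a human-readable summary as well.
    parser.add_argument("--report", action="store_true")
    return parser


def load_rules(args: argparse.Namespace) -> dict:
    """Use the schema file when given; otherwise fall back to --required."""
    if args.schema is not None:
        with args.schema.open(encoding="utf-8") as handle:
            return json.load(handle)
    required = [name.strip() for name in args.required.split(",") if name.strip()]
    return {"required": required}
```

Letting `--schema` take precedence over `--required` is a design choice, not a requirement; rejecting the combination with an error would be equally defensible.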
Break it (required) — Core¶
- Kill the process mid-run (Ctrl+C after 2 rows) and restart — verify it resumes from checkpoint.
- Feed it a CSV with headers but no data rows.
- Remove the output directory and verify it is created automatically.
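Two of the break scenarios above can be explored with the standard library alone. The sketch below shows how `csv.DictReader` behaves on a header-only file and how `Path.mkdir` can create a missing output directory. The helper names `load_csv` and `ensure_output_dir` are hypothetical and not taken from the project.

```python
import csv
import io
from pathlib import Path


def load_csv(handle) -> tuple[list[str], list[dict]]:
    """Return (headers, rows); a header-only file yields headers and no rows."""
    reader = csv.DictReader(handle)
    # fieldnames is None for a completely empty file, so normalise to a list.
    headers = list(reader.fieldnames or [])
    rows = list(reader)
    return headers, rows


def ensure_output_dir(output_dir: Path) -> Path:
    """Create the directory (and parents) if missing; do nothing if it exists."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


# A header-only CSV parses cleanly: headers are known, the row list is empty.
headers, rows = load_csv(io.StringIO("name,age\n"))
```

Whether an empty row list should produce empty output files or an explicit warning is a decision for your pipeline; the point is that the reader itself does not raise.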
Fix it (required) — Core¶
- Handle keyboard interrupts gracefully (save checkpoint before exiting).
- Add total processing time to the manifest.
- Re-run until all tests pass.
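The first two fixes could look roughly like the sketch below: catch `KeyboardInterrupt`, save how far the run got, and record elapsed time for the manifest. The checkpoint format (`rows_done`), the `processing_seconds` key, and the `process_rows` signature are assumptions made for illustration; the project's real checkpoint file may store more.

```python
import json
import time
from pathlib import Path


def process_rows(rows: list, checkpoint_path: Path, handle_row) -> dict:
    """Process rows in order, resuming from a checkpoint if one exists."""
    started = time.perf_counter()
    done = 0
    if checkpoint_path.exists():
        saved = json.loads(checkpoint_path.read_text(encoding="utf-8"))
        done = saved["rows_done"]
    interrupted = False
    try:
        for row in rows[done:]:
            handle_row(row)
            done += 1
    except KeyboardInterrupt:
        # Save progress so the next run can skip rows already handled.
        checkpoint_path.write_text(json.dumps({"rows_done": done}), encoding="utf-8")
        interrupted = True
    else:
        # Clean finish: a stale checkpoint would make the next run skip rows.
        checkpoint_path.unlink(missing_ok=True)
    return {
        "rows_done": done,
        "interrupted": interrupted,
        "processing_seconds": round(time.perf_counter() - started, 3),
    }
```

The returned dictionary can be merged into the manifest. Note that this sketch only tracks a row counter; a real pipeline would also need to persist the partial valid and quarantined results so that a resumed run does not lose them.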
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)¶
- How does this project combine skills from projects 01-14?
- Why is checkpoint recovery important for data pipelines?
- What is the purpose of the manifest — when would you use it?
- If this pipeline processed 1 million rows, what would be the bottleneck and how would you optimize?
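For the 1-million-row question, one direction worth considering is streaming rows in fixed-size batches instead of loading the whole file into memory. The generator below is a sketch of that idea only; the name `batched` is hypothetical, and it is not claimed to be how the project's `--batch-size` option is implemented.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable, size: int) -> Iterator[list]:
    """Yield lists of up to `size` items without materialising the full input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Passing a `csv.DictReader` as `rows` keeps at most one batch in memory at a time, which makes it easier to see whether the remaining cost lies in I/O, validation, or writing the output.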
Mastery check¶
You can move on when you can:
- run the baseline without docs,
- explain one core function line by line,
- break and recover in one session,
- keep tests passing after your change.