# Level 2 / Project 15 - Level 2 Mini Capstone
Estimated time: 45 minutes
## Focus
- small end-to-end validated pipeline
## Why this project exists
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
## Run (copy/paste)

Use `<repo-root>` to mean the folder containing this repository's README.md.

```shell
cd <repo-root>/projects/level-2/15-level2-mini-capstone
python project.py data/sample_input.txt
python project.py data/sample_input.txt --numeric-field salary --threshold 2.0
python project.py data/sample_input.txt --json
pytest -q
```
## Expected terminal output

```
============================================================
DATA PIPELINE REPORT
============================================================
Records loaded: 12
Records valid: 10
Records invalid: 2
Anomalies found: 1
```

`pytest -q` should end with `10 passed`.
## Expected artifacts

- Pipeline report on stdout
- Passing tests
- Updated `notes.md`
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
## Alter it (required) — Extension

- Add a `--rules` flag to load validation rules from a separate JSON file.
- Add an `--output` flag to save valid records as a new CSV.
- Add deduplication as a pipeline stage between cleaning and validation.
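A deduplication stage can drop in between cleaning and validation because each stage takes and returns a list of records. Here is a minimal sketch, assuming records are dicts as elsewhere in this pipeline; the function name `deduplicate` and the `key_fields` parameter are illustrative, not part of `project.py`.

```python
# Illustrative deduplication stage: keeps the first occurrence of each key.
def deduplicate(records, key_fields=("id",)):
    """Drop records whose key_fields match an earlier record."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

rows = [
    {"id": "1", "name": "Ada"},
    {"id": "1", "name": "Ada"},   # duplicate of the first row
    {"id": "2", "name": "Grace"},
]
print(len(deduplicate(rows)))  # 2
```

Keeping the first occurrence (rather than the last) means the stage is stable: record order is preserved, which keeps the later report deterministic.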
## Break it (required) — Core
- Feed a CSV where every record is invalid — does the report handle 0% pass rate?
- Feed a CSV with no numeric column — does anomaly detection crash?
- Feed an empty CSV (header only) — does the pipeline handle zero records?
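One way to produce these stress-test inputs is to generate them with the `csv` module. The file names below are illustrative, and whether each one actually breaks `project.py` depends on its validation rules.

```python
import csv

# Header-only CSV: zero records after the header row.
with open("empty.csv", "w", newline="") as f:
    csv.writer(f).writerow(["name", "salary"])

# Every record invalid: blank required fields in each row.
with open("all_invalid.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "salary"])
    writer.writerows([["", ""], ["", ""]])

# No numeric column: anomaly detection has no numeric field to analyse.
with open("no_numeric.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "city"])
    writer.writerow(["Ada", "London"])
```

Then run `python project.py` against each file and note which stage fails first.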
## Fix it (required) — Core
- Guard against zero-record pass rate calculations.
- Handle missing numeric fields in anomaly detection gracefully.
- Add a test for empty/header-only CSV files.
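The zero-record guard is mostly about avoiding division by zero when computing the pass rate. A minimal sketch, with a matching pytest-style test; the function name `pass_rate` is illustrative:

```python
def pass_rate(valid_count, total_count):
    """Percentage of valid records; defined as 0.0 when there are no records."""
    if total_count == 0:
        return 0.0  # guard: avoids ZeroDivisionError on header-only input
    return 100.0 * valid_count / total_count

def test_pass_rate_zero_records():
    assert pass_rate(0, 0) == 0.0

def test_pass_rate_normal():
    assert abs(pass_rate(10, 12) - 100.0 * 10 / 12) < 1e-9
```

Whether 0% is the right answer for an empty file is a design choice; the important part is that the pipeline reports something sensible instead of crashing.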
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
## Explain it (teach-back)
- How do the five pipeline stages (load, clean, validate, analyse, report) connect?
- Why is each stage a separate function instead of one big function?
- What Level 2 skills did you combine in this capstone?
- How would you extend this pipeline for a real data processing job?
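The five-stage structure asked about above can be sketched as a chain of small functions, each consuming the previous stage's output. Everything here is invented toy data, not the actual `project.py` code, but it shows why separate stages compose cleanly:

```python
def load(lines):
    """Parse CSV-like lines into a list of dicts keyed by the header row."""
    header, *rows = [line.split(",") for line in lines]
    return [dict(zip(header, row)) for row in rows]

def clean(records):
    """Strip stray whitespace from every field."""
    return [{k: v.strip() for k, v in r.items()} for r in records]

def validate(records):
    """Keep only records whose salary parses as a number."""
    return [r for r in records if r.get("salary", "").replace(".", "", 1).isdigit()]

def analyse(records):
    """Compute summary statistics over the valid records."""
    salaries = [float(r["salary"]) for r in records]
    mean = sum(salaries) / len(salaries) if salaries else 0.0
    return {"count": len(records), "mean": mean}

def report(stats):
    """Format the statistics for display."""
    return f"Records valid: {stats['count']}, mean salary: {stats['mean']:.1f}"

raw = ["name,salary", "Ada, 100", "Bob,oops", "Eve,140"]
print(report(analyse(validate(clean(load(raw))))))
# Records valid: 2, mean salary: 120.0
```

Because each stage only depends on the shape of its input, a new stage (such as deduplication) can slot into the chain without touching the others.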
## Mastery check

You can move on when you can:

- describe all 5 pipeline stages and what each does,
- add a new pipeline stage without modifying existing ones,
- explain how data flows from raw CSV to final report,
- identify which earlier Level 2 project each stage came from.
## Related Concepts
- Collections Explained
- Functions Explained
- How Loops Work
- Virtual Environments
- Quiz: Collections Explained