Level 2 / Project 15 - Level 2 Mini Capstone

Learn Your Way

Read	Build	Watch	Test	Review	Visualize	Try
Concept	This project	Walkthrough	Quiz	Flashcards	Diagram	Browser

Estimated time: 45 minutes

Focus

Small end-to-end validated pipeline

Why this project exists

This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)

Replace <repo-root> with the path to the folder that contains this repository's README.md.

cd <repo-root>/projects/level-2/15-level2-mini-capstone
python project.py data/sample_input.txt
python project.py data/sample_input.txt --numeric-field salary --threshold 2.0
python project.py data/sample_input.txt --json
pytest -q
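The commands above pass one positional input path plus optional flags. A minimal sketch of how `project.py` might parse them with the standard library's `argparse`; the function name `build_parser`, the defaults for `--numeric-field` and `--threshold`, and the reading of the threshold as "standard deviations" are all assumptions, not values taken from the project:

```python
import argparse


def build_parser():
    """Build a CLI parser for the flags shown in the Run commands (sketch)."""
    parser = argparse.ArgumentParser(description="Level 2 mini capstone data pipeline")
    parser.add_argument("input", help="path to the input CSV, e.g. data/sample_input.txt")
    # Assumed default column; the real project may pick a different one.
    parser.add_argument("--numeric-field", default="salary",
                        help="column used for anomaly detection")
    # Assumed default cutoff, interpreted here as standard deviations from the mean.
    parser.add_argument("--threshold", type=float, default=2.0,
                        help="anomaly cutoff in standard deviations")
    # A boolean switch: present means True, absent means False.
    parser.add_argument("--json", action="store_true",
                        help="print the report as JSON instead of text")
    return parser
```

`argparse` turns `--numeric-field` into the attribute `numeric_field`, so `build_parser().parse_args(["data/sample_input.txt", "--json"])` yields an object with `args.input`, `args.numeric_field`, `args.threshold`, and `args.json`.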

Expected terminal output

============================================================
  DATA PIPELINE REPORT
============================================================
Records loaded:    12
Records valid:     10
Records invalid:   2
Anomalies found:   1
10 passed
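The report lines above align their values in a single column. A minimal sketch of a formatter that reproduces that layout and also covers the `--json` flag; `format_report`, the `summary` key names, and the 60-character rule width are assumptions, and the project's real JSON output may be shaped differently:

```python
import json

RULE = "=" * 60  # assumed width of the banner rule


def format_report(summary, as_json=False):
    """Render a pipeline summary as a text banner, or as JSON for --json."""
    if as_json:
        return json.dumps(summary, indent=2)
    rows = [
        ("Records loaded:", summary["loaded"]),
        ("Records valid:", summary["valid"]),
        ("Records invalid:", summary["invalid"]),
        ("Anomalies found:", summary["anomalies"]),
    ]
    lines = [RULE, "  DATA PIPELINE REPORT", RULE]
    # Left-pad every label to 19 characters so the values line up.
    lines += [f"{label:<19}{value}" for label, value in rows]
    return "\n".join(lines)


print(format_report({"loaded": 12, "valid": 10, "invalid": 2, "anomalies": 1}))
```

Keeping formatting in its own function means the text and JSON outputs share one `summary` and cannot drift apart.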

Expected artifacts

Pipeline report on stdout
Passing tests
Updated notes.md

Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.

Alter it (required) - Extension

Add a --rules flag to load validation rules from a separate JSON file.
Add an --output flag to save valid records as a new CSV.
Add deduplication as a pipeline stage between cleaning and validation.
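One possible shape for the three extensions, sketched with the standard `csv` and `json` modules. The helper names `deduplicate`, `load_rules`, and `write_valid_csv`, and the rules file schema `{"required": [...], "numeric": [...]}`, are hypothetical choices for illustration, not part of the project:

```python
import csv
import json


def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        # Sort the items so column order does not affect the duplicate check.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def load_rules(path):
    """Load validation rules for a hypothetical --rules flag.

    Assumed JSON schema: {"required": ["name", ...], "numeric": ["salary", ...]}
    """
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    return {
        "required": list(raw.get("required", [])),
        "numeric": list(raw.get("numeric", [])),
    }


def write_valid_csv(records, path, fieldnames):
    """Save valid records to a new CSV for a hypothetical --output flag."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

`write_valid_csv` takes `fieldnames` explicitly so that an empty result still produces a file with a header row.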

Break it (required) - Core

Feed a CSV where every record is invalid. Does the report handle a 0% pass rate?
Feed a CSV with no numeric column. Does anomaly detection crash?
Feed an empty CSV (header only). Does the pipeline handle zero records?
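To see why these inputs are dangerous, here is a sketch using deliberately naive, hypothetical helpers (`naive_pass_rate` and `naive_mean` are not the project's functions). Each input trips a different unguarded operation:

```python
import csv
import io
import statistics


def naive_pass_rate(valid_count, total_count):
    # Unguarded division: raises ZeroDivisionError when there are no records.
    return valid_count / total_count * 100


def naive_mean(records, field):
    # Unguarded lookup and mean: KeyError if the column is missing,
    # StatisticsError if no records reach this step.
    return statistics.mean(float(r[field]) for r in records)


header_only = list(csv.DictReader(io.StringIO("name,salary\n")))
no_numeric = list(csv.DictReader(io.StringIO("name,city\nAda,Paris\n")))
valid_after_all_invalid = []  # every row was rejected, so nothing reaches analysis

cases = {
    "all invalid": lambda: naive_mean(valid_after_all_invalid, "salary"),
    "no numeric column": lambda: naive_mean(no_numeric, "salary"),
    "header only": lambda: naive_pass_rate(0, len(header_only)),
}
failures = {}
for label, action in cases.items():
    try:
        action()
    except (ZeroDivisionError, KeyError, statistics.StatisticsError) as exc:
        failures[label] = type(exc).__name__
print(failures)
```

A header-only file parses cleanly into an empty list, so the crash happens later, at the first calculation that assumes at least one record.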

Fix it (required) - Core

Guard against zero-record pass rate calculations.
Handle missing numeric fields in anomaly detection gracefully.
Add a test for empty/header-only CSV files.
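A minimal sketch of the three fixes. It assumes anomaly detection is a z-score check (flag values more than `threshold` population standard deviations from the mean), which is one common approach and may differ from the project's own method; `load_records`, `safe_pass_rate`, and `detect_anomalies` are hypothetical names:

```python
import csv
import statistics


def load_records(path):
    """Read a CSV into a list of dicts; a header-only file yields []."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def safe_pass_rate(valid_count, total_count):
    """Percentage of valid records, defined as 0.0 when there are no records."""
    if total_count == 0:
        return 0.0
    return valid_count / total_count * 100


def detect_anomalies(records, field, threshold=2.0):
    """Flag records more than `threshold` standard deviations from the mean.

    Records with a missing or non-numeric value are skipped instead of crashing.
    """
    pairs = []
    for record in records:
        try:
            pairs.append((record, float(record[field])))
        except (KeyError, TypeError, ValueError):
            continue  # missing column, None value, or unparseable text
    if len(pairs) < 2:
        return []  # too few values to measure spread
    values = [value for _, value in pairs]
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical: nothing stands out
    return [record for record, value in pairs if abs(value - mean) / spread > threshold]


def test_header_only_csv_yields_zero_records(tmp_path):
    """pytest test: a header-only CSV loads as zero records without crashing."""
    csv_path = tmp_path / "header_only.csv"
    csv_path.write_text("name,salary\n", encoding="utf-8")
    records = load_records(csv_path)
    assert records == []
    assert safe_pass_rate(0, len(records)) == 0.0
    assert detect_anomalies(records, "salary") == []
```

The test uses pytest's built-in `tmp_path` fixture, so it writes its fixture file to a throwaway directory rather than into `data/`.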

Checkpoint: All modifications done, tests still pass. Good time to review your changes.

Explain it (teach-back)

How do the five pipeline stages (load, clean, validate, analyse, report) connect?
Why is each stage a separate function instead of one big function?
What Level 2 skills did you combine in this capstone?
How would you extend this pipeline for a real data processing job?
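One way to picture how the five stages connect: each stage is a separate function that receives a shared context dict and returns it, and a runner calls them in order. This is a sketch under assumed names (`run_pipeline`, `STAGES`, the context keys) and an assumed z-score anomaly rule, not the project's actual code:

```python
import csv
import io
import statistics

# Each stage takes the shared context dict and returns it, so a new stage can be
# inserted into the sequence without editing any existing function.


def load(ctx):
    ctx["raw"] = list(csv.DictReader(ctx["source"]))
    return ctx


def clean(ctx):
    # Strip whitespace; drop overflow columns, which DictReader keys as None.
    ctx["clean"] = [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in ctx["raw"]
    ]
    return ctx


def validate(ctx):
    field = ctx["numeric_field"]
    ctx["valid"], ctx["invalid"] = [], []
    for row in ctx["clean"]:
        try:
            float(row.get(field, ""))  # empty or missing values raise ValueError
            ctx["valid"].append(row)
        except ValueError:
            ctx["invalid"].append(row)
    return ctx


def analyse(ctx):
    values = [float(row[ctx["numeric_field"]]) for row in ctx["valid"]]
    ctx["anomalies"] = []
    if len(values) >= 2 and statistics.pstdev(values) > 0:
        mean, spread = statistics.mean(values), statistics.pstdev(values)
        ctx["anomalies"] = [
            row for row, value in zip(ctx["valid"], values)
            if abs(value - mean) / spread > ctx["threshold"]
        ]
    return ctx


def report(ctx):
    ctx["summary"] = {
        "loaded": len(ctx["raw"]),
        "valid": len(ctx["valid"]),
        "invalid": len(ctx["invalid"]),
        "anomalies": len(ctx["anomalies"]),
    }
    return ctx


STAGES = (load, clean, validate, analyse, report)


def run_pipeline(source, numeric_field, threshold, stages=STAGES):
    ctx = {"source": source, "numeric_field": numeric_field, "threshold": threshold}
    for stage in stages:
        ctx = stage(ctx)
    return ctx["summary"]


sample = io.StringIO("name,salary\nAda, 10\nBob,abc\nCy,12\n")
print(run_pipeline(sample, "salary", 2.0))
```

Because the runner only iterates over `stages`, adding a step such as deduplication means passing a new sequence like `(load, clean, dedupe, validate, analyse, report)`, where `dedupe` is a hypothetical stage with the same `ctx -> ctx` signature.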

Mastery check

You can move on when you can:
- describe all 5 pipeline stages and what each does,
- add a new pipeline stage without modifying existing ones,
- explain how data flows from raw CSV to final report,
- identify which earlier Level 2 project each stage came from.
