Level 5 / Project 15 - Level 5 Mini Capstone
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 90 minutes
Focus
- intermediate-grade automation package
Why this project exists
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-5/15-level5-mini-capstone
python project.py --config data/pipeline_config.json
pytest -q
Expected terminal output
Expected artifacts
- `data/output/summary.json`
- Passing tests
- Updated `notes.md`
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) - Extension
- Add a `--dry-run` flag that runs extract and transform but skips the export step.
- Add env var overrides: `PIPELINE_THRESHOLD_WARN=80` should override the config file value.
- Add a retry wrapper around `extract_csv_files` so one bad file does not abort the pipeline.
- Re-run script and tests.
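One way the three changes could look, as a minimal sketch rather than the project's actual code. The helper names `parse_args`, `apply_env_overrides`, and `with_retry` are hypothetical, and the sketch assumes the loaded config is a flat dict with a `threshold_warn` key.

```python
import argparse
import os
import time


def parse_args(argv=None):
    """Parse CLI flags, including the new --dry-run switch."""
    parser = argparse.ArgumentParser(description="Mini capstone pipeline")
    parser.add_argument("--config", default="data/pipeline_config.json")
    parser.add_argument("--dry-run", action="store_true",
                        help="run extract and transform, skip export")
    return parser.parse_args(argv)


def apply_env_overrides(config, environ=None):
    """Return a copy of config with PIPELINE_THRESHOLD_WARN applied if set."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    raw = environ.get("PIPELINE_THRESHOLD_WARN")
    if raw is not None:
        # float() raises ValueError if the variable is not numeric.
        merged["threshold_warn"] = float(raw)
    return merged


def with_retry(func, attempts=3, delay=0.0, exceptions=(OSError, ValueError)):
    """Call func() up to `attempts` times; return None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return None
```

In the pipeline you would check `args.dry_run` just before the export step, and wrap each per-file extract call in `with_retry` so a `None` result means "skip this file" instead of aborting the run.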
Break it (required) - Core
- Point `input_dir` in the config at a directory that does not exist.
- Add a CSV with no numeric columns and observe what `_numeric` defaults to.
- Set `threshold_warn` higher than `threshold_crit` in the config.
- Capture the first failing test or visible bad output.
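If you prefer to generate the broken inputs instead of editing files by hand, a small helper can write them into a scratch folder. This is only a sketch: `make_break_fixtures` is a hypothetical name, the CSV contents are made-up sample rows, and the config is assumed to be a flat JSON object with `input_dir`, `threshold_warn`, and `threshold_crit` keys.

```python
import csv
import json
from pathlib import Path


def make_break_fixtures(root):
    """Write a text-only CSV and a deliberately broken config under root."""
    root = Path(root)
    input_dir = root / "input"
    input_dir.mkdir(parents=True, exist_ok=True)

    # CSV whose columns are all text, so no numeric column can be found.
    csv_path = input_dir / "no_numeric.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["host", "status"])
        writer.writerow(["web-01", "ok"])
        writer.writerow(["web-02", "degraded"])

    # Config with a missing input_dir and warn above crit (key layout assumed).
    broken = {
        "input_dir": str(root / "does_not_exist"),
        "threshold_warn": 95,
        "threshold_crit": 80,
    }
    config_path = root / "broken_config.json"
    config_path.write_text(json.dumps(broken, indent=2), encoding="utf-8")
    return config_path, csv_path
```

Run the pipeline once against the broken config and once with the text-only CSV in your real input folder, so each failure shows up on its own.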
Fix it (required) - Core
- Validate that `input_dir` exists before starting extraction.
- Log a clear warning when no numeric column is found in a row.
- Validate that `warn < crit` at config load time.
- Add tests for each broken scenario.
- Re-run until output and tests are deterministic.
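The fixes above could be sketched as two small helpers. Treat this as an illustration under assumptions, not the project's implementation: `validate_config` and `first_numeric` are hypothetical names, the config is assumed to be a flat dict, and each row is assumed to be a dict of strings as produced by `csv.DictReader`.

```python
import logging
from pathlib import Path

logger = logging.getLogger("pipeline")


def validate_config(config):
    """Raise ValueError early if the config cannot produce a valid run."""
    input_dir = Path(config["input_dir"])
    if not input_dir.is_dir():
        raise ValueError(f"input_dir does not exist: {input_dir}")
    warn, crit = config["threshold_warn"], config["threshold_crit"]
    if not warn < crit:
        raise ValueError(
            f"threshold_warn ({warn}) must be below threshold_crit ({crit})"
        )
    return config


def first_numeric(row, source="<unknown>"):
    """Return the first value in row that parses as a float, else None."""
    for value in row.values():
        try:
            return float(value)
        except (TypeError, ValueError):
            continue
    # Nothing parsed: say so loudly instead of silently using a default.
    logger.warning("no numeric column in row from %s: %r", source, row)
    return None
```

Failing at load time keeps a bad config from producing a half-written summary, and each `ValueError` message gives your new tests something specific to assert on.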
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- How does `load_config` implement the three-layer priority (defaults, file, env)?
- Why does `atomic_write` use a `.tmp` file and rename instead of writing directly?
- How does `check_thresholds` separate warnings from criticals?
- How does this capstone tie together config, ETL, monitoring, and atomic export from earlier projects?
- Where would you see a pipeline like this in production (data warehousing, CI/CD, monitoring)?
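When you prepare the teach-back, it can help to compare the project's functions with a stripped-down version of the same two ideas. The sketch below is not the project's `load_config` or `atomic_write`: `load_layered`, `atomic_write_json`, the `DEFAULTS` values, and the `PIPELINE_THRESHOLD_CRIT` variable are assumptions made for illustration.

```python
import json
import os
from pathlib import Path

DEFAULTS = {"threshold_warn": 70, "threshold_crit": 90}  # illustrative values


def load_layered(config_path, environ=None):
    """Merge defaults, then file values, then PIPELINE_* environment values."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)                      # layer 1: defaults
    path = Path(config_path)
    if path.is_file():                           # layer 2: config file
        config.update(json.loads(path.read_text(encoding="utf-8")))
    for key in ("threshold_warn", "threshold_crit"):
        raw = environ.get(f"PIPELINE_{key.upper()}")
        if raw is not None:                      # layer 3: environment wins
            config[key] = float(raw)
    return config


def atomic_write_json(target, payload):
    """Write to a sibling .tmp file, then swap it into place in one step."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())
    # On POSIX the swap is atomic when both paths share a filesystem.
    os.replace(tmp, target)
    return target
```

Because each later layer only overwrites keys it actually provides, the priority order is simply the order of the updates. The rename step means a reader sees either the old `summary.json` or the complete new one, never a partially written file.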
Mastery check
You can move on when you can:
- run baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.