Level 6 / Project 15 - Level 6 Mini Capstone¶

Learn Your Way¶

Read	Build	Watch	Test	Review	Visualize	Try
—	This project	—	—	Flashcards	—	—

Focus¶

SQL-centric pipeline resilience project

Why this project exists¶

This project gives you Level 6 practice in a realistic operations context: a SQL-backed pipeline that stages, loads, and rejects records. Goal: run the baseline, alter its behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)¶

Use <repo-root> as the folder containing this repository's README.md.

cd <repo-root>/projects/level-6/15-level6-mini-capstone
python project.py --input data/sample_input.txt --output data/output_summary.json
pytest -q

Expected terminal output¶

{
  "input_records": 7,
  "staged": 5,
  "loaded": 5,
  "rejected": 2,
  "dead_letters": 2,
  "lineage_entries": 12,
  "watermark": "2025-01-15T12:00:00"
}

Expected artifacts¶

data/output_summary.json — full pipeline results
Passing tests (pytest -q → 6+ passed)
Updated notes.md

Alter it (required)¶

Run the pipeline twice with a persistent database to confirm the watermark prevents reprocessing.
Add a --report flag that prints a summary of lineage entries grouped by step.
Add a run_log query to the output showing all historical pipeline runs.
Re-run the script and the tests after each change.
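The first item above, running twice to confirm that the watermark prevents reprocessing, can be sketched in isolation as follows. This is a minimal illustration, not the code in project.py: the table names (events, watermark), the column names, and the run_once helper are all assumptions made for the example, and an in-memory SQLite database stands in for the persistent one.

```python
import sqlite3


def run_once(conn, records):
    """Load only records newer than the stored watermark; return how many were loaded."""
    # Hypothetical schema: the real project's tables may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (event_key TEXT PRIMARY KEY, ts TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watermark "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), ts TEXT)"
    )
    row = conn.execute("SELECT ts FROM watermark WHERE id = 1").fetchone()
    # An empty string sorts before any ISO 8601 timestamp, so the first run loads everything.
    last_ts = row[0] if row else ""

    # ISO 8601 strings in one consistent format compare correctly as plain text.
    fresh = [r for r in records if r["ts"] > last_ts]
    for r in fresh:
        conn.execute(
            "INSERT OR REPLACE INTO events (event_key, ts) VALUES (?, ?)",
            (r["event_key"], r["ts"]),
        )
    if fresh:
        # Advance the watermark to the newest timestamp just loaded.
        new_mark = max(r["ts"] for r in fresh)
        conn.execute(
            "INSERT OR REPLACE INTO watermark (id, ts) VALUES (1, ?)", (new_mark,)
        )
    conn.commit()
    return len(fresh)


records = [
    {"event_key": "evt-001", "ts": "2025-01-15T10:00:00"},
    {"event_key": "evt-002", "ts": "2025-01-15T12:00:00"},
]

# Pass a file path instead of ":memory:" to keep state between separate processes.
conn = sqlite3.connect(":memory:")
first_run = run_once(conn, records)
second_run = run_once(conn, records)
stored_watermark = conn.execute("SELECT ts FROM watermark WHERE id = 1").fetchone()[0]
print(first_run, second_run, stored_watermark)
```

If the second count is not zero when you try the same experiment against the real pipeline, check whether the watermark is actually being persisted between runs.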

Break it (required)¶

Feed a record with key evt-003 and a timestamp older than the existing record's. Does the upsert overwrite with stale data?
Feed only invalid records and observe whether the pipeline handles an all-rejection batch gracefully.
Corrupt the watermark table manually and observe the next run's behavior.

Fix it (required)¶

Add a timestamp comparison in the upsert: only update if the new timestamp is newer.
Handle the case where 100% of records are rejected, so that the run completes without raising an error.
Add watermark validation to reject obviously invalid values.
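One possible shape for the first and third fixes is sketched below. It assumes a hypothetical events table with event_key, ts, and payload columns, and it assumes that "obviously invalid" means a watermark that is not a string, does not parse as ISO 8601, carries a timezone, or lies in the future. None of these names or rules come from project.py, so adapt them to the real schema. The ON CONFLICT clause requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime

# Update an existing row only when the incoming timestamp is strictly newer.
# ISO 8601 strings in one consistent format compare correctly as text.
GUARDED_UPSERT = """
INSERT INTO events (event_key, ts, payload) VALUES (?, ?, ?)
ON CONFLICT(event_key) DO UPDATE SET
    ts = excluded.ts,
    payload = excluded.payload
WHERE excluded.ts > events.ts
"""


def upsert_event(conn, event_key, ts, payload):
    """Insert the event, or overwrite it only if the new timestamp is newer."""
    conn.execute(GUARDED_UPSERT, (event_key, ts, payload))


def is_valid_watermark(value, now=None):
    """Return True only for a parseable, naive ISO 8601 timestamp that is not in the future."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    if parsed.tzinfo is not None:
        # Assumption: watermarks are naive timestamps, as in the sample output.
        return False
    return parsed <= (now or datetime.now())


conn = sqlite3.connect(":memory:")
# Hypothetical schema for the demonstration.
conn.execute(
    "CREATE TABLE events (event_key TEXT PRIMARY KEY, ts TEXT, payload TEXT)"
)
query = "SELECT ts, payload FROM events WHERE event_key = 'evt-003'"

upsert_event(conn, "evt-003", "2025-01-15T11:00:00", "current")
upsert_event(conn, "evt-003", "2025-01-14T09:00:00", "stale")  # older: row is left alone
after_stale = conn.execute(query).fetchone()

upsert_event(conn, "evt-003", "2025-01-15T12:00:00", "newer")  # newer: row is updated
after_newer = conn.execute(query).fetchone()

print(after_stale, after_newer)
print(is_valid_watermark("not-a-date"))
```

Putting the comparison in the SQL statement keeps the check and the write in a single operation, so no separate read is needed before deciding whether to update.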

Explain it (teach-back)¶

How do the individual Level 6 patterns (staging, upsert, lineage, watermark, dead-letter) combine into a full pipeline?
What would break if you removed the staging step and loaded directly to target?
How does the watermark enable idempotent reruns?
How would you adapt this pipeline for a production environment with millions of records?

Mastery check¶

You can move on when you can:

- run the baseline without docs,
- explain one core function line by line,
- break and recover in one session,
- keep tests passing after your change.
