Level 6 / Project 02 - Staging Table Loader¶
Home: README
Learn Your Way¶
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | — |
Focus¶
- staging ingestion contract simulation
Why this project exists¶
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)¶
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-6/02-staging-table-loader
python project.py --input data/sample_input.txt --output data/output_summary.json
pytest -q
Expected terminal output¶
{
"input_rows": 8,
"accepted": 5,
"rejected": 3,
"errors": ["row=6 missing_field=timestamp", ...],
"total_in_staging": 5
}
Expected artifacts¶
data/output_summary.json— load results with accept/reject counts- Passing tests (
pytest -q→ 7+ passed) - Updated
notes.md
Alter it (required)¶
- Add a new validation rule: reject rows where the
messagefield exceeds 200 characters. - Add a
--strictCLI flag that aborts the entire load if any row fails validation (instead of skipping). - Add a
loaded_attimestamp column tostaging_eventspopulated by the loader. - Re-run script and tests after each change.
Break it (required)¶
- Feed a CSV with mismatched column headers (e.g. rename "level" to "severity") and observe the error.
- Insert a row with a
levelvalue of "DEBUG" (not inVALID_LEVELS) and confirm rejection. - Try loading the same file twice and observe whether duplicates accumulate.
Fix it (required)¶
- Add header validation that checks required columns exist before processing any rows.
- Add an idempotency check using a hash of each row to prevent duplicate inserts.
- Add tests for each broken scenario.
Explain it (teach-back)¶
- Why do we insert row-by-row instead of using
executemanyfor this use case? - What is the advantage of a staging table vs inserting directly into the final table?
- How does the
CHECKconstraint in the DDL differ from Python-side validation? - In production ETL, what happens to rejected rows — are they just logged?
Mastery check¶
You can move on when you can: - run baseline without docs, - explain one core function line-by-line, - break and recover in one session, - keep tests passing after your change.
Related Concepts¶
| ← Prev | Home | Next → |
|---|---|---|