Level 6 / Project 02 - Staging Table Loader¶

Learn Your Way¶

Read	Build	Watch	Test	Review	Visualize	Try
—	This project	—	—	Flashcards	—	—

Focus¶

staging ingestion contract simulation

Why this project exists¶

This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)¶

Use <repo-root> as the folder containing this repository's README.md.

cd <repo-root>/projects/level-6/02-staging-table-loader
python project.py --input data/sample_input.txt --output data/output_summary.json
pytest -q

Expected terminal output¶

{
  "input_rows": 8,
  "accepted": 5,
  "rejected": 3,
  "errors": ["row=6 missing_field=timestamp", ...],
  "total_in_staging": 5
}

Expected artifacts¶

data/output_summary.json — load results with accept/reject counts
Passing tests (pytest -q → 7+ passed)
Updated notes.md

Alter it (required)¶

Add a new validation rule: reject rows where the message field exceeds 200 characters.
Add a --strict CLI flag that aborts the entire load if any row fails validation (instead of skipping).
Add a loaded_at timestamp column to staging_events populated by the loader.
Re-run script and tests after each change.

Break it (required)¶

Feed a CSV with mismatched column headers (e.g. rename "level" to "severity") and observe the error.
Insert a row with a level value of "DEBUG" (not in VALID_LEVELS) and confirm rejection.
Try loading the same file twice and observe whether duplicates accumulate.

Fix it (required)¶

Add header validation that checks required columns exist before processing any rows.
Add an idempotency check using a hash of each row to prevent duplicate inserts.
Add tests for each broken scenario.

Explain it (teach-back)¶

Why do we insert row-by-row instead of using executemany for this use case?
What is the advantage of a staging table vs inserting directly into the final table?
How does the CHECK constraint in the DDL differ from Python-side validation?
In production ETL, what happens to rejected rows — are they just logged?

Mastery check¶

You can move on when you can: - run baseline without docs, - explain one core function line-by-line, - break and recover in one session, - keep tests passing after your change.

← Prev	Home	Next →