Level 4 / Project 13 - Reconciliation Reporter

Learn Your Way

Read	Build	Watch	Test	Review	Visualize	Try
—	This project	—	—	Flashcards	—	Browser

Estimated time: 70 minutes

Focus

source vs target comparison outputs

Why this project exists

This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)

Use <repo-root> as the folder containing this repository's README.md.

cd <repo-root>/projects/level-4/13-reconciliation-reporter
python project.py --source data/source.csv --target data/target.csv --output data/reconciliation_report.json --key id
pytest -q

Expected terminal output

{
  "source_records": 5,
  "target_records": 4,
  "matched": 1,
  "mismatches": 2,
  "only_in_source": 2,
  "only_in_target": 1
}
6 passed

Expected artifacts

data/reconciliation_report.json — full comparison report
Passing tests
Updated notes.md

Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.

Alter it (required) - Extension

Add a --tolerance flag for numeric fields (e.g., salary difference within 5% is still "matched").
Add a --format flag to output the report as CSV instead of JSON.
Re-run script and tests — add a test for numeric tolerance.
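
One way to approach the numeric tolerance is a small comparison helper that the field-by-field check can call. The sketch below is an illustration only: `within_tolerance` is a hypothetical name, not part of the baseline code, and it assumes the tolerance is a fraction (0.05 for 5%) measured relative to the source value.

```python
def within_tolerance(source_value: str, target_value: str, tolerance: float = 0.0) -> bool:
    """Return True when two CSV cell values should count as matching.

    Hypothetical helper, not taken from the project's baseline code.
    Assumption: tolerance is a fraction of the source value (0.05 means 5%).
    """
    # Identical strings always match, numeric or not.
    if source_value == target_value:
        return True
    try:
        source_number = float(source_value)
        target_number = float(target_value)
    except ValueError:
        # At least one value is not numeric, and the exact comparison already failed.
        return False
    # Allowed absolute difference scales with the size of the source value.
    return abs(source_number - target_number) <= tolerance * abs(source_number)


print(within_tolerance("50000", "52000", tolerance=0.05))  # True: 2000 is within 5% of 50000
print(within_tolerance("50000", "53000", tolerance=0.05))  # False: 3000 exceeds 5% of 50000
print(within_tolerance("alice", "Alice", tolerance=0.05))  # False: not numeric
```

A `--tolerance` flag could then be declared with `argparse` as a `float` with a default of `0.0`, so existing behavior stays unchanged when the flag is omitted.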

Break it (required) - Core

Use a key field that has duplicate values in one file — observe the "last row wins" behavior.
Feed two CSVs with completely different headers and see what happens.
Use an empty CSV (headers only) as one of the inputs.
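
The "last row wins" behavior is what you would expect if rows are indexed in a dictionary keyed on the `--key` column. The sketch below reproduces it in isolation. It assumes that kind of dictionary indexing, which may differ from what `project.py` actually does; `load_by_key` and the sample data are made up for illustration.

```python
import csv
import io

# Invented sample data standing in for a CSV file with a duplicated key (id=2).
SAMPLE = "id,name\n1,Ada\n2,Grace\n2,Linus\n"
# Invented sample for the "headers only" case.
HEADERS_ONLY = "id,name\n"


def load_by_key(text: str, key: str) -> dict[str, dict[str, str]]:
    """Index rows by key; a later row with the same key replaces the earlier one."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


records = load_by_key(SAMPLE, "id")
print(len(records))                          # 2: three rows went in, two survive
print(records["2"]["name"])                  # Linus: the later row replaced Grace
print(len(load_by_key(HEADERS_ONLY, "id")))  # 0: no data rows, so nothing to index
```

Nothing here raises an error, which is the point of the exercise: the overwritten row disappears silently unless the code checks for it.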

Fix it (required) - Core

Handle duplicate keys by reporting them as a warning instead of silently overwriting.
Report header differences as part of the reconciliation.
Re-run until all tests pass.
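
Both fixes can start from two small checks that run before rows are indexed. This is a minimal sketch under stated assumptions: `find_duplicate_keys` and `header_differences` are hypothetical names, and how their results are added to the report (a warnings list, a separate section) is left to you.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(rows: list[dict[str, str]], key: str) -> list[str]:
    """Return the key values that appear on more than one row, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


def header_differences(source_fields: list[str], target_fields: list[str]) -> dict[str, list[str]]:
    """Return the column names that exist on only one side."""
    source_set, target_set = set(source_fields), set(target_fields)
    return {
        "only_in_source": sorted(source_set - target_set),
        "only_in_target": sorted(target_set - source_set),
    }


# Invented sample data: id=2 is duplicated, and the two files disagree on one column.
source_reader = csv.DictReader(io.StringIO("id,name,salary\n1,Ada,100\n2,Grace,200\n2,Linus,300\n"))
target_reader = csv.DictReader(io.StringIO("id,name,dept\n1,Ada,eng\n"))
source_rows = list(source_reader)

print(find_duplicate_keys(source_rows, "id"))
# ['2']
print(header_differences(source_reader.fieldnames, target_reader.fieldnames))
# {'only_in_source': ['salary'], 'only_in_target': ['dept']}
```

Reading all rows into a list first keeps the duplicate check separate from the indexing step, so the warning can be reported before any row is overwritten.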

Checkpoint: All modifications done, tests still pass. Good time to review your changes.

Explain it (teach-back)

Why does reconcile use set operations (union, intersection, difference)?
What is the purpose of compare_fields — when would you NOT compare all fields?
Why does the report separate "only_in_source" from "mismatches"?
How would this scale to files with millions of rows?
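
When preparing your answers, it can help to see the set operations in isolation. The sketch below is not the project's `reconcile`; `reconcile_sketch` is a hypothetical stand-in that assumes each file has already been loaded into a dictionary keyed on the `--key` column, and it treats `compare_fields` as an optional list of column names. The sample rows are invented and are not the contents of `data/source.csv` or `data/target.csv`.

```python
from typing import Optional

Rows = dict[str, dict[str, str]]


def reconcile_sketch(source: Rows, target: Rows, compare_fields: Optional[list[str]] = None) -> dict[str, int]:
    """Summarize differences between two keyed row collections using set operations."""
    source_keys, target_keys = set(source), set(target)
    common = source_keys & target_keys  # intersection: keys present in both files
    mismatched = []
    for key in sorted(common):
        # Assumption: with no explicit list, compare every column seen on either side.
        fields = compare_fields if compare_fields is not None else sorted(set(source[key]) | set(target[key]))
        if any(source[key].get(field) != target[key].get(field) for field in fields):
            mismatched.append(key)
    return {
        "source_records": len(source),
        "target_records": len(target),
        "matched": len(common) - len(mismatched),
        "mismatches": len(mismatched),
        "only_in_source": len(source_keys - target_keys),  # difference
        "only_in_target": len(target_keys - source_keys),  # difference
    }


# Invented sample rows, keyed by id.
source = {
    "1": {"name": "Ada", "salary": "100"},
    "2": {"name": "Grace", "salary": "200"},
    "3": {"name": "Linus", "salary": "300"},
    "4": {"name": "Guido", "salary": "400"},
    "5": {"name": "Ken", "salary": "500"},
}
target = {
    "1": {"name": "Ada", "salary": "100"},
    "2": {"name": "Grace H.", "salary": "200"},  # name differs
    "3": {"name": "Linus", "salary": "350"},     # salary differs
    "6": {"name": "Barbara", "salary": "600"},
}

print(reconcile_sketch(source, target))
print(reconcile_sketch(source, target, compare_fields=["name"]))
```

With all fields compared, keys 2 and 3 count as mismatches; restricting `compare_fields` to `name` leaves only key 2. Set lookups keep the key comparison roughly linear in the number of rows, but both files still have to fit in memory, which is the constraint to think about for the scaling question.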

Mastery check

You can move on when you can:

- run baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.
