Level 4 / Project 13 - Reconciliation Reporter
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 70 minutes
Focus
- source vs target comparison outputs
Why this project exists
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/13-reconciliation-reporter
python project.py --source data/source.csv --target data/target.csv --output data/reconciliation_report.json --key id
pytest -q
Expected terminal output
{
"source_records": 5,
"target_records": 4,
"matched": 1,
"mismatches": 2,
"only_in_source": 2,
"only_in_target": 1
}
6 passed
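The counts in the expected output can be sanity-checked by hand. The sketch below assumes each key appears at most once per file, so that every source row is either matched, mismatched, or missing from the target (and likewise for target rows). `counts_are_consistent` is a hypothetical helper for illustration, not part of `project.py`.

```python
# Summary counts copied from the expected terminal output.
summary = {
    "source_records": 5,
    "target_records": 4,
    "matched": 1,
    "mismatches": 2,
    "only_in_source": 2,
    "only_in_target": 1,
}


def counts_are_consistent(s):
    """Check that the categories add up to the row counts on both sides.

    Assumes unique keys per file: keys present in both files are either
    matched or mismatched, and the remainder exist on one side only.
    """
    shared = s["matched"] + s["mismatches"]
    return (
        shared + s["only_in_source"] == s["source_records"]
        and shared + s["only_in_target"] == s["target_records"]
    )


print(counts_are_consistent(summary))  # True: 3 + 2 == 5 and 3 + 1 == 4
```

If this check fails on your own data, duplicate keys in one of the files are a likely cause.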
Expected artifacts
- `data/reconciliation_report.json`: full comparison report
- Passing tests
- Updated `notes.md`
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) - Extension
- Add a `--tolerance` flag for numeric fields (e.g., salary difference within 5% is still "matched").
- Add a `--format` flag to output the report as CSV instead of JSON.
- Re-run the script and tests, and add a test for numeric tolerance.
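One possible shape for the two extensions is sketched below. Both `values_match` and `report_to_csv` are hypothetical names, and the sketch assumes tolerance is relative to the source value and that the CSV report only needs the flat summary counts. How they would be wired into the `--tolerance` and `--format` flags depends on how `project.py` parses its arguments.

```python
import csv
import io


def values_match(source_value, target_value, tolerance=0.0):
    """Return True if two field values are equal, or numerically close.

    tolerance is a fraction of the source value (0.05 means 5%).
    Values that cannot be parsed as numbers must match exactly.
    """
    if source_value == target_value:
        return True
    try:
        a, b = float(source_value), float(target_value)
    except (TypeError, ValueError):
        return False
    return abs(a - b) <= tolerance * abs(a)


def report_to_csv(summary):
    """Render a flat summary dict as two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["metric", "value"])
    for name, value in summary.items():
        writer.writerow([name, value])
    return buffer.getvalue()


print(values_match("50000", "51000", tolerance=0.05))  # True (2% apart)
print(values_match("50000", "60000", tolerance=0.05))  # False (20% apart)
print(report_to_csv({"matched": 1, "mismatches": 2}), end="")
```

Measuring the difference against the source value is a design choice. A symmetric comparison such as `math.isclose(a, b, rel_tol=...)` would be a reasonable alternative if neither file is treated as authoritative.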
Break it (required) - Core
- Use a key field that has duplicate values in one file — observe the "last row wins" behavior.
- Feed two CSVs with completely different headers and see what happens.
- Use an empty CSV (headers only) as one of the inputs.
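The "last row wins" behavior is easy to reproduce in isolation. The sketch below assumes rows are indexed into a dict by the key column, which is a common way to get this behavior. `index_by_key` and the sample data are hypothetical and may not mirror what `project.py` does internally.

```python
import csv
import io

# Hypothetical CSV text in which the key "2" appears twice.
SAMPLE = "id,name\n1,Ada\n2,Bob\n2,Robert\n"


def index_by_key(csv_text, key):
    """Naive indexing: a later row with the same key replaces the earlier one."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


indexed = index_by_key(SAMPLE, "id")
print(len(indexed))          # 2 distinct keys from 3 rows
print(indexed["2"]["name"])  # Robert: the Bob row was silently dropped

# A headers-only file produces an empty index rather than an error.
print(index_by_key("id,name\n", "id"))  # {}
```

With this approach, a file whose header lacks the key column raises `KeyError` on the first data row, which is one thing to look for when feeding in CSVs with completely different headers.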
Fix it (required) - Core
- Handle duplicate keys by reporting them as a warning instead of silently overwriting.
- Report header differences as part of the reconciliation.
- Re-run until all tests pass.
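A minimal sketch of both fixes follows, assuming rows are available as a list of dicts and headers as a list of column names (for example from `csv.DictReader` and its `fieldnames`). `find_duplicate_keys` and `header_differences` are hypothetical helpers. How the warnings appear in the final report is left open.

```python
from collections import Counter


def find_duplicate_keys(rows, key):
    """Return the key values that occur more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def header_differences(source_fields, target_fields):
    """Return the column names that exist on only one side."""
    source_set, target_set = set(source_fields), set(target_fields)
    return {
        "only_in_source": sorted(source_set - target_set),
        "only_in_target": sorted(target_set - source_set),
    }


rows = [
    {"id": "1", "name": "Ada"},
    {"id": "2", "name": "Bob"},
    {"id": "2", "name": "Robert"},
]
print(find_duplicate_keys(rows, "id"))  # ['2']
print(header_differences(["id", "name", "salary"], ["id", "name", "dept"]))
# {'only_in_source': ['salary'], 'only_in_target': ['dept']}
```

Running the duplicate check before indexing lets the report carry a warning while the rest of the reconciliation still completes.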
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- Why does `reconcile` use set operations (union, intersection, difference)?
- What is the purpose of `compare_fields`, and when would you NOT compare all fields?
- Why does the report separate "only_in_source" from "mismatches"?
- How would this scale to files with millions of rows?
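As a starting point for the teach-back, here is a minimal sketch of how set operations can sort keys into the four report categories. `classify_keys` is a hypothetical stand-in and is not the project's `reconcile`. It assumes each side has already been indexed as a dict of key to row, and it compares whole rows rather than selected fields.

```python
def classify_keys(source, target):
    """Split keys into matched, mismatched, and one-sided groups.

    source and target map a key value to that row's dict of fields.
    """
    source_keys, target_keys = set(source), set(target)
    shared = source_keys & target_keys  # intersection: present on both sides
    matched = {k for k in shared if source[k] == target[k]}
    return {
        "matched": sorted(matched),
        "mismatches": sorted(shared - matched),
        "only_in_source": sorted(source_keys - target_keys),  # difference
        "only_in_target": sorted(target_keys - source_keys),
    }


source = {
    "1": {"id": "1", "name": "Ada"},
    "2": {"id": "2", "name": "Bob"},
    "3": {"id": "3", "name": "Cy"},
}
target = {
    "1": {"id": "1", "name": "Ada"},
    "2": {"id": "2", "name": "Robert"},
    "4": {"id": "4", "name": "Dee"},
}
print(classify_keys(source, target))
```

Set membership tests take constant time on average, so the classification is roughly linear in the number of keys. With millions of rows, holding both indexes in memory is likely to become the limiting factor before the comparison itself does.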
Mastery check
You can move on when you can:
- run the baseline without docs,
- explain one core function line by line,
- break and recover in one session,
- keep tests passing after your change.