Level 4 / Project 04 - Data Contract Enforcer
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 50 minutes
Focus
- contract validation and drift detection
Why this project exists
This project gives you Level 4 practice with contract validation and drift detection in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/04-data-contract-enforcer
python project.py --contract data/contract.json --input data/sample_input.csv --output data/enforcement_report.json
pytest -q
Expected terminal output
Expected artifacts
- data/enforcement_report.json: per-row violation details
- Passing tests
- Updated notes.md
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) - Extension
- Add a "pattern" rule (regex) to the contract for email-like fields.
- Add a --strict flag that also treats extra columns as violations.
- Re-run the script and tests, and add a test for pattern enforcement.
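One way the two extensions above might look is sketched below. The contract shape (a per-column rule dictionary with an optional "pattern" key), the helper names check_pattern and extra_columns, and the email regex are all assumptions made for illustration; the real project.py may structure its rules differently.

```python
import re


def check_pattern(value: str, rule: dict) -> bool:
    """Return True when value satisfies the rule's optional "pattern" regex."""
    pattern = rule.get("pattern")
    if pattern is None:
        return True  # this column has no pattern rule
    # fullmatch requires the whole value to match, not just a prefix
    return re.fullmatch(pattern, value) is not None


def extra_columns(headers: list[str], contract_columns: dict) -> list[str]:
    """Columns present in the CSV but absent from the contract (for --strict)."""
    return [h for h in headers if h not in contract_columns]


# Hypothetical rule; deliberately loose, not a full email validator
email_rule = {"type": "str", "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"}

print(check_pattern("ada@example.com", email_rule))  # True
print(check_pattern("not-an-email", email_rule))     # False
print(extra_columns(["id", "email", "debug"], {"id": {}, "email": {}}))  # ['debug']
```

In strict mode, a non-empty result from extra_columns would be reported as a violation; in the default mode it could be ignored or logged as a warning.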
Break it (required) - Core
- Remove a required column entirely from the CSV and see what missing_columns reports.
- Feed a value that is technically the right type but fails range AND allowed-values checks simultaneously.
- Create a contract with contradictory rules (e.g., min: 100, max: 50) and observe the behavior.
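To predict what the last two experiments should show, here is a minimal sketch of a checker that collects every violation for a value rather than stopping at the first. The rule keys "min", "max", and "allowed" and the function name check_value are assumptions, not the project's confirmed API.

```python
def check_value(value, rule: dict) -> list[str]:
    """Collect every violation for one value instead of stopping at the first."""
    violations = []
    if "min" in rule and value < rule["min"]:
        violations.append(f"{value} is below min {rule['min']}")
    if "max" in rule and value > rule["max"]:
        violations.append(f"{value} is above max {rule['max']}")
    if "allowed" in rule and value not in rule["allowed"]:
        violations.append(f"{value} is not in allowed values {rule['allowed']}")
    return violations


# Right type (int), but fails the range check and the allowed-values check
rule = {"type": "int", "min": 1, "max": 5, "allowed": [1, 2, 3]}
print(check_value(9, rule))
# ['9 is above max 5', '9 is not in allowed values [1, 2, 3]']

# Contradictory rule: no value can satisfy min 100 and max 50 together
contradictory = {"type": "int", "min": 100, "max": 50}
print(check_value(75, contradictory))
# ['75 is below min 100', '75 is above max 50']
```

With a contradictory rule, every row fails and the report blames the data even though the contract is at fault. That is the behavior the Fix it step is meant to prevent.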
Fix it (required) - Core
- Add contract self-validation that catches contradictory rules before enforcement begins.
- Handle the case where a column exists in the contract but not in the CSV data headers.
- Re-run until all tests pass.
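The two fixes above could be sketched as follows. This assumes a contract shaped like {"columns": {name: rule}}, which may not match data/contract.json, and the function names validate_contract and find_missing_columns are hypothetical.

```python
def validate_contract(contract: dict) -> list[str]:
    """Return a list of problems found in the contract itself."""
    problems = []
    for name, rule in contract.get("columns", {}).items():
        lo, hi = rule.get("min"), rule.get("max")
        if lo is not None and hi is not None and lo > hi:
            problems.append(f"{name}: min {lo} is greater than max {hi}")
        allowed = rule.get("allowed")
        if allowed is not None and len(allowed) == 0:
            problems.append(f"{name}: allowed list is empty")
    return problems


def find_missing_columns(headers: list[str], contract: dict) -> list[str]:
    """Contract columns that do not appear in the CSV header row."""
    return [c for c in contract.get("columns", {}) if c not in headers]


# Hypothetical contract used only for this demonstration
contract = {
    "columns": {
        "age": {"type": "int", "min": 100, "max": 50},
        "email": {"type": "str"},
    }
}

print(validate_contract(contract))               # ['age: min 100 is greater than max 50']
print(find_missing_columns(["age"], contract))   # ['email']
```

Running validate_contract before reading any rows lets the script exit early with a clear message instead of flagging every row against an impossible rule.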
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- Why does coerce_value try to convert strings instead of checking types directly?
- What is the difference between missing_columns and a required field that is empty?
- Why are violations collected per-row instead of per-column?
- How would this pattern work with a streaming data source (no full CSV in memory)?
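As a starting point for the streaming question, the sketch below reads rows lazily with csv.DictReader and yields violations one at a time, so the full CSV never needs to be in memory. It also shows the difference between a column missing from the headers (detected once, before any rows) and a required field that is empty (detected per row). The function name enforce_stream and the shape of the yielded dictionaries are assumptions for illustration.

```python
import csv
import io
from typing import Iterable, Iterator


def enforce_stream(lines: Iterable[str], required: list[str]) -> Iterator[dict]:
    """Yield violations row by row without loading the whole file."""
    reader = csv.DictReader(lines)
    headers = reader.fieldnames or []  # reading fieldnames consumes only the header line
    missing = [c for c in required if c not in headers]
    if missing:
        # Structural problem: reported once, no point checking rows
        yield {"row": None, "missing_columns": missing}
        return
    for row_number, row in enumerate(reader, start=1):
        # A short row gives None for absent fields, so fall back to ""
        empty = [c for c in required if not (row[c] or "").strip()]
        if empty:
            yield {"row": row_number, "empty_required": empty}


# io.StringIO stands in for a file handle or network stream
sample = io.StringIO("id,email\n1,ada@example.com\n2,\n")
for violation in enforce_stream(sample, required=["id", "email"]):
    print(violation)  # {'row': 2, 'empty_required': ['email']}
```

Because the violations are produced per row, the consumer can write each one to the report as it arrives, which is one reason per-row collection fits streaming better than per-column aggregation.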
Mastery check
You can move on when you can:
- run the baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.
Related Concepts
Stuck? Ask AI
If you are stuck after trying for 20 minutes, use one of these prompts:
- "I am working on Data Contract Enforcer. I got this error: [paste error]. Can you explain what this error means without giving me the fix?"
- "I am trying to detect when a data file's structure changes unexpectedly. Can you explain what 'schema drift' means with a practical example?"
- "Can you explain how to compare two schemas and identify the differences?"