Level 7 / Project 04 - Source Field Mapper
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | — |
Focus
- explicit source-to-target mapping contracts
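One way to picture a mapping contract is a rule table that names each target field, the source field it reads from, and the cast to apply. The sketch below is only an illustration: the rule keys (`source`, `cast`, `default`), the `CASTS` table, and `map_record` are assumed names and may not match what `project.py` actually uses.

```python
# Hypothetical rule format: target field -> {"source", "cast", optional "default"}.
MAPPING_RULES = {
    "user_id": {"source": "id", "cast": "int"},
    "full_name": {"source": "name", "cast": "str"},
    "country": {"source": "ctry", "cast": "str", "default": "unknown"},
}

# Assumed cast table: cast name -> callable.
CASTS = {"int": int, "str": str, "float": float}


def map_record(record, rules):
    """Build a new target record from a source record; the source is not mutated."""
    target = {}
    for target_field, rule in rules.items():
        if rule["source"] in record:
            raw = record[rule["source"]]
        else:
            raw = rule["default"]  # raises KeyError when no default is configured
        target[target_field] = CASTS[rule["cast"]](raw)
    return target


mapped = map_record({"id": "42", "name": "Ada"}, MAPPING_RULES)
print(mapped)  # {'user_id': 42, 'full_name': 'Ada', 'country': 'unknown'}
```

Because the rules live in data rather than in scattered rename statements, the contract can be reviewed and tested separately from the code that applies it.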
Why this project exists
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-7/04-source-field-mapper
python project.py --input data/sample_input.txt --output data/output_summary.json
pytest -q
Expected terminal output
Expected artifacts
- data/output_summary.json
- Passing tests
- Updated notes.md
Alter it (required)
- Add a "datetime" cast type that parses ISO-format strings into Unix timestamps.
- Add a drop_unmapped option that removes source fields not in the mapping rules.
- Re-run the script and tests to verify that the new cast and drop behavior work correctly.
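The two alterations could look roughly like the sketch below. It assumes a hypothetical rule format (`source`, `cast`) and a `CASTS` lookup table that may differ from the real `project.py`, and it assumes that ISO strings without a timezone should be treated as UTC.

```python
from datetime import datetime, timezone


def cast_datetime(value):
    """Parse an ISO-format string into a Unix timestamp (float seconds)."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


# Assumed cast table: cast name -> callable.
CASTS = {"int": int, "str": str, "datetime": cast_datetime}


def map_record(record, rules, drop_unmapped=False):
    """Apply mapping rules; optionally drop source fields that no rule mentions."""
    mapped_sources = {rule["source"] for rule in rules.values()}
    target = {}
    if not drop_unmapped:
        # Carry unmapped source fields through unchanged.
        target.update({k: v for k, v in record.items() if k not in mapped_sources})
    for target_field, rule in rules.items():
        target[target_field] = CASTS[rule["cast"]](record[rule["source"]])
    return target


rules = {"created_at": {"source": "created", "cast": "datetime"}}
record = {"created": "2024-01-01T00:00:00", "extra": "x"}
print(map_record(record, rules, drop_unmapped=True))  # {'created_at': 1704067200.0}
```

With `drop_unmapped=False` the same call keeps `extra` in the output, which makes the difference between the two modes easy to assert in a test.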
Break it (required)
- Map a field with cast: "int" but provide a non-numeric string (e.g. "abc").
- Reference a source field that does not exist in the record and has no default.
- Observe the ValueError or KeyError in the output.
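Both failures can be reproduced in isolation before touching the project code. This is a minimal sketch with a hypothetical `apply_rule` helper and rule format; the real script may surface the same exceptions from a different place.

```python
CASTS = {"int": int}  # assumed cast table


def apply_rule(record, rule):
    """Unprotected lookup and cast: any bad input propagates as an exception."""
    if rule["source"] in record:
        raw = record[rule["source"]]
    else:
        raw = rule["default"]  # KeyError when the rule has no default
    return CASTS[rule["cast"]](raw)  # ValueError when the value is not numeric


cases = [
    ({"age": "abc"}, {"source": "age", "cast": "int"}),  # non-numeric string
    ({}, {"source": "age", "cast": "int"}),              # missing field, no default
]

errors = []
for record, rule in cases:
    try:
        apply_rule(record, rule)
    except (ValueError, KeyError) as exc:
        errors.append(type(exc).__name__)

print(errors)  # ['ValueError', 'KeyError']
```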
Fix it (required)
- Wrap cast operations in try/except and log a warning instead of crashing.
- Skip missing source fields when no default is configured, adding them to a skipped_fields report.
- Add a test for bad cast values and missing source fields.
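The fix steps above might be sketched as follows. The function name `map_record_safe`, the logger name, and the choice to leave a target field unset after a failed cast are all assumptions for illustration, not the project's confirmed behavior.

```python
import logging

logger = logging.getLogger("field_mapper")  # hypothetical logger name

CASTS = {"int": int, "str": str}  # assumed cast table


def map_record_safe(record, rules):
    """Return (target, skipped_fields) without raising on bad or missing data."""
    target = {}
    skipped_fields = []
    for target_field, rule in rules.items():
        source = rule["source"]
        if source in record:
            raw = record[source]
        elif "default" in rule:
            raw = rule["default"]
        else:
            # Missing source field and no default: report it and move on.
            skipped_fields.append(source)
            continue
        try:
            target[target_field] = CASTS[rule["cast"]](raw)
        except (ValueError, TypeError) as exc:
            # Assumption: a failed cast leaves the target field unset.
            logger.warning("cast %r failed for field %r: %s", rule["cast"], source, exc)
    return target, skipped_fields


def test_bad_cast_and_missing_field():
    rules = {
        "age": {"source": "age", "cast": "int"},
        "name": {"source": "name", "cast": "str"},
        "email": {"source": "email", "cast": "str"},
    }
    target, skipped = map_record_safe({"age": "abc", "name": "Ada"}, rules)
    assert target == {"name": "Ada"}
    assert skipped == ["email"]
```

Catching only `ValueError` and `TypeError` around the cast keeps unrelated bugs visible, while the returned `skipped_fields` list gives the summary output something concrete to report.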
Explain it (teach-back)
- Why are explicit mapping rules better than renaming fields in-place?
- What happened when a string could not be cast to int?
- How did the try/except prevent a pipeline crash on bad data?
- Where would you use field mapping in a real ETL pipeline?
Mastery check
You can move on when you can:
- run baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.
Mastery Check
- Can you explain the architectural trade-offs in your solution?
- Could you refactor this for a completely different use case?
- Can you identify at least two alternative approaches and explain why you chose yours?
- Could you debug this without print statements, using only breakpoint()?