Level 4 / Project 09 - Transformation Pipeline V1
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 60 minutes
Focus
- multi-step transform sequencing
Why this project exists
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/09-transformation-pipeline-v1
python project.py --input data/sample_input.csv --output data/pipeline_output.json --steps strip_whitespace,lowercase_keys,filter_empty_rows,coerce_numbers,add_row_id
pytest -q
Expected terminal output
{
"steps": [
{"step": "strip_whitespace", "status": "ok", ...},
...
],
"output_records": 5
}
8 passed
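The run command passes five step names through `--steps`. One way to wire such a pipeline, sketched here assuming rows are dicts like those produced by `csv.DictReader`, is a `TRANSFORMS` registry that maps each step name to a pure function, plus a loop that logs `records_before` and `records_after` for every step. The helper `run_pipeline`, the `reason` key, and the exact transform rules below are illustrative assumptions, not the contents of `project.py`.

```python
# Sketch of a TRANSFORMS registry pipeline. Assumes rows are dicts such as
# csv.DictReader yields; run_pipeline and the "reason" key are hypothetical.


def _strip(value):
    """Strip strings; leave anything else (e.g. None) unchanged."""
    return value.strip() if isinstance(value, str) else value


def strip_whitespace(rows):
    # Pure: builds new dicts instead of mutating the input rows.
    return [{_strip(k): _strip(v) for k, v in row.items()} for row in rows]


def lowercase_keys(rows):
    return [{(k.lower() if isinstance(k, str) else k): v for k, v in row.items()}
            for row in rows]


def filter_empty_rows(rows):
    # Keep a row only if at least one value is non-blank.
    return [row for row in rows
            if any(v is not None and str(v).strip() for v in row.values())]


def _to_number(value):
    """Try int, then float; return the value unchanged if neither fits."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def coerce_numbers(rows):
    return [{k: _to_number(v) for k, v in row.items()} for row in rows]


def add_row_id(rows):
    # IDs are 1-based and follow the current row order.
    return [{"row_id": i, **row} for i, row in enumerate(rows, start=1)]


TRANSFORMS = {
    "strip_whitespace": strip_whitespace,
    "lowercase_keys": lowercase_keys,
    "filter_empty_rows": filter_empty_rows,
    "coerce_numbers": coerce_numbers,
    "add_row_id": add_row_id,
}


def run_pipeline(rows, step_names):
    """Apply each named step in order and return (rows, step_log)."""
    log = []
    for name in step_names:
        transform = TRANSFORMS.get(name)
        if transform is None:
            log.append({"step": name, "status": "skipped", "reason": "unknown step"})
            continue
        before = len(rows)
        rows = transform(rows)
        log.append({"step": name, "status": "ok",
                    "records_before": before, "records_after": len(rows)})
    return rows, log


if __name__ == "__main__":
    sample = [{" Name ": " Ada ", "Age": "36"}, {" Name ": " ", "Age": ""}]
    out, log = run_pipeline(sample, ["strip_whitespace", "lowercase_keys",
                                     "filter_empty_rows", "coerce_numbers",
                                     "add_row_id", "shout"])
    print(out)  # [{'row_id': 1, 'name': 'Ada', 'age': 36}]
    for entry in log:
        print(entry)
```

Because every step has the same shape (rows in, new rows out), adding a step means writing one function and one registry entry, and an unknown name naturally becomes a `skipped` log entry instead of a crash.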
Expected artifacts
- `data/pipeline_output.json`: transformed data with the step log
- Passing tests
- Updated `notes.md`
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) - Extension
- Add a `transform_rename_columns` step that accepts a rename mapping (e.g., `Name -> full_name`).
- Add a `--dry-run` flag that shows the step log without writing output.
- Re-run the script and tests, and add a test for the rename transform.
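A possible shape for this extension, sketched under assumptions: `transform_rename_columns` takes the mapping as a second argument and is bound with `functools.partial` so it still fits a registry of one-argument steps, and `--dry-run` is an `argparse` `store_true` flag that short-circuits the write. The `--rename` flag, its `old:new` syntax, and the `parse_mapping` and `emit` helpers are hypothetical, not part of the original CLI.

```python
import argparse
import json
from functools import partial


def transform_rename_columns(rows, mapping):
    """Return new rows with keys renamed via `mapping`; other keys are kept."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def parse_mapping(text):
    """Parse 'Name:full_name,Age:age_years' into a dict (hypothetical syntax)."""
    mapping = {}
    for pair in text.split(","):
        if not pair.strip():
            continue
        old, sep, new = pair.partition(":")
        if not sep or not old.strip() or not new.strip():
            raise ValueError(f"bad rename pair: {pair!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def build_parser():
    parser = argparse.ArgumentParser(description="Transformation pipeline (sketch)")
    parser.add_argument("--input")
    parser.add_argument("--output")
    parser.add_argument("--steps", default="")
    # Hypothetical flag for this sketch; not part of the original CLI.
    parser.add_argument("--rename", default="", help="e.g. Name:full_name")
    # argparse stores this as args.dry_run (dash becomes underscore).
    parser.add_argument("--dry-run", action="store_true",
                        help="print the step log and skip writing output")
    return parser


def emit(report, output_path, dry_run):
    """Print the step log on a dry run; otherwise write the report as JSON.

    Returns True only if a file was written.
    """
    if dry_run:
        print(json.dumps(report["steps"], indent=2))
        return False
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)
    return True


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--steps", "rename_columns", "--rename", "Name:full_name", "--dry-run"])
    # partial() binds the mapping so the step takes only `rows`, like the others.
    rename_step = partial(transform_rename_columns, mapping=parse_mapping(args.rename))
    rows = rename_step([{"Name": "Ada", "age": 36}])
    print(rows)  # [{'full_name': 'Ada', 'age': 36}]
    # With --dry-run, args.output (None here) is never opened.
    emit({"steps": [{"step": "rename_columns", "status": "ok"}],
          "output_records": len(rows)}, args.output, args.dry_run)
```

Keys missing from the mapping pass through unchanged, so a rename step never drops columns by accident.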
Break it (required) - Core
- Pass an unknown step name in `--steps` and verify it is logged as skipped.
- Reorder the steps (e.g., `add_row_id` before `filter_empty_rows`) and observe the difference.
- Feed it a CSV where all rows are empty and see what `filter_empty_rows` does.
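To predict what the reorder and the all-empty CSV will do, here is a self-contained sketch with simplified stand-ins for `filter_empty_rows` and `add_row_id`. Their behavior is an assumption: this filter treats a row as empty only when every value is blank, while the project's version may, for example, ignore `row_id` when checking.

```python
import csv
import io


def filter_empty_rows(rows):
    # Assumed rule: a row is empty only if EVERY value is blank.
    return [row for row in rows
            if any(v is not None and str(v).strip() for v in row.values())]


def add_row_id(rows):
    return [{"row_id": i, **row} for i, row in enumerate(rows, start=1)]


rows = [{"name": "Ada"}, {"name": ""}, {"name": "Lin"}]

# Intended order: filter first, then number the surviving rows.
filter_first = add_row_id(filter_empty_rows(rows))
# Reordered: the blank row receives a row_id, so it is no longer all blank
# and this filter keeps it.
id_first = filter_empty_rows(add_row_id(rows))

print([r["row_id"] for r in filter_first])  # [1, 2]
print([r["row_id"] for r in id_first])      # [1, 2, 3]

# A CSV with a header and only blank data rows: DictReader yields dicts of "".
all_blank = list(csv.DictReader(io.StringIO("name,age\n,\n,\n")))
print(len(all_blank), filter_empty_rows(all_blank))  # 2 []
```

With this filter, putting `add_row_id` first silently disables empty-row removal, and an all-blank input produces zero records rather than an error.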
Fix it (required) - Core
- Add step-ordering validation: warn if `add_row_id` runs before `filter_empty_rows`.
- Handle an input CSV that has only a header row and no data rows gracefully.
- Re-run until all tests pass.
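One possible shape for both fixes, as a sketch: a small table of ordering rules checked before the pipeline runs, and a reader that takes column names from the CSV header instead of from the first data row. `ORDER_RULES`, `validate_step_order`, and `read_rows` are hypothetical names, not taken from `project.py`.

```python
import csv
import io

# Hypothetical rule table: each pair (a, b) means step a should run before step b.
ORDER_RULES = [("filter_empty_rows", "add_row_id")]


def validate_step_order(steps):
    """Return warning messages for steps that run in a risky order."""
    messages = []
    for earlier, later in ORDER_RULES:
        if earlier in steps and later in steps and steps.index(later) < steps.index(earlier):
            messages.append(
                f"'{later}' runs before '{earlier}': row IDs are assigned "
                "before empty rows are removed")
    return messages


def read_rows(text):
    """Return (columns, rows) without assuming there is a first data row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # Take column names from the header, not from rows[0], which raises
    # IndexError on a header-only file. fieldnames is None for empty input.
    columns = list(reader.fieldnames or [])
    return columns, rows


if __name__ == "__main__":
    for message in validate_step_order(["add_row_id", "filter_empty_rows"]):
        print("WARNING:", message)
    columns, rows = read_rows("name,age\n")
    if not rows:
        print(f"No data rows (columns: {columns}); writing an empty result.")
```

Warning rather than refusing to run keeps the reorder experiment possible while still making the risk visible in the output.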
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- Why are transforms written as pure functions (no side effects)?
- What is the `TRANSFORMS` registry pattern and why is it useful?
- Why does the step log track `records_before` and `records_after`?
- How would you add error handling so one failing step does not crash the whole pipeline?
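For the last question, one possible answer, sketched under assumptions: wrap each step call in `try`/`except`, log the failure, and continue with the previous rows. This leans on the first question: because transforms are pure, a step that raises has not mutated `rows`, so the data from the last successful step is still valid. `run_pipeline_safe`, `parse_signup_date`, and the `"error"` status are hypothetical names for this sketch.

```python
from datetime import date


def coerce_numbers(rows):
    def to_number(value):
        if not isinstance(value, str):
            return value
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                pass
        return value
    return [{k: to_number(v) for k, v in row.items()} for row in rows]


def parse_signup_date(rows):
    # Hypothetical step: raises ValueError if any 'signup' is not an ISO date.
    return [{**row, "signup": date.fromisoformat(row["signup"])} for row in rows]


TRANSFORMS = {"coerce_numbers": coerce_numbers, "parse_signup_date": parse_signup_date}


def run_pipeline_safe(rows, step_names):
    """Run steps in order; a step that raises is logged and its output discarded."""
    log = []
    for name in step_names:
        transform = TRANSFORMS.get(name)
        if transform is None:
            log.append({"step": name, "status": "skipped"})
            continue
        before = len(rows)
        try:
            result = transform(rows)
        except Exception as exc:  # broad on purpose: contain any step failure
            # Pure transforms never mutated `rows`, so the previous data is intact.
            log.append({"step": name, "status": "error",
                        "error": f"{type(exc).__name__}: {exc}",
                        "records_before": before, "records_after": before})
            continue
        rows = result
        log.append({"step": name, "status": "ok",
                    "records_before": before, "records_after": len(rows)})
    return rows, log


if __name__ == "__main__":
    data = [{"age": "36", "signup": "2024-01-05"}, {"age": "x", "signup": "soon"}]
    out, log = run_pipeline_safe(data, ["parse_signup_date", "coerce_numbers"])
    for entry in log:
        print(entry)
    print(out)
```

The `records_before`/`records_after` pair also pays off here: an error entry shows no change in count, while an `ok` entry whose count drops unexpectedly points straight at the step that lost rows.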
Mastery check
You can move on when you can:
- run the baseline without the docs,
- explain one core function line by line,
- break and recover in one session,
- keep tests passing after your change.