Level 5 / Project 03 - Multi File ETL Runner
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 65 minutes
Focus
- multi-file ingestion orchestration
Why this project exists
This project gives you Level 5 practice orchestrating multi-file ingestion in a realistic operations context. Goal: run the baseline, alter its behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-5/03-multi-file-etl-runner
python project.py --source-dir data/sources --output data/etl_output.json --strategy deduplicate --key id
pytest -q
Expected terminal output
Expected artifacts
- data/etl_output.json
- Passing tests
- Updated notes.md
Design First
Before writing code, sketch your approach in notes.md:
- What functions or classes do you need?
- What data structures will you use?
- What's the flow from input to output?
- What could go wrong?
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) — Extension
- How could you make the tool more observable — what progress information would help?
- What should happen when one file in the batch fails?
- Add a summary report with the statistics you think matter most.
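One possible shape for these three changes is sketched below. It is not the project's actual code: it assumes each source file is a JSON array of records, and the names `RunSummary` and `load_batch` are hypothetical. The idea is to log progress per file, record a failing file instead of aborting the whole batch, and return counts that can feed a summary report.

```python
import json
import logging
from dataclasses import dataclass, field
from pathlib import Path

logger = logging.getLogger("etl_runner")


@dataclass
class RunSummary:
    """Hypothetical container for the statistics a run might report."""
    files_seen: int = 0
    files_loaded: int = 0
    records_loaded: int = 0
    failures: dict = field(default_factory=dict)  # file name -> error message


def load_batch(source_dir):
    """Load every *.json file in source_dir, skipping files that fail to load.

    Assumption: each file holds a JSON array of record objects.
    """
    summary = RunSummary()
    records = []
    for path in sorted(Path(source_dir).glob("*.json")):
        summary.files_seen += 1
        try:
            rows = json.loads(path.read_text(encoding="utf-8"))
            if not isinstance(rows, list):
                raise ValueError("expected a JSON array of records")
        except (OSError, ValueError) as exc:
            # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here.
            # Record the failure and continue with the remaining files.
            summary.failures[path.name] = str(exc)
            logger.warning("skipped %s: %s", path.name, exc)
            continue
        records.extend(rows)
        summary.files_loaded += 1
        summary.records_loaded += len(rows)
        logger.info("loaded %s (%d records)", path.name, len(rows))
    return records, summary
```

Whether a single bad file should be skipped or should stop the run is a design decision; skipping keeps the batch moving, while failing fast avoids publishing a partial result. Record the choice you make in notes.md.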
Break it (required) — Core
- What happens when source files do not share the same structure?
- Try running with no input files at all.
- Find the first failure and capture it.
Fix it (required) — Core
- Add validation for the structural consistency issue you found.
- Handle the empty-input case with a clear message.
- Write tests for both edge cases and re-run until deterministic.
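As a starting point for the two fixes above, here is a minimal validation sketch. It assumes records are dicts and that "structural consistency" means every record has the same set of field names; the exception and function names are hypothetical and do not come from the project.

```python
class EmptyInputError(ValueError):
    """Raised when the batch contains no source files (hypothetical name)."""


class SchemaMismatchError(ValueError):
    """Raised when a record's fields differ from the reference (hypothetical name)."""


def validate_batch(batches):
    """Check that every record in every file has the same set of fields.

    `batches` maps a file name to its list of record dicts.
    Returns the shared field set, or None when no file holds any record.
    """
    if not batches:
        raise EmptyInputError("no source files found; nothing to process")

    expected = None
    reference_file = None
    for name, records in batches.items():
        for index, record in enumerate(records):
            fields = frozenset(record)
            if expected is None:
                # The first record seen defines the reference structure.
                expected, reference_file = fields, name
            elif fields != expected:
                missing = sorted(expected - fields)
                extra = sorted(fields - expected)
                raise SchemaMismatchError(
                    f"{name} record {index} differs from {reference_file}: "
                    f"missing={missing}, extra={extra}"
                )
    return expected
```

In a pytest suite, each edge case could become one test that wraps the call in `pytest.raises(EmptyInputError)` or `pytest.raises(SchemaMismatchError)`. Because the error message names the file and record index, the first failure is easy to capture.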
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)
- What is the difference between append, deduplicate, and update merge strategies?
- Why does merge_deduplicate use a set to track seen keys?
- What happens if two files have overlapping keys with merge_update?
- How does this ETL pattern apply to data warehouse loading in production?
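To compare your answers against something concrete, the three strategies can be sketched as below. This is one plausible reading, not the project's implementation: `merge_append` is an assumed name, the signatures are illustrative, and the real `merge_update` may replace whole records rather than merge fields. Check project.py before relying on any detail here.

```python
def merge_append(batches):
    """Keep every record from every file, duplicates included."""
    merged = []
    for records in batches:
        merged.extend(records)
    return merged


def merge_deduplicate(batches, key):
    """Keep the first record seen for each key value; drop later duplicates."""
    seen = set()  # set membership checks take constant time on average
    merged = []
    for records in batches:
        for record in records:
            value = record[key]
            if value in seen:
                continue
            seen.add(value)
            merged.append(record)
    return merged


def merge_update(batches, key):
    """Let later records overwrite fields of earlier records with the same key."""
    merged = {}  # key value -> combined record; dicts keep insertion order
    for records in batches:
        for record in records:
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())
```

With overlapping keys, the first two functions differ only in whether the duplicate is kept, while the third changes which values win: the last file to mention a key determines its final field values.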
Mastery check
You can move on when you can:
- run the baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.
Related Concepts
- Classes and Objects
- Errors and Debugging
- Files and Paths
- Types and Conversions
- Quiz: Classes and Objects
Stuck? Ask AI
If you are stuck after trying for 20 minutes, use one of these prompts:
- "I am working on Multi File ETL Runner. I got this error: [paste error]. Can you explain what this error means without giving me the fix?"
- "I am trying to process multiple files in a specific order. Can you explain the Extract-Transform-Load (ETL) pattern with a simple example?"
- "Can you explain how to use pathlib.Path.glob() to find files matching a pattern?"
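For reference while working through the last prompt, here is a small sketch of file discovery with `pathlib.Path.glob()`. The helper name `find_source_files` and the `*.json` default pattern are assumptions for illustration, not part of the project.

```python
from pathlib import Path


def find_source_files(source_dir, pattern="*.json"):
    """Return files in source_dir matching pattern, in a repeatable order."""
    directory = Path(source_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    # glob() makes no promise about ordering, so sort the results explicitly.
    return sorted(path for path in directory.glob(pattern) if path.is_file())
```

Sorting matters for this project because order-sensitive merge strategies give different results when files are processed in a different order. To search subdirectories as well, `Path.rglob(pattern)` is the recursive counterpart.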