Level 4 / Project 02 - Excel Input Health Check¶
Home: README
Learn Your Way¶
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | Browser |
Estimated time: 45 minutes
Focus¶
- tabular input quality scoring
Why this project exists¶
This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.
Run (copy/paste)¶
Use <repo-root> as the folder containing this repository's README.md.
cd <repo-root>/projects/level-4/02-excel-input-health-check
python project.py --input data/sample_input.csv --output data/health_report.json
pytest -q
Expected terminal output¶
{
"file": "data/sample_input.csv",
"status": "WARN",
"headers": { ... },
"completeness": { "short_rows": [6], ... }
}
6 passed
Expected artifacts¶
data/health_report.json— structured quality report- Passing tests
- Updated
notes.md
Worked Example¶
Here is a similar (but different) problem, solved step by step.
Problem: Write a function that scores the quality of a list of records based on completeness.
Step 1: Define what "quality" means. Count how many fields are empty or missing in each record.
def score_completeness(records, required_fields):
total_checks = len(records) * len(required_fields)
if total_checks == 0:
return {"score": 1.0, "missing": []}
missing = []
for i, record in enumerate(records):
for field in required_fields:
value = record.get(field, "")
if not str(value).strip():
missing.append({"row": i, "field": field})
score = 1.0 - (len(missing) / total_checks)
return {"score": round(score, 3), "missing": missing}
Step 2: Interpret the score. 1.0 means every field is filled. 0.5 means half are missing. 0.0 means everything is empty.
Step 3: Add thresholds. PASS if score >= 0.95, WARN if >= 0.8, FAIL otherwise.
The thought process: Define measurable quality criteria. Score them numerically. Set thresholds for actionable categories. This is the same pattern the health check project uses.
Design First¶
Before writing code, sketch your approach in notes.md:
- What functions or classes do you need?
- What data structures will you use?
- What's the flow from input to output?
- What could go wrong?
Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.
Alter it (required) — Extension¶
- What additional quality check would catch problems the current tool misses?
- Can you make the tool support more than one input format?
- Write a test to verify your new check works.
Break it (required) — Core¶
- Create messy input data — what quality issues does the tool miss?
- What happens with files that use unexpected encodings?
- Try extreme cases (huge headers, wildly inconsistent rows) — where does it struggle?
Fix it (required) — Core¶
- Add a safeguard so the report stays manageable even with very messy input.
- Handle the encoding issue you discovered.
- Re-run until all tests pass.
Checkpoint: All modifications done, tests still pass. Good time to review your changes.
Explain it (teach-back)¶
- Why does
detect_delimiteruse the minimum count across rows instead of the average? - What is the difference between
check_headersandcheck_row_completeness— could they be merged? - Why does the health check return early when encoding fails instead of continuing?
- How would you extend this to support Excel
.xlsxfiles?
Mastery check¶
You can move on when you can: - run baseline without docs, - explain one core function line-by-line, - break and recover in one session, - keep tests passing after your change.
Related Concepts¶
Stuck? Ask AI¶
If you are stuck after trying for 20 minutes, use one of these prompts:
- "I am working on Excel Input Health Check. I got this error: [paste error]. Can you explain what this error means without giving me the fix?"
- "I am trying to detect whether a CSV uses commas or tabs as delimiters. Can you explain
csv.Snifferwith a simple example?" - "Can you explain how to calculate a completeness score for tabular data?"
| ← Prev | Home | Next → |
|---|---|---|