Level 8 / Project 09 - Graceful Degradation Engine
Learn Your Way
| Read | Build | Watch | Test | Review | Visualize | Try |
|---|---|---|---|---|---|---|
| — | This project | — | — | Flashcards | — | — |
Focus
- Circuit breaker pattern: closed, open, half-open state machine
- Sliding window error-rate calculation with `collections.deque`
- Service tier degradation: full, reduced, minimal, offline
- Feature flags tied to degradation tiers
- Recovery testing with half-open state and request limits
Why this project exists
Production systems must degrade gracefully rather than fail completely. When a database slows down, the right response is not a 500 error page but disabling non-essential features (search, recommendations, exports) while keeping core functionality alive. This project implements a circuit-breaker-style degradation engine that monitors error rates and progressively reduces service quality. Large-scale services such as Netflix and AWS use the same pattern to maintain availability during partial outages.
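One way to picture the progressive reduction is a lookup from error rate to service tier. The sketch below is a minimal illustration, not the project's implementation: the thresholds, the feature names, and the `select_tier` and `is_enabled` helpers are all assumptions. Only the tier names (full, reduced, minimal, offline) come from the Focus list.

```python
# Hypothetical thresholds and feature names, chosen only for illustration.
TIERS = [
    # (tier name, highest error rate this tier tolerates, enabled features)
    ("full", 0.10, {"core_reads", "search", "recommendations", "exports"}),
    ("reduced", 0.30, {"core_reads", "search"}),
    ("minimal", 0.60, {"core_reads"}),
    ("offline", 1.00, set()),
]


def select_tier(error_rate: float) -> str:
    """Return the first tier whose ceiling is at or above the error rate."""
    for name, ceiling, _features in TIERS:
        if error_rate <= ceiling:
            return name
    return "offline"


def is_enabled(feature: str, error_rate: float) -> bool:
    """Feature flag check: is the feature part of the currently selected tier?"""
    tier = select_tier(error_rate)
    features = next(f for name, _ceiling, f in TIERS if name == tier)
    return feature in features


print(select_tier(0.05))             # full
print(select_tier(0.45))             # minimal
print(is_enabled("search", 0.20))    # True
print(is_enabled("exports", 0.20))   # False
```

Keeping tiers in a table like this means a new tier is a new row rather than a change to existing branches, which is one possible design and not necessarily the one `project.py` uses.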
Run (copy/paste)
cd <repo-root>/projects/level-8/09-graceful-degradation-engine
python project.py --window 20
pytest -q
Expected terminal output
{
"final_status": {"circuit_state": "closed", "service_tier": "full", ...},
"timeline": [
{"step": 0, "circuit_state": "closed", "service_tier": "full", ...},
...
]
}
7 passed
Expected artifacts
- Console JSON output with degradation timeline
- Passing tests
- Updated `notes.md`
Alter it (required)
- Add a `HALF_OPEN` state to the circuit breaker that lets a single request through to test recovery.
- Add a `recovery_time_seconds` parameter that controls how long the circuit stays open before testing.
- Add per-tier feature lists (e.g. tier DEGRADED disables search but keeps core reads).
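The first two changes can be sketched as a small state machine. This is a hedged starting point under stated assumptions, not the project's breaker: the `BreakerSketch` class, its method names, the injectable `clock`, and the consecutive-failure counting are all hypothetical. Only `HALF_OPEN`, `recovery_time_seconds`, and `failure_threshold` are names taken from this page.

```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class BreakerSketch:
    """Hypothetical breaker that counts consecutive failures (an assumption)."""

    def __init__(self, failure_threshold=3, recovery_time_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_time_seconds = recovery_time_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False
        self.state = CircuitState.CLOSED

    def allow_request(self) -> bool:
        if self.state is CircuitState.OPEN:
            if self._clock() - self._opened_at < self.recovery_time_seconds:
                return False  # still cooling down
            self.state = CircuitState.HALF_OPEN
            self._probe_in_flight = False
        if self.state is CircuitState.HALF_OPEN:
            if self._probe_in_flight:
                return False  # only one probe request at a time
            self._probe_in_flight = True
            return True
        return True  # CLOSED: everything passes

    def record_success(self) -> None:
        if self.state is CircuitState.OPEN:
            return  # late result from before the trip; ignored
        self._failures = 0
        self.state = CircuitState.CLOSED

    def record_failure(self) -> None:
        if self.state is CircuitState.OPEN:
            return
        if self.state is CircuitState.HALF_OPEN:
            self._open()  # failed probe: back to OPEN, timer restarts
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = CircuitState.OPEN
        self._opened_at = self._clock()
        self._failures = 0


# Usage with a fake clock so the recovery delay can be stepped manually.
now = [0.0]
breaker = BreakerSketch(failure_threshold=2, recovery_time_seconds=10.0,
                        clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.state.value)      # open
print(breaker.allow_request())  # False
now[0] = 10.0
print(breaker.allow_request())  # True (the single probe)
print(breaker.allow_request())  # False (second request is held back)
breaker.record_success()
print(breaker.state.value)      # closed
```

Passing the clock in as a parameter is a design choice made here for testability; a real engine might read `time.monotonic()` directly.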
Break it (required)
- Set `failure_threshold=0`: does the engine immediately open the circuit?
- Record successes rapidly after failures: does the sliding window correctly age out old entries?
- Set `window_size=0` on `SlidingWindowTracker`: what happens to error rate calculation?
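Before running these experiments against the real engine, it can help to see the underlying Python behavior in isolation. The snippet below is an illustration under assumptions: `naive_error_rate` and the `>=` threshold comparison are hypothetical and may not match how `SlidingWindowTracker` or the engine are written.

```python
from collections import deque

# A deque with maxlen=0 accepts appends but discards every item.
outcomes = deque(maxlen=0)
outcomes.append(False)
print(len(outcomes))  # 0


def naive_error_rate(window) -> float:
    """Hypothetical unguarded calculation: failures divided by window length."""
    failures = sum(1 for ok in window if not ok)
    return failures / len(window)


try:
    naive_error_rate(outcomes)
except ZeroDivisionError:
    print("empty window: division by zero")

# If the engine compares with >= (an assumption), a threshold of 0 is
# already met before any failure has been recorded.
failure_threshold = 0
failures_so_far = 0
print(failures_so_far >= failure_threshold)  # True
```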
Fix it (required)
- Validate that `failure_threshold > 0` and `window_size > 0` in `__init__`.
- Add a guard for division by zero in error rate calculation when the window is empty.
- Add a test for the HALF_OPEN to CLOSED recovery transition.
Explain it (teach-back)
- What is the circuit breaker pattern and how does it differ from simple retry logic?
- How does the sliding window tracker calculate error rate and why is it time-based?
- What are service tiers (FULL, DEGRADED, MINIMAL) and how do real systems use them?
- Why is graceful degradation preferable to a complete outage?
Mastery check
You can move on when you can:
- draw the state machine for CLOSED to OPEN to HALF_OPEN to CLOSED,
- explain why sliding windows are better than cumulative counters for error rates,
- describe a real-world degradation scenario (e.g. Netflix disabling recommendations),
- add a new tier with specific feature restrictions without modifying existing tiers.
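For the sliding-window point, a small comparison can make the argument concrete. The traffic pattern and the window length of 20 below are made up for illustration, and the window here is count-based, which may differ from the project's tracker.

```python
from collections import deque

# Hypothetical traffic: 50 failures during an outage, then 50 successes
# once the dependency has recovered.
outcomes = [False] * 50 + [True] * 50

total = 0
failures = 0
recent = deque(maxlen=20)  # keeps only the 20 most recent outcomes

for ok in outcomes:
    total += 1
    failures += 0 if ok else 1
    recent.append(ok)

cumulative_rate = failures / total
window_rate = sum(1 for ok in recent if not ok) / len(recent)

print(f"cumulative: {cumulative_rate:.2f}")  # cumulative: 0.50
print(f"window:     {window_rate:.2f}")      # window:     0.00
```

The cumulative counter still reports a 50% error rate long after recovery, while the window reflects only recent behavior, so a breaker driven by the window can close again.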