Level 5 / Project 11 - Retry Backoff Runner

Learn Your Way

Read	Build	Watch	Test	Review	Visualize	Try
—	This project	—	—	Flashcards	—	Browser

Estimated time: 80 minutes

Focus

Practice implementing a retry runner that uses an exponential backoff strategy.

Why this project exists

This project gives you level-appropriate practice in a realistic operations context. Goal: run the baseline, alter behavior, break one assumption, recover safely, and explain the fix.

Run (copy/paste)

Use <repo-root> as the folder containing this repository's README.md.

cd <repo-root>/projects/level-5/11-retry-backoff-runner
python project.py --max-retries 5 --base-delay 0.1 --output data/retry_report.json
pytest -q

Expected terminal output

Success after 3 retries (total delay: 0.7s)
5 passed
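
The 0.7s total above is consistent with a delay that doubles on every retry. A minimal sketch of that arithmetic, assuming the delay before retry n is base_delay * 2**n (counting from 0); the formula and the helper name `backoff_delays` are illustrative assumptions, not taken from project.py:

```python
def backoff_delays(base_delay: float, retries: int, factor: float = 2.0) -> list[float]:
    """Return the wait before each retry, assuming delay = base_delay * factor**n."""
    return [base_delay * factor**n for n in range(retries)]


# Three retries with --base-delay 0.1 under the assumed doubling schedule.
delays = backoff_delays(base_delay=0.1, retries=3)
print(delays)                  # [0.1, 0.2, 0.4]
print(round(sum(delays), 1))   # 0.7
```

If your baseline reports a different total for the same flags, the project probably uses a different growth factor or counts attempts differently, which is worth noting in notes.md.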

Expected artifacts

data/retry_report.json
Passing tests
Updated notes.md

Checkpoint: Baseline code runs and all tests pass. Commit your work before continuing.

Alter it (required) - Extension

Add jitter to the backoff delay (random variation) to prevent thundering-herd behavior.
Make sure the --max-retries flag overrides the default retry count from the command line (the baseline command above already passes it; add the flag if your copy does not accept it).
Log each retry attempt with the delay duration and the error that triggered it.
Re-run the script and the tests.
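
One way to approach the jitter and logging steps is sketched below. It uses the "full jitter" variant, where each delay is drawn uniformly between 0 and the exponential cap. The function names `jittered_delay` and `log_retry`, the logger name, and the choice of full jitter are all assumptions for illustration; project.py may structure this differently.

```python
from __future__ import annotations

import logging
import random

logger = logging.getLogger("retry")  # hypothetical logger name


def jittered_delay(base_delay: float, attempt: int,
                   rng: random.Random | None = None) -> float:
    """Full jitter: pick a delay uniformly in [0, base_delay * 2**attempt]."""
    rng = rng or random.Random()
    cap = base_delay * 2**attempt
    return rng.uniform(0.0, cap)


def log_retry(attempt: int, delay: float, error: Exception) -> None:
    """Record which retry is about to run, how long we wait, and why."""
    logger.warning(
        "retry %d in %.3fs after %s: %s",
        attempt, delay, type(error).__name__, error,
    )
```

Passing a seeded `random.Random` instance (for example `random.Random(42)`) keeps the delays reproducible, which helps when the tests need deterministic output.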

Break it (required) - Core

Set --max-retries 0 so no retries are allowed and the flaky function always fails.
Set --base-delay to a negative number.
Capture the first failing test or visible bad output.
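
For the negative delay, one failure you may see is a ValueError from the standard library. The sketch below assumes the runner passes the computed delay straight to time.sleep, which is a guess about project.py rather than something the project description confirms:

```python
import time

base_delay = -0.1            # what --base-delay -0.1 would supply
delay = base_delay * 2**0    # first retry under an assumed doubling schedule

try:
    time.sleep(delay)        # time.sleep rejects negative durations
except ValueError as exc:
    message = str(exc)
    print(f"ValueError: {message}")
```

If your run fails somewhere else, or does not fail at all, capture that output instead; the point is to record what actually breaks first.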

Fix it (required) - Core

Validate that max_retries >= 1 and base_delay > 0.
Return a clear error report when all retries are exhausted.
Add tests for zero retries and negative delay.
Re-run until output and tests are deterministic.
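
The fix steps above could be sketched as follows. Everything here is illustrative: `validate_retry_config`, `run_with_retries`, the report keys, and the injectable `sleep` parameter are hypothetical names, not the project's actual API.

```python
import time
from typing import Any, Callable


def validate_retry_config(max_retries: int, base_delay: float) -> None:
    """Reject settings the runner cannot honor."""
    if max_retries < 1:
        raise ValueError(f"max_retries must be >= 1, got {max_retries}")
    if base_delay <= 0:
        raise ValueError(f"base_delay must be > 0, got {base_delay}")


def run_with_retries(func: Callable[[], Any], max_retries: int, base_delay: float,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call func, retrying with doubling delays; always return a report dict."""
    validate_retry_config(max_retries, base_delay)
    errors: list[str] = []
    total_delay = 0.0
    for attempt in range(max_retries + 1):  # first call plus max_retries retries
        try:
            result = func()
        except Exception as exc:
            errors.append(f"{type(exc).__name__}: {exc}")
            if attempt == max_retries:
                break  # retries exhausted, fall through to the failure report
            delay = base_delay * 2**attempt
            sleep(delay)
            total_delay += delay
        else:
            return {"status": "success", "retries": attempt,
                    "total_delay": total_delay, "result": result}
    return {"status": "failed", "retries": max_retries,
            "total_delay": total_delay, "errors": errors}
```

Injecting `sleep` lets tests pass a no-op such as `lambda seconds: None`, so the zero-retry and negative-delay tests run instantly and produce the same result on every run.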

Checkpoint: All modifications done, tests still pass. Good time to review your changes.

Explain it (teach-back)

How does exponential backoff calculate the delay for each retry attempt?
Why does create_flaky_function use a counter to simulate intermittent failures?
What is jitter and why does it prevent thundering-herd problems?
Where do you see retry with backoff in production (AWS SDK, HTTP clients, message queues)?
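
When preparing the counter question, it can help to compare the project's code with a stripped-down version. The sketch below is a hypothetical reimplementation: only the name create_flaky_function comes from this project, while the `fail_count` parameter, the exception type, and the return value are assumptions.

```python
from typing import Callable


def create_flaky_function(fail_count: int) -> Callable[[], str]:
    """Return a function that raises on its first fail_count calls, then succeeds."""
    calls = 0  # counter captured by the closure below

    def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls <= fail_count:
            raise ConnectionError(f"simulated failure on call {calls}")
        return f"ok after {calls} calls"

    return flaky
```

Because the counter decides exactly which calls fail, the same sequence of failures repeats on every run, whereas failures driven by random numbers would make the tests flaky themselves.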

Mastery check

You can move on when you can:

- run the baseline without docs,
- explain one core function line-by-line,
- break and recover in one session,
- keep tests passing after your change.
