# 06 - SQL Integration (SQL & ETL Pipelines for Reporting and Dashboards)
## Who this is for
- Learners moving from file automation to data pipelines.
- Teams with SQL databases as a reporting backbone.
## What you will build
- A Python ETL pipeline that ingests validated records into a SQL database.
- Staging and reporting tables with idempotent loads.
- A daily summary query output for dashboard use.
## Prerequisites
- Capstone A outputs from 05_AUTOMATION_FILES_EXCEL.md.
- Basic SQL query familiarity.
- DB credentials with insert/select access in target schema.
## Step-by-step lab pack
### Lab 1 - SQL fundamentals in strict order
- `SELECT` and `WHERE`.
- `GROUP BY` aggregates.
- `JOIN` across lookup tables.
- `INSERT` and `UPDATE` basics.
- Transactions with commit/rollback.
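The fundamentals above can be exercised end to end against a throwaway SQLite database. A minimal sketch; the `alerts` and `sites` tables and their columns are illustrative, not the lab's required schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, severity TEXT, site TEXT)")
conn.execute("CREATE TABLE sites (site TEXT PRIMARY KEY, region TEXT)")

# INSERT inside an explicit transaction: the context manager commits on
# success and rolls back if any statement raises.
with conn:
    conn.executemany(
        "INSERT INTO alerts (severity, site) VALUES (?, ?)",
        [("high", "A"), ("low", "A"), ("high", "B")],
    )
    conn.execute("INSERT INTO sites VALUES ('A', 'east'), ('B', 'west')")

# SELECT with WHERE.
high = conn.execute("SELECT COUNT(*) FROM alerts WHERE severity = 'high'").fetchone()[0]

# GROUP BY aggregate.
per_site = conn.execute(
    "SELECT site, COUNT(*) FROM alerts GROUP BY site ORDER BY site"
).fetchall()

# JOIN across a lookup table.
joined = conn.execute(
    "SELECT a.severity, s.region FROM alerts a JOIN sites s ON a.site = s.site"
).fetchall()

# UPDATE basics.
conn.execute("UPDATE alerts SET severity = 'medium' WHERE severity = 'low'")

print(high, per_site)  # 2 [('A', 2), ('B', 1)]
```

Working in `:memory:` keeps the drill repeatable: every run starts from an empty database.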
### Lab 2 - Table design
Create tables:
- staging_alerts
- alerts_reporting
Minimum metadata fields:
- source_file
- ingested_at_utc
- idempotency_key
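One possible starting DDL for the two tables, assuming SQLite; the `payload` column stands in for your real alert columns, and types would need adjusting for another engine:

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS staging_alerts (
    id              INTEGER PRIMARY KEY,
    source_file     TEXT NOT NULL,
    ingested_at_utc TEXT NOT NULL,        -- ISO-8601 UTC timestamp
    idempotency_key TEXT NOT NULL,
    payload         TEXT                  -- raw validated row (placeholder)
);
CREATE TABLE IF NOT EXISTS alerts_reporting (
    id              INTEGER PRIMARY KEY,
    source_file     TEXT NOT NULL,
    ingested_at_utc TEXT NOT NULL,
    idempotency_key TEXT NOT NULL UNIQUE, -- blocks duplicate promotion
    payload         TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['alerts_reporting', 'staging_alerts']
```

Note the asymmetry: staging accepts duplicate `idempotency_key` values (reruns land there freely), while the reporting table enforces uniqueness.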
### Lab 3 - Python connection strategy
Preferred start: sqlite3 (built-in, no driver needed).
Optional scaling path: SQLAlchemy (supports SQLite, PostgreSQL, and more).
Connection requirements:
- no hardcoded secrets,
- explicit timeout,
- retry for transient failures,
- structured error logging.
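One way to meet those four requirements with `sqlite3`; the `DB_PATH` environment variable, retry count, and backoff delay are illustrative choices, not fixed requirements:

```python
import logging
import os
import sqlite3
import time

logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s %(message)s")
log = logging.getLogger("etl.db")

def connect(retries: int = 3, delay: float = 0.1) -> sqlite3.Connection:
    """Open a connection with an explicit timeout and retry on transient errors."""
    db_path = os.environ.get("DB_PATH", ":memory:")  # no hardcoded secrets/paths
    for attempt in range(1, retries + 1):
        try:
            return sqlite3.connect(db_path, timeout=5.0)  # explicit lock timeout
        except sqlite3.OperationalError as exc:  # transient: locked/busy database
            log.warning("connect attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                raise
            time.sleep(delay * attempt)  # simple linear backoff

conn = connect()
```

If you later switch to SQLAlchemy, the same shape holds: the engine URL comes from configuration, and timeouts/retries stay explicit rather than implicit.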
### Lab 4 - Load strategy
- Load raw validated rows into `staging_alerts`.
- Promote clean rows into `alerts_reporting`.
- Use `idempotency_key` to prevent duplicates.
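A sketch of that staging-to-reporting flow, assuming a `UNIQUE` constraint on `alerts_reporting.idempotency_key`; the `severity` column and the promotion filter are placeholders for your real validation rules:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_alerts (idempotency_key TEXT, severity TEXT);
CREATE TABLE alerts_reporting (idempotency_key TEXT UNIQUE, severity TEXT);
""")

def load_staging(rows):
    """Raw validated rows land in staging as-is, duplicates included."""
    with conn:
        conn.executemany("INSERT INTO staging_alerts VALUES (?, ?)", rows)

def promote():
    """INSERT OR IGNORE skips rows whose idempotency_key already exists,
    so reruns cannot create duplicates in the reporting table."""
    with conn:
        conn.execute("""
            INSERT OR IGNORE INTO alerts_reporting
            SELECT idempotency_key, severity FROM staging_alerts
            WHERE severity IS NOT NULL  -- placeholder promotion filter
        """)

rows = [("k1", "high"), ("k2", "low")]
load_staging(rows)
promote()
load_staging(rows)  # simulate a rerun: staging now holds duplicates...
promote()           # ...but reporting stays deduplicated
count = conn.execute("SELECT COUNT(*) FROM alerts_reporting").fetchone()[0]
print(count)  # 2
```

This is the property the break/fix drills below will attack: rerunning the whole job must leave the reporting row count unchanged.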
### Lab 5 - Daily summary output
Generate query output grouped by:
- date,
- severity,
- customer/site.
Export summary to output/daily_summary.csv for dashboard consumption.
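A possible shape for that export; the input table and its `day`/`severity`/`site` columns mirror the grouping dimensions above but are illustrative:

```python
import csv
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts_reporting (day TEXT, severity TEXT, site TEXT)")
conn.executemany("INSERT INTO alerts_reporting VALUES (?, ?, ?)", [
    ("2024-01-01", "high", "A"),
    ("2024-01-01", "high", "A"),
    ("2024-01-01", "low", "B"),
])

# Group by the three dashboard dimensions and count alerts per bucket.
query = """
    SELECT day, severity, site, COUNT(*) AS alert_count
    FROM alerts_reporting
    GROUP BY day, severity, site
    ORDER BY day, severity, site
"""

os.makedirs("output", exist_ok=True)
out_path = os.path.join("output", "daily_summary.csv")
with open(out_path, "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["day", "severity", "site", "alert_count"])
    writer.writerows(conn.execute(query))
```

Writing a plain CSV with a stable column order keeps the artifact consumable by any dashboard tool without further transformation.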
### Lab 6 - Custom reporting backend integration
For an existing reporting database:
- identify existing table contracts,
- map your ETL output to the existing schema,
- avoid direct writes to unmanaged tables until schema ownership is clear.
### Lab 7 - Optional PostgreSQL extension
If you scale beyond SQLite:
- ingest to staging only at first,
- normalize data types to PostgreSQL-compatible forms,
- preserve source-system metadata.
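The type-normalization step can start as a small explicit mapping. A sketch; both the SQLite-side types and the chosen PostgreSQL targets are assumptions to adapt to your own schema:

```python
# Illustrative mapping from SQLite storage classes to PostgreSQL types.
SQLITE_TO_POSTGRES = {
    "INTEGER": "BIGINT",
    "REAL": "DOUBLE PRECISION",
    "TEXT": "TEXT",
    "BLOB": "BYTEA",
}

def normalize_column(name: str, sqlite_type: str) -> str:
    """Render one column of a PostgreSQL staging DDL from a SQLite type."""
    pg_type = SQLITE_TO_POSTGRES.get(sqlite_type.upper(), "TEXT")  # default to TEXT
    return f"{name} {pg_type}"

col = normalize_column("ingested_at_utc", "TEXT")
```

An explicit map fails loudly on unexpected types during review, which is preferable to silent coercion once the pipeline targets a shared PostgreSQL instance.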
## Expected output
- Repeatable ETL job with clean staging-to-reporting flow.
- No duplicate records on reruns.
- Usable CSV summary artifacts for dashboards.
## Break/fix drills
- Force duplicate ingest and prove idempotency key blocks duplicates.
- Simulate DB timeout and confirm retries/logging.
- Insert malformed rows in staging and verify promotion filter blocks them.
## Troubleshooting
- Connection failures:
  - confirm driver installation,
  - validate host, database name, and user,
  - test the least-privilege account manually.
- Duplicate rows:
  - inspect key generation and the unique constraint.
- Poor query performance:
  - add indexes on date, severity, and idempotency key.
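The suggested indexes, expressed in SQLite DDL; the table shape and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts_reporting (day TEXT, severity TEXT, idempotency_key TEXT)"
)
conn.executescript("""
CREATE INDEX IF NOT EXISTS ix_reporting_day        ON alerts_reporting (day);
CREATE INDEX IF NOT EXISTS ix_reporting_severity   ON alerts_reporting (severity);
CREATE UNIQUE INDEX IF NOT EXISTS ux_reporting_key ON alerts_reporting (idempotency_key);
""")
indexes = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index'"))
```

The unique index doubles as the idempotency guard; the date and severity indexes serve the daily summary's grouping columns.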
## Mastery check
You are ready for API integration when you can:
- explain your table contract,
- rerun ETL safely,
- recover from transient DB failures,
- produce daily summaries without manual edits.
## Learning-style options (Play/Build/Dissect/Teach-back)
- Play: test different idempotency key designs.
- Build: implement full staging -> reporting pipeline.
- Dissect: explain query plans and table role boundaries.
- Teach-back: present ETL data flow and failure strategy.
## Primary Sources
## Optional Resources
### Sample database schemas
- Full schema pack for staging/reporting/cache/marts:
  - 13_SAMPLE_DATABASE_SCHEMAS.md