
06 - SQL Integration (SQL & ETL Pipelines for Reporting and Dashboards)


Who this is for

  • Learners moving from file automation to data pipelines.
  • Teams with SQL databases as a reporting backbone.

What you will build

  • A Python ETL pipeline that ingests validated records into a SQL database.
  • Staging and reporting tables with idempotent loads.
  • A daily summary query output for dashboard use.

Prerequisites

  • Capstone A outputs from 05_AUTOMATION_FILES_EXCEL.md.
  • Basic SQL query familiarity.
  • DB credentials with insert/select access in the target schema.

Step-by-step lab pack

Lab 1 - SQL fundamentals in strict order

  1. SELECT and WHERE.
  2. GROUP BY aggregates.
  3. JOIN across lookup tables.
  4. INSERT and UPDATE basics.
  5. Transactions with commit/rollback (see the sketch below).
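
A minimal sqlite3 sketch that walks the same five steps in order. The alerts and sites tables and their columns are throwaway practice fixtures, not part of the lab schema:

```python
import sqlite3

# In-memory practice database; swap in a file path to persist results.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, severity TEXT, site TEXT)")
conn.execute("CREATE TABLE sites (site TEXT PRIMARY KEY, region TEXT)")

# Steps 4 and 5: INSERT/UPDATE inside a transaction.
with conn:  # commits on success, rolls back if the block raises
    conn.executemany(
        "INSERT INTO alerts (severity, site) VALUES (?, ?)",
        [("high", "plant-a"), ("low", "plant-a"), ("high", "plant-b")],
    )
    conn.execute("INSERT INTO sites VALUES ('plant-a', 'east'), ('plant-b', 'west')")
    conn.execute("UPDATE alerts SET severity = 'medium' WHERE id = 2")

# Step 1: SELECT and WHERE.
high = conn.execute("SELECT id, site FROM alerts WHERE severity = 'high'").fetchall()

# Step 2: GROUP BY aggregates.
counts = conn.execute("SELECT severity, COUNT(*) FROM alerts GROUP BY severity").fetchall()

# Step 3: JOIN across a lookup table.
joined = conn.execute(
    "SELECT a.id, s.region FROM alerts a JOIN sites s ON a.site = s.site"
).fetchall()

print(high, counts, joined)
```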

Lab 2 - Table design

Create tables:

  • staging_alerts
  • alerts_reporting

Minimum metadata fields (see the DDL sketch below):

  • source_file
  • ingested_at_utc
  • idempotency_key
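
One possible DDL sketch for both tables. Only the three metadata fields above are fixed by the lab; the payload columns (severity, site, alert_date) are assumptions borrowed from the Lab 5 summary dimensions:

```python
import sqlite3

conn = sqlite3.connect("etl.db")

# Staging accepts raw validated rows; reporting holds promoted, clean rows.
# The PRIMARY KEY on idempotency_key is what makes reruns safe later.
conn.executescript("""
CREATE TABLE IF NOT EXISTS staging_alerts (
    idempotency_key TEXT PRIMARY KEY,
    severity        TEXT,
    site            TEXT,
    alert_date      TEXT,
    source_file     TEXT NOT NULL,
    ingested_at_utc TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS alerts_reporting (
    idempotency_key TEXT PRIMARY KEY,
    severity        TEXT NOT NULL,
    site            TEXT NOT NULL,
    alert_date      TEXT NOT NULL,
    source_file     TEXT NOT NULL,
    ingested_at_utc TEXT NOT NULL
);
""")
conn.close()
```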

Lab 3 - Python connection strategy

Preferred start: sqlite3 (built-in, no driver needed). Optional scaling path: SQLAlchemy (supports SQLite, PostgreSQL, and more).

Connection requirements (sketched below):

  • no hardcoded secrets
  • explicit timeout
  • retry for transient failures
  • structured error logging
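
A sketch that meets all four requirements with sqlite3. ETL_DB_PATH is an assumed environment variable name; with a server database, the same pattern would carry credentials through the environment as well:

```python
import logging
import os
import sqlite3
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

# No hardcoded secrets: the path (or DSN) comes from the environment.
DB_PATH = os.environ.get("ETL_DB_PATH", "etl.db")

def connect(retries: int = 3, backoff_s: float = 1.0) -> sqlite3.Connection:
    """Open a connection with an explicit timeout and simple retry."""
    for attempt in range(1, retries + 1):
        try:
            return sqlite3.connect(DB_PATH, timeout=10.0)  # waits on a locked DB
        except sqlite3.OperationalError as exc:  # transient: locked/busy
            log.warning("connect attempt %d/%d failed: %s", attempt, retries, exc)
            if attempt == retries:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
```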

Lab 4 - Load strategy

  1. Load raw validated rows into staging_alerts.
  2. Promote clean rows into alerts_reporting.
  3. Use idempotency_key to prevent duplicates (see the sketch after this list).
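
A sketch of the three steps against the Lab 2 schema. INSERT OR IGNORE is the SQLite spelling; PostgreSQL would use ON CONFLICT (idempotency_key) DO NOTHING:

```python
def load_staging(conn, rows):
    """Step 1: raw validated rows into staging; reruns are no-ops."""
    with conn:
        conn.executemany(
            """INSERT OR IGNORE INTO staging_alerts
               (idempotency_key, severity, site, alert_date,
                source_file, ingested_at_utc)
               VALUES (?, ?, ?, ?, ?, ?)""",
            rows,
        )

def promote(conn):
    """Steps 2-3: promote clean rows; the PRIMARY KEY blocks duplicates."""
    # The NOT NULL conditions stand in for whatever promotion filter
    # your validation rules actually define.
    with conn:
        conn.execute(
            """INSERT OR IGNORE INTO alerts_reporting
               SELECT * FROM staging_alerts
               WHERE severity IS NOT NULL
                 AND site IS NOT NULL
                 AND alert_date IS NOT NULL"""
        )
```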

Lab 5 - Daily summary output

Generate the summary query grouped by:

  • date
  • severity
  • customer/site

Export summary to output/daily_summary.csv for dashboard consumption.
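
A minimal export sketch, assuming the Lab 2 schema and an alert_date column for the date dimension:

```python
import csv
import os
import sqlite3

conn = sqlite3.connect("etl.db")
summary = conn.execute(
    """SELECT alert_date, severity, site, COUNT(*) AS alert_count
       FROM alerts_reporting
       GROUP BY alert_date, severity, site
       ORDER BY alert_date, severity, site"""
).fetchall()

os.makedirs("output", exist_ok=True)
with open("output/daily_summary.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["alert_date", "severity", "site", "alert_count"])
    writer.writerows(summary)
```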

Lab 6 - Custom reporting backend integration

For an existing reporting database:

  • identify existing table contracts
  • map your ETL output to the existing schema
  • avoid direct writes to unmanaged tables until schema ownership is clear

Lab 7 - Optional PostgreSQL extension

If you scale beyond SQLite:

  • ingest to staging only at first
  • normalize data types to PostgreSQL-compatible forms
  • preserve source system metadata (see the sketch below)
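
A minimal SQLAlchemy sketch of a staging-only ingest. DATABASE_URL is an assumed environment variable (e.g. a postgresql+psycopg2:// URL), and the row values are placeholders:

```python
import os
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"], pool_pre_ping=True)

with engine.begin() as conn:  # transactional: commits or rolls back
    conn.execute(
        text(
            """INSERT INTO staging_alerts
               (idempotency_key, severity, site, alert_date,
                source_file, ingested_at_utc)
               VALUES (:key, :severity, :site, :alert_date, :src, :ts)
               ON CONFLICT (idempotency_key) DO NOTHING"""
        ),
        {"key": "a1", "severity": "high", "site": "plant-a",
         "alert_date": "2024-01-01", "src": "alerts.xlsx",
         "ts": "2024-01-01T00:00:00Z"},
    )
```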

Expected output

  • Repeatable ETL job with clean staging-to-reporting flow.
  • No duplicate records on reruns.
  • Usable CSV summary artifacts for dashboards.

Break/fix drills

  1. Force a duplicate ingest and prove the idempotency key blocks duplicates (see the test sketch after this list).
  2. Simulate DB timeout and confirm retries/logging.
  3. Insert malformed rows in staging and verify promotion filter blocks them.
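
Drill 1 as a test sketch, reusing the hypothetical load_staging and promote helpers from the Lab 4 sketch:

```python
def test_rerun_is_idempotent(conn, rows):
    load_staging(conn, rows)
    promote(conn)
    first = conn.execute("SELECT COUNT(*) FROM alerts_reporting").fetchone()[0]

    load_staging(conn, rows)  # forced duplicate ingest of the same rows
    promote(conn)
    second = conn.execute("SELECT COUNT(*) FROM alerts_reporting").fetchone()[0]

    assert first == second, "rerun must not add duplicate rows"
```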

Troubleshooting

  • Connection failures:
      • confirm driver installation,
      • validate host, database name, and user,
      • test the least-privilege account manually.
  • Duplicate rows:
      • inspect key generation and the unique constraint.
  • Poor query performance:
      • add indexes on date, severity, and idempotency key (see the sketch below).
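
A one-time index sketch for the summary dimensions, assuming the Lab 2 schema (idempotency_key is already indexed there via its PRIMARY KEY); index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect("etl.db")
conn.executescript("""
CREATE INDEX IF NOT EXISTS ix_reporting_date     ON alerts_reporting (alert_date);
CREATE INDEX IF NOT EXISTS ix_reporting_severity ON alerts_reporting (severity);
""")
conn.close()
```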

Mastery check

You are ready for API integration when you can:

  • explain your table contract
  • rerun ETL safely
  • recover from transient DB failures
  • produce daily summaries without manual edits

Learning-style options (Play/Build/Dissect/Teach-back)

  • Play: test different idempotency key designs.
  • Build: implement full staging -> reporting pipeline.
  • Dissect: explain query plans and table role boundaries.
  • Teach-back: present ETL data flow and failure strategy.


Next

Next: projects/level-4/README.md →