Solution: Level 8 / Project 01 - Dashboard KPI Assembler¶
STOP -- Have you attempted this project yourself first?
Learning happens in the struggle, not in reading answers. Spend at least 20 minutes trying before reading this solution. If you are stuck, try the Walkthrough first -- it guides your thinking without giving away the answer.
Complete solution¶
"""Dashboard KPI Assembler -- aggregate metrics from multiple sources into a unified dashboard."""
from __future__ import annotations
import argparse
import json
import math
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Any
# --- Domain types -------------------------------------------------------
# WHY Enum for status? -- Traffic-light status is a closed set of values.
# Using an Enum prevents typos ("grren") and enables IDE autocomplete.
# The .value attribute gives the JSON-friendly string when serializing.
class KPIStatus(Enum):
GREEN = "green"
YELLOW = "yellow"
RED = "red"
# WHY embed evaluate() on KPIDefinition? -- This is the Information Expert
# pattern: the object that holds the threshold data is the one that decides
# the status colour. Putting evaluation logic elsewhere would scatter
# knowledge about thresholds across multiple locations.
@dataclass
class KPIDefinition:
name: str
unit: str
green_threshold: float # values <= this are green
yellow_threshold: float # values <= this (but > green) are yellow; above is red
def evaluate(self, value: float) -> KPIStatus:
if value <= self.green_threshold:
return KPIStatus.GREEN
if value <= self.yellow_threshold:
return KPIStatus.YELLOW
return KPIStatus.RED
@dataclass
class MetricSample:
source: str
kpi_name: str
timestamp: str
value: float
@dataclass
class KPISummary:
name: str
unit: str
sample_count: int
mean: float
p95: float
minimum: float
maximum: float
status: KPIStatus
trend: str # "improving", "stable", "degrading"
# WHY a Dashboard dataclass with count fields? -- Pre-computing red/yellow/green
# counts at assembly time avoids re-iterating the KPI list every time a consumer
# needs the summary. The overall_health field gives a single top-level verdict.
@dataclass
class Dashboard:
title: str
kpis: list[KPISummary] = field(default_factory=list)
red_count: int = 0
yellow_count: int = 0
green_count: int = 0
overall_health: str = "unknown"
# --- Statistical helpers ------------------------------------------------
# WHY nearest-rank percentile? -- It is the simplest percentile method and
# matches what most monitoring dashboards display. More complex interpolation
# methods (e.g. linear) add precision but also complexity that distracts
# from the core lesson of threshold-based evaluation.
def percentile(values: list[float], pct: float) -> float:
if not values:
return 0.0
sorted_v = sorted(values)
# WHY math.ceil? -- Nearest-rank: the smallest value whose rank
# is >= the requested percentile. ceil ensures we never undershoot.
rank = math.ceil(pct / 100.0 * len(sorted_v)) - 1
return sorted_v[max(0, rank)]
# WHY split-half trend detection? -- Comparing first-half mean to second-half
# mean is a lightweight approach that doesn't require scipy or numpy.
# The 10% threshold avoids noise: small fluctuations report "stable".
def compute_trend(values: list[float]) -> str:
# WHY minimum 4 samples? -- With fewer than 4, the halves contain
# 1-2 values each, making the comparison statistically meaningless.
if len(values) < 4:
return "stable"
mid = len(values) // 2
first_mean = sum(values[:mid]) / mid
second_mean = sum(values[mid:]) / (len(values) - mid)
if first_mean == 0:
return "stable"
change_pct = (second_mean - first_mean) / abs(first_mean) * 100
# WHY "lower is better"? -- For latency-style KPIs, a decrease is
# improvement. This convention matches how Grafana and Datadog render trends.
if change_pct < -10:
return "improving"
if change_pct > 10:
return "degrading"
return "stable"
# --- Core logic ---------------------------------------------------------
def load_kpi_definitions(raw: list[dict[str, Any]]) -> list[KPIDefinition]:
# WHY parse into typed objects? -- Working with dicts throughout the
# codebase invites KeyError bugs. Typed dataclasses catch missing fields
# at construction time and give IDE support downstream.
return [
KPIDefinition(
name=d["name"],
unit=d.get("unit", ""),
green_threshold=float(d["green_threshold"]),
yellow_threshold=float(d["yellow_threshold"]),
)
for d in raw
]
def load_metric_samples(raw: list[dict[str, Any]]) -> list[MetricSample]:
return [
MetricSample(
source=s["source"],
kpi_name=s["kpi_name"],
timestamp=s.get("timestamp", ""),
value=float(s["value"]),
)
for s in raw
]
def aggregate_kpi(
definition: KPIDefinition,
samples: list[MetricSample],
) -> KPISummary:
# WHY filter samples by kpi_name here? -- Each call aggregates one KPI.
# Filtering inside the function keeps the caller simple (pass all samples).
values = [s.value for s in samples if s.kpi_name == definition.name]
if not values:
return KPISummary(
name=definition.name, unit=definition.unit,
sample_count=0, mean=0.0, p95=0.0,
minimum=0.0, maximum=0.0,
status=KPIStatus.GREEN, trend="stable",
)
mean_val = sum(values) / len(values)
# WHY evaluate on mean? -- The mean is the primary aggregation for
# threshold comparison. The p95 is reported for context but doesn't
# drive the traffic-light status in this design.
return KPISummary(
name=definition.name,
unit=definition.unit,
sample_count=len(values),
mean=round(mean_val, 2),
p95=round(percentile(values, 95), 2),
minimum=round(min(values), 2),
maximum=round(max(values), 2),
status=definition.evaluate(mean_val),
trend=compute_trend(values),
)
def assemble_dashboard(
title: str,
definitions: list[KPIDefinition],
samples: list[MetricSample],
) -> Dashboard:
dashboard = Dashboard(title=title)
for defn in definitions:
summary = aggregate_kpi(defn, samples)
dashboard.kpis.append(summary)
if summary.status == KPIStatus.RED:
dashboard.red_count += 1
elif summary.status == KPIStatus.YELLOW:
dashboard.yellow_count += 1
else:
dashboard.green_count += 1
# WHY worst-status-wins for overall_health? -- A single red KPI means
# the system needs attention. This mirrors how Grafana dashboards show
# a red banner if any panel is alerting.
if dashboard.red_count > 0:
dashboard.overall_health = "critical"
elif dashboard.yellow_count > 0:
dashboard.overall_health = "warning"
else:
dashboard.overall_health = "healthy"
return dashboard
def dashboard_to_dict(dashboard: Dashboard) -> dict[str, Any]:
# WHY a separate serialization function? -- Keeps the Dashboard dataclass
# free of JSON concerns. You could swap this for Protobuf or MessagePack
# without touching the domain model.
return {
"title": dashboard.title,
"overall_health": dashboard.overall_health,
"counts": {
"red": dashboard.red_count,
"yellow": dashboard.yellow_count,
"green": dashboard.green_count,
},
"kpis": [
{
"name": k.name, "unit": k.unit,
"sample_count": k.sample_count, "mean": k.mean,
"p95": k.p95, "min": k.minimum, "max": k.maximum,
"status": k.status.value, "trend": k.trend,
}
for k in dashboard.kpis
],
}
# --- CLI ----------------------------------------------------------------
def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Assemble KPI data from multiple sources into a dashboard."
)
parser.add_argument("--input", default="data/sample_input.json")
parser.add_argument("--output", default="data/dashboard_output.json")
parser.add_argument("--title", default="Operations Dashboard")
return parser.parse_args(argv)
def main(argv: list[str] | None = None) -> None:
args = parse_args(argv)
input_path = Path(args.input)
if not input_path.exists():
raise SystemExit(f"Input file not found: {input_path}")
raw = json.loads(input_path.read_text(encoding="utf-8"))
definitions = load_kpi_definitions(raw["kpi_definitions"])
samples = load_metric_samples(raw["samples"])
dashboard = assemble_dashboard(args.title, definitions, samples)
output = dashboard_to_dict(dashboard)
out_path = Path(args.output)
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(output, indent=2), encoding="utf-8")
print(json.dumps(output, indent=2))
if __name__ == "__main__":
main()
Design decisions¶
| Decision | Why | Alternative considered |
|---|---|---|
| Evaluate status on mean, not p95 | Mean gives a stable central measure for threshold comparison; p95 is reported for context | Evaluate on p95 -- better for latency KPIs but overly sensitive for throughput metrics |
| Split-half trend detection | Zero-dependency approach that works without numpy/scipy | Linear regression -- more accurate but adds a heavy dependency for a simple dashboard |
| Worst-status-wins for overall health | A single failing KPI warrants attention; mirrors Grafana/Datadog behaviour | Majority voting -- hides critical issues if most KPIs are green |
Separate dashboard_to_dict function |
Decouples domain model from serialization format | to_dict() method on Dashboard -- couples serialization to the dataclass |
Alternative approaches¶
Approach B: Pandas-based aggregation¶
import pandas as pd
def aggregate_kpi_pandas(definition, samples_df):
filtered = samples_df[samples_df["kpi_name"] == definition.name]
return {
"mean": filtered["value"].mean(),
"p95": filtered["value"].quantile(0.95),
"min": filtered["value"].min(),
"max": filtered["value"].max(),
}
Trade-off: Pandas makes aggregation one-liners but adds a 30MB dependency. For a dashboard assembler processing thousands of KPIs, the DataFrame overhead could actually be slower than pure Python list comprehensions. Use Pandas when you need complex groupby operations or already have it in the stack.
Common pitfalls¶
| Scenario | What happens | Prevention |
|---|---|---|
| KPI with zero samples | Division by zero in mean calculation | Return a default KPISummary with 0.0 values and GREEN status |
| Trend with fewer than 4 data points | Split-half comparison is meaningless with 1-2 values per half | Return "stable" for any KPI with fewer than 4 samples |
| first_mean is exactly 0.0 | Division by zero in percentage change calculation | Guard with if first_mean == 0: return "stable" |