Gap Detection & Auto-Backfill
The pipeline sometimes misses data — an EirGrid endpoint 503s, a pipeline run
crashes, a partition is written with an hour's worth of nulls. Rather than let
those holes linger, a separate gapcheck Lambda runs daily (06:00 UTC), scans the last
31 days of Parquet for every (area, region) pair, and asynchronously
schedules targeted backfills for any day containing a gap.
A gaps.json summary is written to S3 alongside the area summaries and served
from the API at /api/gaps; the dashboard renders amber indicators on any
gauge or chart that has known-missing data.
See ADR-006 for the full design rationale.
What counts as a gap
For each (area, region) series, a gap is two or more consecutive missing
points at the area's native cadence, anywhere inside the 31-day detection
window. The most recent 6 hours are excluded so that the usual end-of-day
latency on the EirGrid upstream feed does not register as a permanent gap.
| Area | Cadence | Gap threshold (minutes missing) |
|---|---|---|
| wind, solar, demand, interconnection, co2 | 15 min | ≥ 30 |
| SNSP | 30 min | ≥ 60 |
| frequency | 1 h (hourly buckets) | ≥ 120 |
generation (fuel mix) is a daily snapshot and is not included in gap
detection — the cadence varies with how often the pipeline runs.
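The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the gapcheck Lambda's actual code: `find_gaps` and its arguments are hypothetical names, the caller is assumed to pass a window already aligned to the cadence grid, and `end` is assumed to be "now minus the trailing exclusion".

```python
from datetime import datetime, timedelta


def find_gaps(present, start, end, cadence, min_missing=2):
    """Return (first_missing, last_missing, points) for each qualifying run.

    present -- set of timestamps that have a non-null value
    start   -- first slot of the detection window (aligned to the cadence)
    end     -- exclusive upper bound, i.e. now minus the trailing exclusion
    """
    gaps = []
    run_start, run_len = None, 0
    ts = start
    while ts < end:
        if ts not in present:
            if run_start is None:
                run_start = ts
            run_len += 1
        else:
            if run_len >= min_missing:
                gaps.append((run_start, ts - cadence, run_len))
            run_start, run_len = None, 0
        ts += cadence
    # Close a run that reaches the end of the window.
    if run_len >= min_missing:
        gaps.append((run_start, ts - cadence, run_len))
    return gaps


# Example: a 15-minute series missing 14:00-14:45 plus one isolated slot.
cadence = timedelta(minutes=15)
start = datetime(2026, 4, 18, 13, 0)
end = datetime(2026, 4, 18, 16, 0)
slots = [start + i * cadence for i in range(12)]
missing = {datetime(2026, 4, 18, 14, m) for m in (0, 15, 30, 45)}
missing.add(datetime(2026, 4, 18, 15, 30))  # single point: below the threshold
present = {ts for ts in slots if ts not in missing}
gaps = find_gaps(present, start, end, cadence)
```

With the example data, the four consecutive missing slots form one gap, while the isolated missing slot at 15:30 is ignored because it is shorter than two points.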
Data flow
```mermaid
graph LR
    SCH["EventBridge<br/>cron(0 6 * * ? *)"] --> GC["gapcheck Lambda"]
    GC -->|DuckDB scan| S3P["S3 Parquet<br/>(31-day window)"]
    GC -->|read / update| LEDGER["S3 ledger<br/>grid-data/gaps/ledger.json"]
    GC -->|async invoke per area-region-day| PL["pipeline Lambda<br/>backfill mode"]
    GC -->|write| SUM["S3 gaps.json"]
    PL -->|write merged| S3P
    API["API Lambda"] -->|cache refresh| SUM
    UI["Dashboard /ui/"] -->|GET /api/gaps| API
```
Backfill ledger
Each (area, region, day) that is dispatched for backfill is recorded in an
S3-hosted JSON ledger ({prefix}/gaps/ledger.json). The ledger prevents the
gapcheck from re-firing the same backfill every run and gives up after a
configurable number of attempts.
| Entry state | Meaning |
|---|---|
| scheduled | Async backfill accepted by Lambda; status carried to the next run |
| queued | Dispatch failed to start; retry after the cooldown expires |
| failed | Dispatch returned an error (transient); retried up to max_attempts |
| permanent_failure | Exceeded max_attempts; no further attempts |
Entries whose (area, region, day) is no longer in the latest gap set are
pruned — once a backfill closes the hole it disappears from the ledger on the
next run.
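The pruning, cooldown, and give-up rules can be sketched as a single reconciliation pass. This is an illustration under stated assumptions, not the real ledger code: the `area/region/day` key format and the `state`, `attempts`, and `last_attempt` fields are hypothetical, and the sketch assumes any entry that is not a permanent failure becomes eligible again once the cooldown has elapsed.

```python
from datetime import datetime, timedelta, timezone


def reconcile(ledger, gap_keys, now, cooldown_hours=3, max_attempts=5):
    """Return (kept_entries, keys_to_dispatch) for one gapcheck run."""
    kept, to_dispatch = {}, []
    for key in sorted(gap_keys):  # keys absent from gap_keys are pruned
        entry = ledger.get(key)
        if entry is None:
            to_dispatch.append(key)  # first time this gap is seen
            continue
        entry = dict(entry)  # copy so the input ledger is not mutated
        kept[key] = entry
        if entry["state"] == "permanent_failure":
            continue
        if entry["attempts"] >= max_attempts:
            entry["state"] = "permanent_failure"  # give up
            continue
        last = datetime.fromisoformat(entry["last_attempt"])
        if now - last >= timedelta(hours=cooldown_hours):
            to_dispatch.append(key)
    return kept, to_dispatch


now = datetime(2026, 4, 22, 6, 0, tzinfo=timezone.utc)
ledger = {
    "wind/ROI/2026-04-18": {"state": "failed", "attempts": 2,
                            "last_attempt": "2026-04-21T06:00:00+00:00"},
    "solar/ROI/2026-04-10": {"state": "failed", "attempts": 5,
                             "last_attempt": "2026-04-21T06:00:00+00:00"},
    "co2/ROI/2026-04-20": {"state": "scheduled", "attempts": 1,
                           "last_attempt": "2026-04-22T05:00:00+00:00"},
    "demand/ROI/2026-04-01": {"state": "scheduled", "attempts": 1,
                              "last_attempt": "2026-04-21T06:00:00+00:00"},
}
gap_keys = {"wind/ROI/2026-04-18", "solar/ROI/2026-04-10",
            "co2/ROI/2026-04-20", "snsp/ROI/2026-04-19"}
kept, to_dispatch = reconcile(ledger, gap_keys, now)
```

In this example the demand entry is pruned because its gap has closed, the solar entry is promoted to permanent_failure, the co2 entry is still inside its cooldown, and the wind entry plus the newly detected snsp gap are selected for dispatch.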
/api/gaps response
```json
{
  "generated_at": "2026-04-22T06:00:04Z",
  "window_days": 31,
  "trailing_exclusion_hours": 6,
  "areas": {
    "wind": {
      "ROI": {
        "gap_count": 1,
        "missing_points": 4,
        "ranges": [
          {"start": "2026-04-18T14:00", "end": "2026-04-18T14:45", "points": 4}
        ]
      }
    }
  },
  "backfills": {
    "in_flight": 3,
    "permanent_failure": 0
  }
}
```
ranges lists each contiguous gap with its start, end (inclusive), and the
number of missing points at the area's cadence. When there are no gaps the
area is simply absent from the areas map, so an empty areas map is valid and
expected on a healthy day.
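A client can walk the response and cross-check each range's point count against its inclusive bounds. The sketch below is one possible way to do so; `iter_ranges` and `expected_points` are hypothetical helper names, and the cadence is passed in by the caller rather than read from the payload.

```python
from datetime import datetime, timedelta


def iter_ranges(payload):
    """Yield (area, region, range) for every gap; tolerates an empty areas map."""
    for area, regions in payload.get("areas", {}).items():
        for region, info in regions.items():
            for rng in info["ranges"]:
                yield area, region, rng


def expected_points(start, end, cadence_minutes):
    """Number of slots between start and end, with both bounds inclusive."""
    span = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return int(span / timedelta(minutes=cadence_minutes)) + 1


payload = {
    "areas": {
        "wind": {
            "ROI": {
                "gap_count": 1,
                "missing_points": 4,
                "ranges": [
                    {"start": "2026-04-18T14:00", "end": "2026-04-18T14:45", "points": 4}
                ],
            }
        }
    }
}
rows = list(iter_ranges(payload))
area, region, rng = rows[0]
points = expected_points(rng["start"], rng["end"], cadence_minutes=15)
healthy_rows = list(iter_ranges({"areas": {}}))
```

For the sample range, 14:00 to 14:45 at a 15-minute cadence spans 45 minutes, which is three intervals and therefore four points, matching the `points` field.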
Dashboard presentation
- Gauges. A small amber dot appears on any gauge whose current region has at least one gap in the 31-day window. The tooltip lists the affected ranges.
- Line charts. Missing ranges render as shaded amber regions over the affected time span, with the connecting line suppressed across the gap.
- Header pill. When backfills.in_flight > 0, a small "restoring…" indicator is shown so visitors know why a gap is still visible.
Running gap-check locally
The local gap-check runs python -m gaps.runner against the local MinIO bucket using the
same environment variables the pipeline uses. With PIPELINE_LAMBDA_ARN
unset (the default locally) no dispatches are actually made; the run still
writes the gaps.json summary, which is useful for previewing the dashboard
overlays.
Set PIPELINE_LAMBDA_ARN to exercise the real async invoke path in
staging.
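The dry-run behaviour can be pictured as a guard in front of the async invoke. The following is a sketch only: `dispatch_backfill`, its return values, and the payload shape are assumptions made for illustration, and `StubLambdaClient` stands in for a real `boto3.client("lambda")` so the example runs without AWS access. The `invoke` call with `InvocationType="Event"` is the standard boto3 way to request an asynchronous invocation, which Lambda acknowledges with status code 202.

```python
import json
import os


def dispatch_backfill(client, area, region, day, arn=None):
    """Fire one async backfill, or skip it when no target ARN is configured."""
    if arn is None:
        arn = os.environ.get("PIPELINE_LAMBDA_ARN", "")
    if not arn:
        return "skipped"  # local default: nothing is invoked
    # Hypothetical payload shape; the real pipeline event may differ.
    payload = {"mode": "backfill", "area": area, "region": region, "day": day}
    response = client.invoke(
        FunctionName=arn,
        InvocationType="Event",  # asynchronous invoke
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return "accepted" if response.get("StatusCode") == 202 else "rejected"


class StubLambdaClient:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        return {"StatusCode": 202}


stub = StubLambdaClient()
local_result = dispatch_backfill(stub, "wind", "ROI", "2026-04-18", arn="")
calls_after_local = len(stub.calls)
staging_result = dispatch_backfill(stub, "wind", "ROI", "2026-04-18", arn="example-arn")
```

Passing the client in as an argument keeps the dry-run path free of any AWS dependency, which suits the local MinIO setup described above.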
Configuration
All four gap-detection knobs are environment-driven — see Configuration → Gap detection.
| Variable | Default | Effect |
|---|---|---|
| GAP_WINDOW_DAYS | 31 | Lookback window for detection |
| GAP_TRAILING_HOURS | 6 | Recent period excluded from detection |
| GAP_COOLDOWN_HOURS | 3 | Minimum wait between retries on the same day |
| GAP_MAX_ATTEMPTS | 5 | Attempts before an entry is marked permanent failure |
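One way to read these four variables with their documented defaults is shown below. The `GapConfig` class and `load_gap_config` function are hypothetical names used for illustration; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GapConfig:
    window_days: int = 31
    trailing_hours: int = 6
    cooldown_hours: int = 3
    max_attempts: int = 5


def load_gap_config(env=None):
    """Build a GapConfig from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return GapConfig(
        window_days=int(env.get("GAP_WINDOW_DAYS", 31)),
        trailing_hours=int(env.get("GAP_TRAILING_HOURS", 6)),
        cooldown_hours=int(env.get("GAP_COOLDOWN_HOURS", 3)),
        max_attempts=int(env.get("GAP_MAX_ATTEMPTS", 5)),
    )


defaults = load_gap_config({})
short_window = load_gap_config({"GAP_WINDOW_DAYS": "7"})
```

Accepting a mapping instead of always reading `os.environ` makes the loader easy to exercise in tests, for example with a shorter window when previewing the dashboard overlays locally.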