
Gap Detection & Auto-Backfill

The pipeline sometimes misses data: an EirGrid endpoint returns a 503, a pipeline run crashes, or a partition is written with an hour's worth of nulls. Rather than let those holes linger, a separate gapcheck Lambda runs daily, scans the last 31 days of Parquet for every (area, region) pair, and asynchronously schedules targeted backfills for any day that contains a gap.

A gaps.json summary is written to S3 alongside the area summaries and served from the API at /api/gaps; the dashboard renders amber indicators on any gauge or chart that has known-missing data.

See ADR-006 for the full design rationale.

What counts as a gap

For each (area, region) series, a gap is two or more consecutive missing points at the area's native cadence, anywhere inside the 31-day detection window. The most recent 6 hours are excluded so that the usual end-of-day latency on the EirGrid upstream feed does not register as a permanent gap.

| Area | Cadence | Gap threshold (minutes missing) |
|---|---|---|
| wind, solar, demand, interconnection, co2 | 15 min | ≥ 30 |
| SNSP | 30 min | ≥ 60 |
| frequency | 1 h (hourly buckets) | ≥ 120 |

generation (fuel mix) is excluded from gap detection. It is a daily snapshot whose effective cadence varies with how often the pipeline runs, so there is no fixed interval to count missing points against.
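The rule above can be sketched as a single pass over the expected timestamps. This is a minimal illustration, not the gapcheck implementation: it assumes the window is `[now - 31 days, now - 6 hours)`, that "present" means a non-null value exists at that timestamp, and the names `CADENCE`, `detection_window`, `floor_to` and `find_gaps` (and the lowercase area keys) are hypothetical.

```python
from datetime import datetime, timedelta

# Native cadence per area, from the table above. "generation" is deliberately
# absent because it has no fixed cadence. Lowercase keys are an assumption.
CADENCE = {
    "wind": timedelta(minutes=15),
    "solar": timedelta(minutes=15),
    "demand": timedelta(minutes=15),
    "interconnection": timedelta(minutes=15),
    "co2": timedelta(minutes=15),
    "snsp": timedelta(minutes=30),
    "frequency": timedelta(hours=1),
}
MIN_POINTS = 2  # two consecutive missing points make a gap


def detection_window(now, window_days=31, trailing_hours=6):
    """Assumed window: [now - window_days, now - trailing_hours)."""
    return now - timedelta(days=window_days), now - timedelta(hours=trailing_hours)


def floor_to(t, cadence):
    """Round t down to the previous cadence boundary, counted from midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((t - midnight) // cadence) * cadence


def find_gaps(present, cadence, start, end, min_points=MIN_POINTS):
    """Return (first_missing, last_missing, points) for every run of at least
    min_points consecutive missing slots in [start, end).

    present: set of datetimes that have a non-null value.
    """
    gaps = []
    run_start, run_len = None, 0
    t = floor_to(start, cadence)
    while t < end:
        if t in present:
            # A present point closes any open run of missing slots.
            if run_len >= min_points:
                gaps.append((run_start, t - cadence, run_len))
            run_start, run_len = None, 0
        else:
            if run_len == 0:
                run_start = t
            run_len += 1
        t += cadence
    # A run still open at the end of the window is reported too.
    if run_len >= min_points:
        gaps.append((run_start, t - cadence, run_len))
    return gaps
```

A single missing point never opens a gap, which is what makes the threshold column equal to twice the cadence.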

Data flow

graph LR
    SCH["EventBridge<br/>cron(0 6 * * ? *)"] --> GC["gapcheck Lambda"]
    GC -->|DuckDB scan| S3P["S3 Parquet<br/>(31-day window)"]
    GC -->|read / update| LEDGER["S3 ledger<br/>grid-data/gaps/ledger.json"]
    GC -->|async invoke per area-region-day| PL["pipeline Lambda<br/>backfill mode"]
    GC -->|write| SUM["S3 gaps.json"]
    PL -->|write merged| S3P
    API["API Lambda"] -->|cache refresh| SUM
    UI["Dashboard /ui/"] -->|GET /api/gaps| API
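The diagram shows one async invoke per area-region-day, so detected ranges have to be collapsed into days first. A sketch of that step, assuming naive timestamps whose calendar date is the backfill day; `days_touched` and `backfill_targets` are hypothetical names. Note that a gap crossing midnight yields two targets.

```python
from datetime import date, datetime, timedelta


def days_touched(start: datetime, end: datetime) -> list[date]:
    """Calendar days covered by an inclusive [start, end] gap range."""
    days = []
    d = start.date()
    while d <= end.date():
        days.append(d)
        d += timedelta(days=1)
    return days


def backfill_targets(gaps_by_series):
    """Collapse gap ranges into unique, sorted (area, region, day) targets.

    gaps_by_series: {(area, region): [(start, end, points), ...]}
    """
    targets = set()
    for (area, region), ranges in gaps_by_series.items():
        for start, end, _points in ranges:
            for day in days_touched(start, end):
                targets.add((area, region, day))  # set dedupes same-day gaps
    return sorted(targets)
```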

Backfill ledger

Each (area, region, day) that is dispatched for backfill is recorded in an S3-hosted JSON ledger ({prefix}/gaps/ledger.json). The ledger stops gapcheck from re-firing the same backfill on every run and gives up on a day after a configurable number of attempts (GAP_MAX_ATTEMPTS).

| Entry state | Meaning |
|---|---|
| scheduled | Async backfill accepted by Lambda; status carried to the next run |
| queued | Dispatch failed to start; retried after the cooldown expires |
| failed | Dispatch returned an error (transient); retried up to max_attempts |
| permanent_failure | Exceeded max_attempts; no further attempts |

Entries whose (area, region, day) is no longer in the latest gap set are pruned: once a backfill closes the hole, its entry disappears from the ledger on the next run.
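One way to express the retry, cooldown and pruning rules is a pure planning function plus a recorder for dispatch outcomes. This is a sketch under assumptions, not the real ledger code: `Entry`, `plan` and `record` are hypothetical, any non-permanent state (including `scheduled` and `queued`) is assumed to be retried once the cooldown has passed if its gap is still present, and the defaults mirror GAP_COOLDOWN_HOURS and GAP_MAX_ATTEMPTS.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Entry:
    state: str             # scheduled | queued | failed | permanent_failure
    attempts: int
    last_attempt: datetime


def plan(ledger, targets, now, cooldown=timedelta(hours=3), max_attempts=5):
    """Decide which (area, region, day) targets to dispatch on this run.

    Returns (new_ledger, to_dispatch). Keys absent from `targets` are not
    copied into new_ledger, which is how closed gaps get pruned.
    """
    new_ledger, to_dispatch = {}, []
    for key in sorted(targets):
        entry = ledger.get(key)
        if entry is None:
            to_dispatch.append(key)  # first sighting of this gap-day
            continue
        new_ledger[key] = entry
        if entry.state == "permanent_failure":
            continue
        if entry.attempts >= max_attempts:
            new_ledger[key] = replace(entry, state="permanent_failure")
            continue
        if now - entry.last_attempt < cooldown:
            continue  # still cooling down; carry the entry unchanged
        to_dispatch.append(key)
    return new_ledger, to_dispatch


def record(ledger, key, accepted, now):
    """Record the outcome of one dispatch attempt, bumping the attempt count."""
    prev = ledger.get(key)
    attempts = (prev.attempts if prev else 0) + 1
    ledger[key] = Entry("scheduled" if accepted else "failed", attempts, now)
```

Keeping `plan` free of I/O makes the state transitions easy to test; only the caller touches S3 and Lambda.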

/api/gaps response

{
  "generated_at": "2026-04-22T06:00:04Z",
  "window_days": 31,
  "trailing_exclusion_hours": 6,
  "areas": {
    "wind": {
      "ROI": {
        "gap_count": 1,
        "missing_points": 4,
        "ranges": [
          {"start": "2026-04-18T14:00", "end": "2026-04-18T14:45", "points": 4}
        ]
      }
    }
  },
  "backfills": {
    "in_flight": 3,
    "permanent_failure": 0
  }
}

ranges lists each contiguous gap with its start, end (inclusive), and the number of missing points at the area's cadence. An area with no gaps is simply absent from the areas map, so an empty areas object is valid and expected on a healthy day.
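Because `end` is inclusive, the point count is `(end - start) / cadence + 1`; the sample range 14:00 to 14:45 at a 15-minute cadence gives 45 / 15 + 1 = 4. A sketch of building the `areas` map from detected ranges, with hypothetical helper names and the minute-precision timestamp format assumed from the sample response:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M"  # minute precision, as in the sample response


def points_between(start, end, cadence):
    """Missing points in an inclusive range: (end - start) / cadence + 1."""
    return (end - start) // cadence + 1


def summarise(ranges):
    """Per-series summary from a list of (start, end, points) tuples."""
    return {
        "gap_count": len(ranges),
        "missing_points": sum(points for _, _, points in ranges),
        "ranges": [
            {"start": s.strftime(FMT), "end": e.strftime(FMT), "points": p}
            for s, e, p in ranges
        ],
    }


def build_areas(gaps_by_series):
    """Nest summaries as area -> region -> summary, omitting series with no
    gaps so that a healthy day yields an empty dict."""
    areas = {}
    for (area, region), ranges in sorted(gaps_by_series.items()):
        if ranges:
            areas.setdefault(area, {})[region] = summarise(ranges)
    return areas
```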

Dashboard presentation

  • Gauges. A small amber dot appears on any gauge whose current region has at least one gap in the 31-day window. The tooltip lists the affected ranges.
  • Line charts. Missing ranges render as shaded amber regions over the affected time span, with the connecting line suppressed across the gap.
  • Header pill. When backfills.in_flight > 0 a small "restoring…" indicator is shown so visitors know why a gap is still visible.
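The three indicators can all be derived from the /api/gaps payload alone. The sketch below shows that logic in Python for illustration only; the real dashboard code is not shown here, and the function names are hypothetical. For line charts it assumes a charting library that breaks the line at a null value rather than bridging it, so one `None` point per gap is enough.

```python
def gauge_has_gap(summary, area, region):
    """Amber dot: the region has at least one gap in the window."""
    series = summary.get("areas", {}).get(area, {}).get(region)
    return bool(series) and series.get("gap_count", 0) > 0


def gauge_tooltip(summary, area, region):
    """One tooltip line per affected range."""
    series = summary.get("areas", {}).get(area, {}).get(region) or {}
    return [f'{r["start"]} to {r["end"]}' for r in series.get("ranges", [])]


def show_restoring_pill(summary):
    return summary.get("backfills", {}).get("in_flight", 0) > 0


def with_breaks(series, ranges):
    """series: [(iso_ts, value), ...]. Insert one (start, None) point per gap
    so a null-aware chart breaks the line instead of bridging the gap."""
    seen = {ts for ts, _ in series}
    extra = [(r["start"], None) for r in ranges if r["start"] not in seen]
    # Same-format ISO strings sort chronologically.
    return sorted(series + extra, key=lambda p: p[0])
```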

Running gap-check locally

make gap-check

This runs python -m gaps.runner against the local MinIO bucket using the same environment variables the pipeline uses. With PIPELINE_LAMBDA_ARN unset (the default locally), no dispatches are made; the run still writes the gaps.json summary, which is useful for previewing the dashboard overlays.

Set PIPELINE_LAMBDA_ARN to exercise the real async invoke path in staging.
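A sketch of how a dry-run switch on PIPELINE_LAMBDA_ARN might look. The `invoke` argument is meant to be boto3's `boto3.client("lambda").invoke`, which accepts `FunctionName`, `InvocationType="Event"` and `Payload`, and returns status code 202 when an async invocation is accepted; injecting it keeps the sketch testable without AWS. The `dispatch` name and the payload fields are hypothetical, not the pipeline's actual event schema.

```python
import json
import os
from datetime import date


def dispatch(targets: list[tuple[str, str, date]], invoke=None, arn=None):
    """Fire one async backfill per (area, region, day).

    Returns [(key, accepted), ...]. With no ARN (the local default) nothing
    is invoked and an empty list is returned.
    """
    arn = os.environ.get("PIPELINE_LAMBDA_ARN", "") if arn is None else arn
    if not arn:
        return []  # dry run: gapcheck still writes gaps.json
    if invoke is None:
        raise ValueError("an invoke callable is required when an ARN is set")
    results = []
    for area, region, day in targets:
        # Hypothetical payload shape; the real backfill event may differ.
        payload = {"mode": "backfill", "area": area, "region": region,
                   "day": day.isoformat()}
        try:
            resp = invoke(FunctionName=arn, InvocationType="Event",
                          Payload=json.dumps(payload).encode())
            # Async ("Event") invocations return 202 when accepted.
            accepted = resp.get("StatusCode") == 202
        except Exception:  # e.g. throttling or permission errors
            accepted = False
        results.append(((area, region, day), accepted))
    return results
```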

Configuration

All four gap-detection knobs are environment-driven — see Configuration → Gap detection.

| Variable | Default | Effect |
|---|---|---|
| GAP_WINDOW_DAYS | 31 | Lookback window for detection (days) |
| GAP_TRAILING_HOURS | 6 | Recent period excluded from detection (hours) |
| GAP_COOLDOWN_HOURS | 3 | Minimum wait between retries on the same day (hours) |
| GAP_MAX_ATTEMPTS | 5 | Attempts before an entry is marked permanent_failure |
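A minimal sketch of loading these knobs with the documented defaults; the `GapConfig` class and `from_env` method are hypothetical names, and passing an explicit mapping instead of `os.environ` keeps it easy to test.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GapConfig:
    window_days: int = 31     # GAP_WINDOW_DAYS
    trailing_hours: int = 6   # GAP_TRAILING_HOURS
    cooldown_hours: int = 3   # GAP_COOLDOWN_HOURS
    max_attempts: int = 5     # GAP_MAX_ATTEMPTS

    @classmethod
    def from_env(cls, env=None):
        """Read overrides from the environment, falling back to the defaults."""
        env = os.environ if env is None else env
        return cls(
            window_days=int(env.get("GAP_WINDOW_DAYS", cls.window_days)),
            trailing_hours=int(env.get("GAP_TRAILING_HOURS", cls.trailing_hours)),
            cooldown_hours=int(env.get("GAP_COOLDOWN_HOURS", cls.cooldown_hours)),
            max_attempts=int(env.get("GAP_MAX_ATTEMPTS", cls.max_attempts)),
        )
```

For example, `GapConfig.from_env({"GAP_WINDOW_DAYS": "14"})` narrows the lookback to 14 days and leaves the other three knobs at their defaults.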