Handling Schema Evolution in Druid Ingestion

Apache Druid enforces an immutable segment architecture, so a schema change cannot be resolved with a DDL statement like ALTER TABLE — the structure of a segment is fixed the moment it is published. When an upstream producer renames a column, promotes a STRING to a LONG, or starts emitting a field the ingestion spec never declared, Druid does not error at the source: it coerces, drops, or silently zeroes the data, and the damage only surfaces days later as a broken dashboard or a query that returns wrong aggregates across a partition boundary. Every structural change must instead be absorbed at ingestion time, through the same schema validation gate for Druid specs that already guards submission — extended here to keep its source-column contract current as the upstream shape moves. This page is the operational runbook for detecting drift, pinning an evolution-aware spec, and re-mapping safely without corrupting the segments you already published.

Failure Modes & Diagnostics

Schema drift resolves into four deterministic signatures. Each has a distinct fingerprint in the task reports and the segment metadata, and each is diagnosable with a curl-and-jq one-liner before it spreads across intervals.

1. Type coercion violation (column_type_mismatch). A column that ingested as STRING starts arriving as LONG (or the reverse). Druid performs no implicit cast during segment merge, so the Peon throws a ClassCastException during segment creation or during a later compaction that tries to merge the two shapes. Pull the failed task's exception payload:

curl -s "http://overlord:8090/druid/indexer/v1/task/idx_events_web_ab12cd34ef56/reports" \
  | jq -r '.ingestionStatsAndErrors.payload.errorMsg'

A message naming a cast between Long and String confirms the signature; the fix is an explicit type on the dimension in dimensionsSpec (below), not a retry.
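The same signature can often be caught before a task ever runs by sampling a few source rows and comparing the JSON value kinds to the pinned dimension types. The following is a minimal sketch, not part of the original runbook: json_kind and find_type_conflicts are hypothetical helper names, the sample rows are invented, and treating booleans as strings is an assumption.

```python
from typing import Any, Dict, Iterable, List, Set


def json_kind(value: Any) -> str:
    """Map a decoded JSON value to the Druid dimension type it suggests."""
    if isinstance(value, bool):   # bool subclasses int, so test it first (assumed: string)
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def find_type_conflicts(pinned: Dict[str, str],
                        rows: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return {column: observed kinds} where samples disagree with the pinned type."""
    seen: Dict[str, Set[str]] = {}
    for row in rows:
        for col, val in row.items():
            if col in pinned and val is not None:
                seen.setdefault(col, set()).add(json_kind(val))
    return {c: sorted(k) for c, k in seen.items() if k != {pinned[c]}}


# Invented sample rows: session_len arrives once as a number, once as a string.
sample = [
    {"country": "DE", "session_len": 42},
    {"country": "FR", "session_len": "17"},
]
print(find_type_conflicts({"country": "string", "session_len": "long"}, sample))
# -> {'session_len': ['long', 'string']}
```

Any column reported here is a candidate for the column_type_mismatch signature and is worth resolving upstream before submission.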

2. Implicit column suppression (silent_drop_on_missing_dimension). A new upstream field is absent from the ingestion spec, so unless useSchemaDiscovery is enabled Druid discards it with no error. Analysts referencing the column get nulls from native queries, or a column-not-found error from SQL. Diff the source's live columns against what the segment actually published, using the same broker INFORMATION_SCHEMA query the validation layer relies on:

curl -s -X POST "http://broker:8082/druid/v2/sql" -H "Content-Type: application/json" \
  --data '{"query":"SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '\''events_web'\''"}' \
  | jq -r '.[].COLUMN_NAME' | sort > /tmp/druid_cols.txt

Compare /tmp/druid_cols.txt against the upstream catalog's column list; any name present upstream but missing here is a silently dropped field.
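The comparison itself is a set difference. A small sketch, assuming the upstream catalog's columns are already available as a list; dropped_columns and its ignore parameter are hypothetical, and the column names are illustrative. The source timestamp column is ignored because Druid exposes it as __time rather than under its original name.

```python
from typing import Iterable, List, Optional, Set


def dropped_columns(upstream: Iterable[str], druid: Iterable[str],
                    ignore: Optional[Set[str]] = None) -> List[str]:
    """Columns the source emits that the published datasource does not expose."""
    skip = ignore or set()
    return sorted(set(upstream) - set(druid) - skip)


# Illustrative inputs: the second list stands in for /tmp/druid_cols.txt.
upstream_cols = ["event_time", "country", "device", "referrer"]
druid_cols = ["__time", "country", "device"]

print(dropped_columns(upstream_cols, druid_cols, ignore={"event_time"}))
# -> ['referrer']
```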

3. Rollup aggregation inconsistency (rollup_aggregation_mismatch). A metric changes from count to doubleSum (or an aggregator's fieldName moves) without reprocessing history. Queries that span old and new segments blend two incompatible aggregations and return skewed totals. The tell is a discontinuity at the deploy boundary — compare per-interval metric sums either side of the change:

curl -s -X POST "http://broker:8082/druid/v2/sql" -H "Content-Type: application/json" \
  --data '{"query":"SELECT FLOOR(__time TO DAY) AS d, SUM(bytes) AS total_bytes FROM events_web GROUP BY 1 ORDER BY 1"}' \
  | jq -r '.[] | "\(.d)\t\(.total_bytes)"'

A step-change in the daily sum that lines up with the deploy — not with real traffic — is the fingerprint. The safe resolution is a versioned datasource (events_web_v2) or an explicit reindex of the affected intervals, never an in-place metric swap.
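Spotting the discontinuity can be automated once the per-day sums are in hand. This is a rough sketch under stated assumptions: largest_step is a hypothetical helper, the 2x threshold is arbitrary, and the series is invented. It flags a jump only; whether the jump lines up with a deploy rather than real traffic still needs a human check.

```python
from typing import List, Optional, Tuple


def largest_step(daily: List[Tuple[str, float]],
                 min_ratio: float = 2.0) -> Optional[Tuple[str, float]]:
    """Return (day, ratio) for the biggest day-over-day jump >= min_ratio, else None."""
    best: Optional[Tuple[str, float]] = None
    for (_, prev), (day, cur) in zip(daily, daily[1:]):
        if prev <= 0 or cur <= 0:
            continue  # a ratio is undefined against empty or zero days
        ratio = max(cur / prev, prev / cur)  # treat rises and drops alike
        if ratio >= min_ratio and (best is None or ratio > best[1]):
            best = (day, ratio)
    return best


# Invented daily sums with a jump on the third day.
series = [("2026-07-01", 1000.0), ("2026-07-02", 1040.0),
          ("2026-07-03", 5200.0), ("2026-07-04", 5100.0)]
print(largest_step(series))
# -> ('2026-07-03', 5.0)
```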

4. Temporal parsing failure (timestamp_format_shift). The upstream timestamp changes representation — ISO-8601 to epoch millis, say — while timestampSpec.format still declares the old one. Every row fails to parse; with a lax maxParseExceptions the task publishes an almost-empty segment instead of failing. Read the row stats:

curl -s "http://overlord:8090/druid/indexer/v1/task/idx_events_web_ab12cd34ef56/reports" \
  | jq '.ingestionStatsAndErrors.payload.rowStats.buildSegments'

A high processedWithError or unparseable against a near-zero processed confirms a format shift. Centralising timestamp normalisation upstream, and setting maxParseExceptions: 0, converts this from silent loss into a loud, catchable failure.
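The row-stats check reduces to a ratio test. A minimal sketch, assuming the buildSegments object has already been parsed into a dict; looks_like_format_shift and the 1% cutoff are hypothetical, and the counts are invented for illustration.

```python
from typing import Dict


def looks_like_format_shift(stats: Dict[str, int],
                            max_good_fraction: float = 0.01) -> bool:
    """True when nearly every row failed to parse or was processed with errors."""
    bad = stats.get("unparseable", 0) + stats.get("processedWithError", 0)
    good = stats.get("processed", 0)
    total = good + bad
    return total > 0 and bad > 0 and good / total <= max_good_fraction


# Invented counts: almost every row is unparseable.
shifted = {"processed": 12, "processedWithError": 0,
           "thrownAway": 0, "unparseable": 48210322}
print(looks_like_format_shift(shifted))
# -> True
```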

Target Spec & Validated JSON

The spec that survives evolution does three things the naive one does not: it pins every dimension's type so a type promotion is rejected rather than coerced, it enables useSchemaDiscovery so genuinely new fields are captured rather than dropped, and it holds maxParseExceptions at 0 so a timestamp shift fails the task loudly. The segmentGranularity here follows the same segment granularity settings that govern every partition boundary, so a reindex stays interval-aligned.

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "events_web_v2",
      "timestampSpec": { "column": "event_time", "format": "iso" },
      "dimensionsSpec": {
        "useSchemaDiscovery": true,
        "dimensions": [
          { "type": "string", "name": "country" },
          { "type": "string", "name": "device" },
          { "type": "long",   "name": "session_len" }
        ]
      },
      "metricsSpec": [
        { "type": "count",   "name": "rows" },
        { "type": "longSum", "name": "bytes", "fieldName": "resp_bytes" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "s3", "prefixes": ["s3://lake/events/web/2026-07-03/"] },
      "inputFormat": { "type": "json" },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsInMemory": 1000000,
      "maxParseExceptions": 0,
      "forceGuaranteedRollup": true,
      "partitionsSpec": { "type": "hashed", "targetRowsPerSegment": 5000000 }
    }
  }
}

dimensionsSpec.dimensions[*].type — declaring each dimension's type explicitly makes a STRING→LONG promotion a rejected mismatch instead of a silent coercion; the validator that owns the source-column contract compares against exactly these declarations.
dimensionsSpec.useSchemaDiscovery: true — new upstream fields are auto-typed and captured rather than dropped, closing the silent_drop_on_missing_dimension gap while the explicit list still pins the columns you care about.
dataSource: "events_web_v2" — a versioned datasource isolates a breaking aggregation change from published history, so no query blends the old and new rollup shapes.
tuningConfig.maxParseExceptions: 0 — turns a timestamp_format_shift into an immediate task failure instead of a quietly truncated segment.
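These properties can be checked mechanically before a spec is submitted. The following is a sketch of such a pre-submit lint, not the validation gate the page refers to: lint_spec is a hypothetical helper and the weak spec is invented. It relies on the fact that Druid also accepts a dimension written as a bare string, which carries no explicit type.

```python
from typing import Dict, List


def lint_spec(task: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the three pins hold."""
    problems: List[str] = []
    spec = task.get("spec", {})
    dims_spec = spec.get("dataSchema", {}).get("dimensionsSpec", {})
    for dim in dims_spec.get("dimensions", []):
        if isinstance(dim, str):            # bare-string shorthand, no type pinned
            problems.append(f"dimension {dim!r} has no explicit type")
        elif "type" not in dim:
            problems.append(f"dimension {dim.get('name')!r} has no explicit type")
    if dims_spec.get("useSchemaDiscovery") is not True:
        problems.append("useSchemaDiscovery is not enabled")
    if spec.get("tuningConfig", {}).get("maxParseExceptions") != 0:
        problems.append("maxParseExceptions is not 0")
    return problems


# Invented spec fragment that violates all three pins.
weak = {"spec": {
    "dataSchema": {"dimensionsSpec": {"dimensions": ["country", {"name": "device"}]}},
    "tuningConfig": {"maxParseExceptions": 100},
}}
for problem in lint_spec(weak):
    print(problem)
```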

Python Automation Script

The orchestrator below diffs the live source columns against the last-known schema, builds an evolution-aware spec with explicit type mapping and safe fallbacks, submits it, and polls to a terminal state with capped exponential backoff. It uses only the standard library plus requests, and mirrors the submit/poll discipline the parent async task execution patterns establish — the difference is the detect_drift step that keeps the mapping honest before anything is submitted.

import time
import requests
from typing import Dict, List, Optional

OVERLORD = "http://overlord:8090"
TERMINAL = {"SUCCESS", "FAILED"}

# Upstream catalog type -> Druid dimension type. Unknown types fall back to string.
TYPE_MAP = {
    "VARCHAR": "string", "TEXT": "string",
    "BIGINT": "long", "INTEGER": "long",
    "DOUBLE": "double", "FLOAT": "float",
}


def detect_drift(known: Dict[str, str], live: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the pinned schema to the live source. Return added/removed/retyped columns."""
    added = [c for c in live if c not in known]
    removed = [c for c in known if c not in live]
    retyped = [c for c in live if c in known and live[c] != known[c]]
    return {"added": added, "removed": removed, "retyped": retyped}


def build_evolution_aware_spec(datasource: str, input_source: Dict,
                               live_schema: Dict[str, str],
                               timestamp_col: str = "event_time") -> Dict:
    """Emit a spec that pins declared types and lets discovery capture new fields."""
    dimensions = [
        {"type": TYPE_MAP.get(t.upper(), "string"), "name": c}
        for c, t in live_schema.items() if c != timestamp_col
    ]
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_col, "format": "iso"},
                "dimensionsSpec": {"useSchemaDiscovery": True, "dimensions": dimensions},
                "metricsSpec": [{"type": "count", "name": "rows"}],
                "granularitySpec": {
                    "type": "uniform", "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE", "rollup": True,
                },
            },
            "ioConfig": {
                "type": "index_parallel", "inputSource": input_source,
                "inputFormat": {"type": "json"}, "appendToExisting": False,
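            },
            "tuningConfig": {
                "type": "index_parallel", "maxParseExceptions": 0,
                # Hashed partitioning is only accepted together with perfect rollup.
                "forceGuaranteedRollup": True,
                "partitionsSpec": {"type": "hashed", "targetRowsPerSegment": 5000000},
            },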
        },
    }


def submit(spec: Dict, timeout: int = 30) -> str:
    r = requests.post(f"{OVERLORD}/druid/indexer/v1/task", json=spec, timeout=timeout)
    r.raise_for_status()
    return r.json()["task"]


def poll_until_terminal(task_id: str, max_wait: int = 3600) -> str:
    """Poll status with capped exponential backoff until the task terminates."""
    deadline = time.monotonic() + max_wait
    delay = 2.0
    while time.monotonic() < deadline:
        r = requests.get(f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status", timeout=15)
        r.raise_for_status()
        state = r.json()["status"]["status"]
        if state in TERMINAL:
            return state
        time.sleep(delay)
        delay = min(delay * 2, 30.0)  # cap backoff at 30s
    raise TimeoutError(f"{task_id} did not terminate in {max_wait}s")


def evolve_and_ingest(known: Dict[str, str], live: Dict[str, str],
                      input_source: Dict, base_name: str) -> str:
    """Route on drift and return the terminal task state ("SUCCESS" or "FAILED")."""
    drift = detect_drift(known, live)
    # A retyped column is a breaking change: write it to a versioned datasource.
    # Added or removed columns alone keep the existing datasource name.
    datasource = f"{base_name}_v2" if drift["retyped"] else base_name
    spec = build_evolution_aware_spec(datasource, input_source, live)
    task_id = submit(spec)
    return poll_until_terminal(task_id)

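The routing is the load-bearing part: a retyped column is a breaking change, so evolve_and_ingest writes to a versioned datasource rather than mutating history, while additive-only drift stays on the existing datasource. Because the spec sets appendToExisting to false, the task replaces segments already published for the intervals it covers rather than appending to them, so point input_source only at intervals you intend to rewrite. The source-column map handed in as live is exactly the contract the dynamic ingestion spec generation layer already produces from the upstream catalog, so drift detection reuses the same input the builder consumes rather than fetching it twice.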
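The routing decision can be exercised without touching the Overlord. This is a dry-run sketch: plan_target is a hypothetical helper that repeats the comparison detect_drift makes, and the two schemas are invented for illustration.

```python
from typing import Dict, List, Tuple


def plan_target(known: Dict[str, str], live: Dict[str, str],
                base_name: str, suffix: str = "_v2") -> Tuple[str, Dict[str, List[str]]]:
    """Return the datasource a run would write to, plus the drift that decided it."""
    drift = {
        "added": [c for c in live if c not in known],
        "removed": [c for c in known if c not in live],
        "retyped": [c for c in live if c in known and live[c] != known[c]],
    }
    # Only a retype is treated as breaking; additive drift keeps the base name.
    target = f"{base_name}{suffix}" if drift["retyped"] else base_name
    return target, drift


# Invented schemas: session_len changes type and referrer is new.
known = {"country": "VARCHAR", "session_len": "VARCHAR"}
live = {"country": "VARCHAR", "session_len": "BIGINT", "referrer": "VARCHAR"}
print(plan_target(known, live, "events_web"))
# -> ('events_web_v2', {'added': ['referrer'], 'removed': [], 'retyped': ['session_len']})
```

Logging this plan before calling submit gives reviewers a record of why a run landed in a versioned datasource.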

Verification Steps

After the evolved task reports SUCCESS, confirm the new shape actually landed rather than assuming it. First, verify the discovered and pinned columns are all present with the types you expect:

curl -s -X POST "http://broker:8082/druid/v2/sql" -H "Content-Type: application/json" \
  --data '{"query":"SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '\''events_web_v2'\''"}' \
  | jq -r '.[] | "\(.COLUMN_NAME)\t\(.DATA_TYPE)"'

Expected output: session_len present as BIGINT (the SQL type Druid reports for a LONG column) and no column silently missing. The implicit __time TIMESTAMP row is omitted here:

country      VARCHAR
device       VARCHAR
session_len  BIGINT
bytes        BIGINT
rows         BIGINT

Then confirm the task itself dropped no rows to a parse or type failure — a clean run shows zero unparseable and zero processed-with-error:

curl -s "http://overlord:8090/druid/indexer/v1/task/idx_events_web_v2_9f3c1a/reports" \
  | jq '.ingestionStatsAndErrors.payload.rowStats.buildSegments'

{ "processed": 48210334, "processedWithError": 0, "thrownAway": 0, "unparseable": 0 }

A non-zero unparseable or processedWithError means a signature slipped past the spec. Return to the diagnostics above, tighten the dimension type or timestampSpec.format, and reingest the interval into the same versioned datasource.
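The column check can be turned into an assertion against the pinned spec. A sketch only, assuming the INFORMATION_SCHEMA response has been decoded into a list of row dicts; schema_mismatches is a hypothetical helper, the rows are illustrative, and the type table covers just the four primitive Druid types.

```python
from typing import Dict, List

# Druid native type -> SQL type as reported by INFORMATION_SCHEMA.COLUMNS.
DRUID_TO_SQL = {"string": "VARCHAR", "long": "BIGINT",
                "double": "DOUBLE", "float": "FLOAT"}


def schema_mismatches(expected: Dict[str, str],
                      rows: List[Dict[str, str]]) -> List[str]:
    """Compare pinned Druid types to published SQL types; return any differences."""
    actual = {r["COLUMN_NAME"]: r["DATA_TYPE"] for r in rows}
    out: List[str] = []
    for name, druid_type in expected.items():
        want = DRUID_TO_SQL.get(druid_type, druid_type.upper())
        got = actual.get(name)
        if got is None:
            out.append(f"{name}: missing")
        elif got != want:
            out.append(f"{name}: expected {want}, got {got}")
    return out


# Illustrative rows in the shape the broker returns.
published = [
    {"COLUMN_NAME": "__time", "DATA_TYPE": "TIMESTAMP"},
    {"COLUMN_NAME": "country", "DATA_TYPE": "VARCHAR"},
    {"COLUMN_NAME": "session_len", "DATA_TYPE": "VARCHAR"},
]
pinned = {"country": "string", "device": "string", "session_len": "long"}
print(schema_mismatches(pinned, published))
# -> ['device: missing', 'session_len: expected BIGINT, got VARCHAR']
```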

Related

Schema validation for Druid ingestion specs: the two-layer gate whose source-column contract this page keeps current as the upstream shape drifts.
Automating Druid ingestion specs with Python: the builder that emits the evolution-aware spec once drift has been resolved.
Batch vs streaming ingestion sync: why a retype must apply identically to both paths, or backfills and live segments diverge.

Up one level: schema validation for Druid specs is the parent that defines the structural and semantic contract this evolution runbook keeps aligned with reality.
