Configuring Druid Native Compaction Rules

Engineers usually reach this page after compaction has quietly stopped doing its job: segment counts climb, query scans fan out over hundreds of sub-optimal shards, and yet the Overlord shows no failed tasks. Apache Druid native compaction is a deterministic background reconciliation process that merges fragmented segments, enforces row limits, and realigns physical storage with query patterns — but a single misplaced field (skipOffsetFromLatest below your watermark lag, a byte target that no longer exists, or a missing lock mode) makes it stall silently rather than error loudly. This page details the exact spec grammar and diagnostic signatures for the rules themselves; for the timing layer that decides when the Coordinator submits these tasks, see the parent guide on automated compaction task scheduling.

Failure Modes & Diagnostics

Compaction failures rarely arrive as HTTP errors. They show up as segment bloat, Historical node OOM, or a Coordinator duty that skips intervals it should be merging. Diagnose from the REST APIs before touching config.

Check what the Coordinator currently believes about a datasource's compaction backlog. bytesAwaitingCompaction staying flat across cycles is the primary signal that a rule is misconfigured, not that work is done:

curl -s "http://coordinator:8081/druid/coordinator/v1/compaction/progress?dataSource=analytics_events" | jq
curl -s "http://coordinator:8081/druid/coordinator/v1/compaction/status?dataSource=analytics_events" \
  | jq '.latestStatus[] | {dataSource, scheduleStatus, bytesAwaitingCompaction, segmentCountAwaitingCompaction}'
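
A flat backlog can be told apart from a draining one with a small comparison over successive readings. The sketch below assumes you have already collected `bytesAwaitingCompaction` values, one per Coordinator cycle, from the status call above; `backlog_state` and its `tolerance` parameter are hypothetical names for illustration, not part of Druid.

```python
from typing import Sequence


def backlog_state(samples: Sequence[int], tolerance: float = 0.01) -> str:
    """Classify successive bytesAwaitingCompaction readings (oldest first).

    Returns "done" when the latest reading is zero, "draining" when the
    backlog shrank by more than `tolerance` (a fraction of the first
    reading), and "stalled" otherwise.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    first, last = samples[0], samples[-1]
    if last == 0:
        return "done"
    if first - last > tolerance * first:
        return "draining"
    return "stalled"


print(backlog_state([8_000_000_000, 8_000_000_000, 7_990_000_000]))  # stalled
print(backlog_state([8_000_000_000, 6_500_000_000, 4_100_000_000]))  # draining
```

A "stalled" result over several cycles points at the rule itself, which is the case the rest of this section diagnoses.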

Confirm the effective compaction config actually persisted — a POST that returns 200 can still leave a datasource with no config if the payload was malformed:

curl -s "http://coordinator:8081/druid/coordinator/v1/config/compaction" \
  | jq '.compactionConfigs[] | select(.dataSource=="analytics_events")'

Inspect the real segment size distribution. Sub-optimal shards below ~256 MB compressed are the reason query planners fan out; count them straight from the sys.segments system table:

curl -s "http://broker:8082/druid/v2/sql" -H 'Content-Type: application/json' -d '{
  "query": "SELECT COUNT(*) small_segs FROM sys.segments WHERE datasource='"'"'analytics_events'"'"' AND is_active=1 AND \"size\" < 268435456"
}' | jq

When compaction runs but the Coordinator heap climbs, the duty is queuing more tasks than the deployment can drain. Watch old-gen GC on the Coordinator JVM and cross-check against running compact tasks on the Overlord:

jstat -gcutil "$(pgrep -f coordinator)" 1000 5
curl -s "http://overlord:8090/druid/indexer/v1/runningTasks" \
  | jq '[.[] | select(.type=="compact")] | length'
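
To judge whether the duty is outrunning capacity, compare the running count against the slot budget. The sketch below assumes the budget is the smaller of `maxCompactionTaskSlots` and `compactionTaskSlotRatio` times total worker capacity, with a floor of one slot. That formula is an assumption to verify against your Druid version, and `effective_compaction_slots` and `is_saturated` are hypothetical helpers.

```python
import math


def effective_compaction_slots(total_worker_capacity: int,
                               slot_ratio: float,
                               max_slots: int) -> int:
    """Assumed slot budget: min(ratio * capacity, max_slots), at least 1."""
    by_ratio = math.floor(total_worker_capacity * slot_ratio)
    return max(1, min(by_ratio, max_slots))


def is_saturated(running_compact_tasks: int, slots: int) -> bool:
    """True when running compact tasks already fill the slot budget."""
    return running_compact_tasks >= slots


slots = effective_compaction_slots(total_worker_capacity=40, slot_ratio=0.25, max_slots=8)
print(slots, is_saturated(running_compact_tasks=4, slots=slots))  # 8 False
```

If the running count sits at the budget every cycle while the backlog keeps growing, the deployment is not draining what the duty submits.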

The recurring signatures and their fixes:

| Symptom | Root cause | Resolution |
| --- | --- | --- |
| `TaskLock` contention, tasks stuck `PENDING` | `skipOffsetFromLatest` shorter than ingestion watermark lag, so compaction collides with active ingestion | Raise the offset above observed lag (e.g. `PT3H`); derive it from monitoring, not a guess |
| Coordinator old-gen occupancy > 85% | Unbounded compaction queue; no slot cap | Cap `maxCompactionTaskSlots` / `compactionTaskSlotRatio`; keep effective slots small and finite |
| Same interval recompacted every cycle | Row target too low for the data's average row size, so output never reaches the target band | Raise `maxRowsPerSegment` / `targetRowsPerSegment` until compressed output clears ~256 MB |
| `Unknown property targetCompactionSizeBytes` | Byte-based sizing field removed in Druid 0.21 | Switch to row targets in `tuningConfig` or a `partitionsSpec` |
| Historical memory pressure during business hours | Compaction scan window overlaps peak query scans | Bias `maxCompactionTaskSlots` higher off-peak; align cadence with the query window |
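
Deriving the offset from monitoring rather than a guess can be as simple as padding the worst observed lag and rounding up. This is a minimal sketch: `skip_offset_from_lag` and its `headroom` multiplier are hypothetical, and the 1.5x default is an arbitrary illustration, not a Druid recommendation.

```python
import math
from datetime import timedelta


def skip_offset_from_lag(max_lag: timedelta, headroom: float = 1.5) -> str:
    """Return an ISO 8601 duration in whole hours, rounded up, above max_lag."""
    if max_lag <= timedelta(0) or headroom < 1.0:
        raise ValueError("max_lag must be positive and headroom >= 1.0")
    padded_seconds = max_lag.total_seconds() * headroom
    hours = math.ceil(padded_seconds / 3600)
    return f"PT{hours}H"


# Worst observed ingestion lag of 110 minutes, padded by 50%.
print(skip_offset_from_lag(timedelta(minutes=110)))  # PT3H
```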

Target Spec & Validated JSON

There are two surfaces, and they are not interchangeable. A manual compact task is a one-shot job you POST to the Overlord; a DataSourceCompactionConfig is the recurring policy the Coordinator's compaction duty evaluates. skipOffsetFromLatest belongs only to the latter.

A minimal, valid one-shot compact task. It enforces rollup, caps segment size by rows, bounds concurrency, and holds an exclusive lock so no other task interleaves writes on the interval:

{
  "type": "compact",
  "dataSource": "analytics_events",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2024-01-01/2024-02-01"
    }
  },
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumConcurrentSubTasks": 4,
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  },
  "context": {
    "forceTimeChunkLock": true
  }
}

The recurring auto-compaction policy for the same datasource. This is the payload for POST /druid/coordinator/v1/config/compaction, and it is where skipOffsetFromLatest lives:

{
  "dataSource": "analytics_events",
  "skipOffsetFromLatest": "PT3H",
  "taskPriority": 25,
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    },
    "maxNumConcurrentSubTasks": 4
  },
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  }
}
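
Because the two surfaces are easy to mix up, a pre-flight check before POSTing either payload can catch the misplacements described here. The sketch below is a hypothetical linter, not a Druid API: it treats any payload with `"type": "compact"` as a manual task and everything else as an auto-compaction config, and it only checks the fields this page discusses.

```python
from typing import List


def lint_compaction_payload(payload: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    is_manual_task = payload.get("type") == "compact"
    if "dataSource" not in payload:
        problems.append("dataSource is required")
    if is_manual_task and "skipOffsetFromLatest" in payload:
        problems.append("skipOffsetFromLatest is only read from the auto-compaction config")
    if is_manual_task and "ioConfig" not in payload:
        problems.append("manual compact task has no ioConfig interval")
    if "targetCompactionSizeBytes" in payload:
        problems.append("targetCompactionSizeBytes is no longer accepted; use row targets")
    return problems


task = {"type": "compact", "dataSource": "analytics_events", "skipOffsetFromLatest": "PT3H"}
for problem in lint_compaction_payload(task):
    print(problem)
```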

The three fields that break production most often:

skipOffsetFromLatest must exceed your ingestion pipeline's maximum watermark lag. Set below actual latency, it produces TaskLock conflicts against active ingestion and stalls the newest intervals. It has no meaning in a manual task; only the Coordinator duty reads it.
Output size is row-driven. The byte-based targetCompactionSizeBytes was removed in Druid 0.21; use maxRowsPerSegment (dynamic partitioning) or targetRowsPerSegment (hash/range partitionsSpec). Choose a target that yields compressed segments in the 500 MB–1 GB band — the same band the segment size optimization strategies guide derives from measured bytes-per-row. The full row-vs-concurrency calibration lives in compaction threshold tuning.
forceTimeChunkLock: true forces an exclusive lock over the whole time chunk rather than per-segment locks, so concurrent tasks cannot interleave writes on the interval. The segmentGranularity you compact to must match how the segment granularity settings already partition the timeline, or the duty rewrites chunks every cycle.
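
Since output size is row-driven, the row target has to be back-computed from a byte target and a measured bytes-per-row figure. A rough sketch, assuming you sample total size and row count from already-published segments (for example by summing the size and num_rows columns of sys.segments); `row_target` is a hypothetical helper, and the 750 MB target is just a point inside the band mentioned above.

```python
def row_target(target_bytes: int, sample_size_bytes: int, sample_rows: int) -> int:
    """Rows per segment expected to land near target_bytes, given a sample."""
    if target_bytes <= 0 or sample_size_bytes <= 0 or sample_rows <= 0:
        raise ValueError("all inputs must be positive")
    # rows = target_bytes / (sample_size_bytes / sample_rows), kept in integers
    return target_bytes * sample_rows // sample_size_bytes


# Sample: 10M rows occupying 1.5 GB, i.e. about 150 bytes per row.
print(row_target(750 * 1024**2, 1_500_000_000, 10_000_000))  # 5242880
```

The estimate only holds while the sampled segments resemble what compaction will produce; rollup or a change of queryGranularity shifts bytes-per-row, so re-measure after the first compacted interval publishes.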

Rules must also respect retention. If a datasource's TTL window is shorter than its compaction cadence, the duty burns compute rewriting segments a kill task then purges — coordinate the two through TTL mapping and data expiration so skipOffsetFromLatest and the drop rules do not overlap wastefully.
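
The overlap is easy to reason about as a window on the timeline: only data older than the skip offset and newer than the retention cutoff is worth compacting. The sketch below assumes retention is a single rolling period, which is a simplification of real drop rules; `compactable_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def compactable_window(now: datetime,
                       skip_offset: timedelta,
                       retention: timedelta) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the span worth compacting, or None if empty."""
    start = now - retention    # anything older is due to be dropped
    end = now - skip_offset    # anything newer is still being ingested
    return (start, end) if start < end else None


now = datetime(2024, 2, 1, tzinfo=timezone.utc)
print(compactable_window(now, timedelta(hours=3), timedelta(days=30)))
print(compactable_window(now, timedelta(hours=3), timedelta(hours=2)))  # None
```

A `None` result means every interval the duty could touch is already scheduled for removal, so compaction on that datasource is pure overhead.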

Python Automation Script

Applying and submitting compaction rules must be idempotent: a retried POST should re-apply the same config, never spawn drift. This orchestrator uses only the standard library plus requests, mounts an exponential-backoff retry policy, and re-reads the config to confirm it persisted. For the submit-and-poll primitives it builds on, see asynchronous task execution patterns.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

COORDINATOR = "http://coordinator:8081"
OVERLORD = "http://overlord:8090"


def _session(retries: int = 4) -> requests.Session:
    strategy = Retry(
        total=retries,
        backoff_factor=1.5,  # urllib3 sleeps ~0s, 3s, 6s, 12s: first retry is immediate, then factor * 2**(n-1)
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"],
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def apply_compaction_config(config: dict) -> dict:
    """Idempotently apply a DataSourceCompactionConfig and verify it persisted."""
    session = _session()
    resp = session.post(
        f"{COORDINATOR}/druid/coordinator/v1/config/compaction",
        json=config,
        headers={"Content-Type": "application/json"},
        timeout=(5, 30),
    )
    resp.raise_for_status()

    ds = config["dataSource"]
    current = session.get(
        f"{COORDINATOR}/druid/coordinator/v1/config/compaction",
        timeout=(5, 30),
    ).json()
    applied = next(
        (c for c in current.get("compactionConfigs", []) if c["dataSource"] == ds),
        None,
    )
    if applied is None:
        raise RuntimeError(f"compaction config for {ds} did not persist")
    return applied


def submit_compact_task(spec: dict) -> str:
    """POST a one-shot compact task to the Overlord; returns the task id.

    Unlike the config POST, a retried submit is only safe when `spec`
    carries an explicit "id"; without one, each accepted retry may
    create a separate task.
    """
    session = _session()
    resp = session.post(
        f"{OVERLORD}/druid/indexer/v1/task",
        json=spec,
        headers={"Content-Type": "application/json"},
        timeout=(5, 30),
    )
    resp.raise_for_status()
    return resp.json()["task"]
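
Confirming that a config exists is weaker than confirming it matches what was sent. One way to tighten the check is a recursive comparison of the desired payload against the config read back from the Coordinator. This is a sketch with a hypothetical helper, `config_drift`; it ignores extra keys the server adds as defaults, and it assumes the server echoes values in the form you submitted them, which may not hold for every field.

```python
from typing import List


def config_drift(desired: dict, applied: dict, path: str = "") -> List[str]:
    """Return dotted paths where `applied` is missing or differs from `desired`."""
    diffs = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        if key not in applied:
            diffs.append(here)
        elif isinstance(want, dict) and isinstance(applied[key], dict):
            diffs.extend(config_drift(want, applied[key], here))
        elif applied[key] != want:
            diffs.append(here)
    return diffs


desired = {
    "skipOffsetFromLatest": "PT3H",
    "tuningConfig": {"partitionsSpec": {"maxRowsPerSegment": 5000000}},
}
applied = {
    "skipOffsetFromLatest": "P1D",
    "taskPriority": 25,
    "tuningConfig": {"partitionsSpec": {"type": "dynamic", "maxRowsPerSegment": 5000000}},
}
print(config_drift(desired, applied))  # ['skipOffsetFromLatest']
```

An empty list after a re-read is the signal that a retried apply left the datasource in the intended state.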

Verification Steps

After applying a rule, confirm three things: the config persisted, tasks are running, and the backlog is draining.

Verify the policy is registered against the datasource:

curl -s "http://coordinator:8081/druid/coordinator/v1/config/compaction" \
  | jq '.compactionConfigs[] | select(.dataSource=="analytics_events") | {dataSource, skipOffsetFromLatest, tuningConfig}'

Expected — the config echoes back with your offset and row target:

{
  "dataSource": "analytics_events",
  "skipOffsetFromLatest": "PT3H",
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": { "type": "dynamic", "maxRowsPerSegment": 5000000 }
  }
}

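Confirm the backlog is shrinking across two successive polls a Coordinator period apart. bytesAwaitingCompaction is reported by the status endpoint; the progress endpoint returns a single remainingSegmentSize figure instead:

curl -s "http://coordinator:8081/druid/coordinator/v1/compaction/status?dataSource=analytics_events" \
  | jq '.latestStatus[] | {bytesAwaitingCompaction}'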

Expected — the figure trends toward zero as replacement segments publish:

{ "bytesAwaitingCompaction": 0 }

Finally, confirm compacted output landed in the target band and small-segment count fell:

curl -s "http://broker:8082/druid/v2/sql" -H 'Content-Type: application/json' -d '{
  "query": "SELECT COUNT(*) small_segs FROM sys.segments WHERE datasource='"'"'analytics_events'"'"' AND is_active=1 AND \"size\" < 268435456"
}' | jq

Expected — near-zero remaining sub-optimal shards on the compacted intervals:

[ { "small_segs": 0 } ]

Related

Automated Compaction Task Scheduling — the timing layer: how the Coordinator duty decides which intervals to compact and when.
Compaction Threshold Tuning — calibrate the row, size, and concurrency thresholds that set how aggressively these rules fire.
TTL Mapping and Data Expiration — retention and kill-task grammar that compaction cadence must respect to avoid rewriting doomed segments.
