Configuring Segment Retention Policies in Apache Druid

Engineers hit this page after one of two symptoms: queries suddenly return partial results because Historicals have unloaded segments a Broker still routes against, or deep storage and the metadata database grow without bound because nothing ever expires. Both are retention-policy defects. Retention in Apache Druid is a deterministic, rule-based lifecycle mechanism driven by the Coordinator: segments move from active (loaded and served) to unused (unloaded from Historicals) and are finally deleted from deep storage by kill tasks. Changing retention rules is a privileged WRITE action on a datasource, so it belongs under security boundaries for segment access; a mis-scoped role can silently rewrite the lifecycle of a tenant's data. This page gives you the diagnostic one-liners, a validated rule spec, an idempotent Python enforcer, and the exact verification commands to confirm the fix landed.

Retention rules are evaluated per datasource with first-match-wins semantics, so ordering is decisive. The diagram traces a segment interval through a typical load → drop-before → drop-forever rule chain.
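
First-match-wins is easiest to see in a small simulation. The following is a minimal sketch, not Druid's implementation: period_days and first_match are hypothetical helpers, months are approximated as 30 days (Druid uses calendar arithmetic), includeFuture is ignored, and only the forever and by-period rule types are modelled.

```python
import re
from typing import Dict, List, Optional

# Hypothetical helper: converts a small subset of ISO 8601 periods to days.
# Assumption: 1 month = 30 days and 1 year = 365 days, for illustration only.
_PERIOD = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def period_days(period: str) -> int:
    match = _PERIOD.match(period)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported period for this sketch: {period}")
    years, months, days = (int(g) if g else 0 for g in match.groups())
    return years * 365 + months * 30 + days


def first_match(rules: List[Dict], end_age_days: int) -> Optional[str]:
    """Return the type of the first rule that matches a segment.

    end_age_days is how many days ago the segment interval ended.
    """
    for rule in rules:
        kind = rule["type"]
        if kind in ("loadForever", "dropForever"):
            return kind
        if kind not in ("loadByPeriod", "dropBeforeByPeriod"):
            continue  # other rule types are out of scope for this sketch
        window = period_days(rule["period"])
        # A period load rule matches when the segment overlaps the trailing window.
        if kind == "loadByPeriod" and end_age_days < window:
            return kind
        # Drop-before matches only when the whole segment is older than the window.
        if kind == "dropBeforeByPeriod" and end_age_days >= window:
            return kind
    return None


chain = [
    {"type": "loadByPeriod", "period": "P1M"},
    {"type": "dropBeforeByPeriod", "period": "P6M"},
    {"type": "dropForever"},
]
for age in (10, 90, 200):
    print(age, first_match(chain, age))
```

Swapping the first two rules and shortening the drop period shows the inversion problem directly: a recent segment is claimed by the drop rule before the load rule is ever consulted.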

Failure Modes & Diagnostics

Retention misconfigurations manifest through a small set of predictable, high-impact failure modes. Each has a shell one-liner that isolates it against the live Coordinator and Overlord REST APIs.

1. Premature segment eviction. The most common defect is inverted rule ordering, where dropBeforeByPeriod precedes loadByPeriod. First-match-wins then evicts recent segments that the query routing and segment discovery layer still expects to be loadable, producing missing-segment errors. Dump the effective rule chain and confirm load rules come first:

curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/rules/<datasource>" \
  | jq '.[] | {type, period}'

2. Coordinator sync lag. State propagation stalls on ZooKeeper session timeouts or network partitions, so the metadata store's used counts diverge from what Historicals actually hold. Compare the load queue against the served set:

# Segments the Coordinator still wants loaded/dropped but hasn't reconciled
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/loadqueue?simple" \
  | jq 'to_entries | map({server: .key, todo: (.value.segmentsToLoad + .value.segmentsToDrop)})'

A persistently non-empty todo during a quiet ingestion window points at a stuck duty cycle rather than normal churn. Correlate with Coordinator heap pressure — a duty cycle that runs long under GC pauses looks identical from the outside:

jstat -gcutil "$(pgrep -f 'org.apache.druid.cli.Main.*coordinator' | head -n 1)" 2000 5

3. Orphaned segments in deep storage. When automatic kill is disabled (druid.coordinator.kill.on=false) or the Coordinator/Overlord service account lacks delete permission on the object store, segments marked unused never leave deep storage. List what has been dropped but not reclaimed:

curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/metadata/datasources/<datasource>/segments?full" \
  | jq '[.[] | select(.used == false)] | length'

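If that count climbs while your storage bill does, the problem is permissions (or disabled auto-kill), not rules. Kill tasks run as indexing tasks submitted to the Overlord, so verify that the principal executing them holds s3:DeleteObject (or the GCS/Azure equivalent) on the segments/ prefix, exactly as scoped in the parent security boundaries for segment access baseline. Some Druid versions may return only used segments from this endpoint; if the count is always 0, check the used column of the segments table in the metadata store (druid_segments by default) instead.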
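
The first two diagnostics above can also be scripted against JSON you have already fetched. This is a minimal sketch under stated assumptions: drops_before_loads is a heuristic for check 1 (a drop rule ahead of a load rule is not always wrong, for example dropBeforeByPeriod followed by loadForever, so treat hits as prompts for review), and stuck_servers compares two polls of loadqueue?simple for check 2. The function names and the sample server names are hypothetical.

```python
from typing import Dict, List, Tuple


def drops_before_loads(rules: List[Dict]) -> List[Tuple[int, int]]:
    """Return (drop_index, load_index) pairs where a drop rule precedes a load rule."""
    pairs = []
    for i, earlier in enumerate(rules):
        if not earlier.get("type", "").startswith("drop"):
            continue
        for j in range(i + 1, len(rules)):
            if rules[j].get("type", "").startswith("load"):
                pairs.append((i, j))
    return pairs


def pending_by_server(loadqueue: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum segmentsToLoad and segmentsToDrop per server from a ?simple response."""
    return {
        server: queue.get("segmentsToLoad", 0) + queue.get("segmentsToDrop", 0)
        for server, queue in loadqueue.items()
    }


def stuck_servers(before: Dict, after: Dict) -> List[str]:
    """Servers whose backlog is non-zero and did not shrink between two polls."""
    old, new = pending_by_server(before), pending_by_server(after)
    return sorted(s for s, n in new.items() if n > 0 and n >= old.get(s, 0))


# Two polls taken a few duty cycles apart (sample data, not real output).
poll_1 = {
    "hist-1:8083": {"segmentsToLoad": 4, "segmentsToDrop": 1},
    "hist-2:8083": {"segmentsToLoad": 3, "segmentsToDrop": 0},
}
poll_2 = {
    "hist-1:8083": {"segmentsToLoad": 4, "segmentsToDrop": 1},
    "hist-2:8083": {"segmentsToLoad": 0, "segmentsToDrop": 0},
}
print(stuck_servers(poll_1, poll_2))
```

A server that appears in the result across several polls during a quiet ingestion window is the one worth correlating with the jstat output.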

Target Spec & Validated JSON

Druid persists retention rules in the relational metadata store; on each duty cycle the Coordinator evaluates them and issues the resulting load and drop instructions to Historicals. A robust production spec layers a rolling load window for recent partitions over strict time-bound expiration, with a terminal catch-all so segments that match no earlier rule cannot accumulate on Historicals. POST this array to /druid/coordinator/v1/rules/<datasource>:

[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropBeforeByPeriod",
    "period": "P6M"
  },
  {
    "type": "dropForever"
  }
]

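The loadByPeriod directive guarantees rolling availability for the most recent month, replicated twice across the default tier; set includeFuture: true so segments whose intervals lie ahead of wall-clock time (future-dated events) are still loaded. dropBeforeByPeriod with P6M marks anything older than six months unused, and dropForever is the terminal rule that catches every segment no earlier rule matched instead of letting it fall through to the cluster-wide default rules. Note the consequence of first-match-wins: a segment between one and six months old matches neither of the first two rules and lands on dropForever, so this chain keeps only the most recent month loaded. To serve six months, widen the loadByPeriod period to P6M. Every period must be a valid ISO 8601 duration so all Coordinator nodes parse it identically.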
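
Because a malformed period is only rejected once the payload reaches the Coordinator, a pre-flight lint can catch it earlier. The sketch below is an approximation of the ISO 8601 period grammar (PnYnMnWnDTnHnMnS), not the parser Druid actually uses, so edge cases may be accepted or rejected differently; ISO_PERIOD and invalid_periods are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Approximate ISO 8601 period grammar: at least one component is required,
# and a "T" must be followed by at least one time component.
ISO_PERIOD = re.compile(
    r"^P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?$"
)


def invalid_periods(rules: List[Dict]) -> List[Tuple[int, str]]:
    """Return (index, period) for every rule whose 'period' fails the lint."""
    return [
        (i, str(rule["period"]))
        for i, rule in enumerate(rules)
        if "period" in rule and not ISO_PERIOD.match(str(rule["period"]))
    ]


candidate = [
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True},
    {"type": "dropBeforeByPeriod", "period": "6 months"},  # deliberately wrong
    {"type": "dropForever"},
]
print(invalid_periods(candidate))
```

Rules without a period field, such as dropForever, are skipped rather than flagged.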

The retention boundary only lands cleanly if it aligns with your partitioning. Because dropBeforeByPeriod evaluates whole segment intervals, the segment granularity chosen at ingestion sets the resolution of expiration: a DAY-granular datasource drops one clean day at a time, whereas coarse granularity leaves a whole interval in place until its end has crossed the retention boundary. For long-lived tables, pair these rules with automated compaction scheduling so that the still-loaded window stays optimally sized, and treat drop rules as the enforcement layer for the broader TTL mapping and data expiration policy your compliance model requires.
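
The effect of granularity on expiration can be checked with plain date arithmetic. This is an illustrative sketch only: fully_expired is a hypothetical helper, the dates are made up, and a fixed 180 days stands in for P6M, whereas Druid subtracts calendar periods.

```python
from datetime import datetime, timedelta, timezone


def fully_expired(segment_end: datetime, now: datetime, retain: timedelta) -> bool:
    """True when the whole segment interval lies before now - retain."""
    return segment_end <= now - retain


now = datetime(2024, 7, 15, tzinfo=timezone.utc)
retain = timedelta(days=180)  # assumption: fixed-length stand-in for P6M

# DAY granularity: the segment holding 2024-01-10 ends on 2024-01-11.
day_segment_end = datetime(2024, 1, 11, tzinfo=timezone.utc)
# MONTH granularity: the same event lives in a segment ending on 2024-02-01.
month_segment_end = datetime(2024, 2, 1, tzinfo=timezone.utc)

print("day segment droppable:", fully_expired(day_segment_end, now, retain))
print("month segment droppable:", fully_expired(month_segment_end, now, retain))
```

The same event is eligible for dropping under DAY granularity but stays in place under MONTH granularity until the end of its month-long interval has aged past the cutoff.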

Python Automation Script

Applying rules by hand introduces drift. Enforce them from a pipeline instead: the following idempotent client uses only requests plus the standard library, mounts an exponential-backoff retry adapter for transient Coordinator unavailability, validates the payload, and then confirms the Coordinator echoes the exact rules back before returning.

import time
from typing import Dict, List
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class DruidRetentionManager:
    def __init__(self, coordinator_url: str, datasource: str, timeout: int = 15):
        self.base_url = coordinator_url.rstrip("/")
        self.datasource = datasource
        self.session = requests.Session()
        retry_strategy = Retry(
            total=3,
            backoff_factor=1.5,  # urllib3 sleeps roughly 0s, 3s, 6s between retries
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["GET", "POST"],
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)
        self.timeout = timeout

    def apply_rules(self, rules: List[Dict]) -> None:
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        if not rules or not all("type" in r for r in rules):
            raise ValueError("Invalid retention rule payload: missing 'type' field")

        response = self.session.post(
            endpoint,
            json=rules,
            headers={"Content-Type": "application/json"},
            timeout=self.timeout,
        )
        response.raise_for_status()
        self._verify_sync(rules)

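    def _verify_sync(self, expected_rules: List[Dict], retries: int = 5) -> None:
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        for attempt in range(retries):
            resp = self.session.get(endpoint, timeout=self.timeout)
            resp.raise_for_status()
            actual = resp.json()
            # Compare only the fields we sent: the Coordinator may echo extra
            # defaulted fields, which would make strict equality never succeed.
            if len(actual) == len(expected_rules) and all(
                all(got.get(key) == value for key, value in want.items())
                for want, got in zip(expected_rules, actual)
            ):
                return
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s between attempts
        raise RuntimeError("Retention rule sync verification timed out")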


if __name__ == "__main__":
    manager = DruidRetentionManager("http://coordinator-host:8081", "clickstream")
    manager.apply_rules([
        {"type": "loadByPeriod", "period": "P1M", "includeFuture": True,
         "tieredReplicants": {"_default_tier": 2}},
        {"type": "dropBeforeByPeriod", "period": "P6M"},
        {"type": "dropForever"},
    ])

Because the POST replaces the datasource's entire rule chain, re-running the script always converges on the same state, and _verify_sync confirms that the Coordinator echoes the intended rules before the run is reported as successful. That makes it safe to wire into a cron job or Kubernetes CronJob that corrects drift before it reaches query routing or storage SLAs. The retry adapter absorbs the transient 503s that appear during rolling upgrades or heavy compaction, and the pattern matches the interaction contract in the official Druid rule-configuration docs.

Verification Steps

After applying rules, confirm three things: the Coordinator persisted them, segments outside the load window actually became unused, and — if auto-kill is on — deep storage was reclaimed.

First, read the rules back. The output must match the applied array, with the load rule first:

curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/rules/clickstream" | jq '.'

[
  { "type": "loadByPeriod", "period": "P1M", "includeFuture": true,
    "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropBeforeByPeriod", "period": "P6M" },
  { "type": "dropForever" }
]

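Next, confirm no segments inside the retained window are still pending assignment. With the simple parameter, loadstatus returns the number of segments per datasource that are not yet loaded (without it, the endpoint reports a percentage, where 100.0 is fully loaded); a fully reconciled cluster reports 0: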

curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/loadstatus?simple" \
  | jq '.clickstream'

The value is the count of segments not yet available, so 0 means every segment the rules require is loaded. If auto-kill (druid.coordinator.kill.on=true) is enabled, verify reclaimed storage by checking that dropped intervals no longer report unused segments:

curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/metadata/datasources/clickstream/segments?full" \
  | jq '[.[] | select(.used == false)] | length'

If auto-kill is disabled but you need to reclaim a specific historical interval immediately, submit a one-off kill task to the Overlord:

curl -X POST "http://<overlord-host>:8090/druid/indexer/v1/task" \
  -H "Content-Type: application/json" \
  -d '{"type": "kill", "dataSource": "clickstream", "interval": "2020-01-01/2020-07-01"}'
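
When kill tasks are submitted from a pipeline rather than typed by hand, building the body programmatically guards against a reversed or malformed interval. The following is a minimal sketch that only constructs the JSON shown in the curl command above; kill_task_payload is a hypothetical helper, and submitting the result to the Overlord is left to the caller.

```python
import json
from datetime import date


def kill_task_payload(datasource: str, start: date, end: date) -> str:
    """Build the JSON body for a one-off kill task over the interval start/end."""
    if start >= end:
        raise ValueError("kill interval must have start before end")
    return json.dumps({
        "type": "kill",
        "dataSource": datasource,
        "interval": f"{start.isoformat()}/{end.isoformat()}",
    })


body = kill_task_payload("clickstream", date(2020, 1, 1), date(2020, 7, 1))
print(body)
```

Kill tasks remove data permanently, so a guard like this is best paired with a review of the interval before submission.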

Related

Security Boundaries for Segment Access: the parent reference; retention-rule mutation is a privileged WRITE action governed by the datasource authorization model described there.
TTL Mapping and Data Expiration: how drop rules implement the broader time-to-live and compliance-retention policy.
Reducing Historical Node Storage Costs: pairs expiration with compaction to keep the retained window cost-efficient.
Configuring Druid Native Compaction Rules: keeps the still-loaded segments optimally sized as retention trims the tail.
