Configuring Segment Retention Policies

Segment retention in Apache Druid operates as a deterministic, rule-based lifecycle mechanism orchestrated exclusively by the Coordinator process. Retention policies govern the transition of segments from used to unused, precede asynchronous deep storage cleanup, and directly influence cluster resource allocation. Misaligned retention configurations frequently cascade into query routing failures, uncontrolled storage expenditure, and metadata database bloat. This reference provides production-grade retention orchestration patterns, failure diagnostics, and automated recovery workflows tailored for OLAP data engineers and DevOps teams managing high-throughput ingestion pipelines.

How Rules Are Evaluated

Retention rules are evaluated per datasource with first-match-wins semantics, so ordering is decisive. The diagram traces a segment interval through a typical load → drop-before → drop-forever rule chain.

flowchart TD
    S[Segment interval] --> R1{loadByPeriod P1M matches?}
    R1 -- yes --> L[Load and replicate on Historicals]
    R1 -- no --> R2{dropBeforeByPeriod P6M matches?}
    R2 -- yes --> D[Drop / mark unused]
    R2 -- no --> R3[dropForever]
    R3 --> D
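The flow in the diagram can be sketched in a few lines of Python. This is a simplified model for reasoning about rule order, not Druid's implementation: the `PERIODS` table, `rule_matches`, and `first_match` are hypothetical names, and months are approximated as 30 days instead of being resolved against the calendar.

```python
from datetime import datetime, timedelta, timezone

# Assumption: simplified stand-ins for ISO 8601 periods (months ~ 30 days).
PERIODS = {"P1M": timedelta(days=30), "P6M": timedelta(days=180)}

RULES = [
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True},
    {"type": "dropBeforeByPeriod", "period": "P6M"},
    {"type": "dropForever"},
]


def rule_matches(rule, seg_start, seg_end, now):
    """Return True if this simplified rule applies to the segment interval."""
    kind = rule["type"]
    if kind in ("loadForever", "dropForever"):
        return True
    cutoff = now - PERIODS[rule["period"]]
    if kind == "loadByPeriod":
        # Matches intervals overlapping [now - period, now], plus future ones if requested.
        if seg_end <= cutoff:
            return False
        return rule.get("includeFuture", True) or seg_start < now
    if kind == "dropBeforeByPeriod":
        # Matches intervals that lie entirely before now - period.
        return seg_end <= cutoff
    raise ValueError(f"unsupported rule type: {kind}")


def first_match(rules, seg_start, seg_end, now):
    """Walk the chain in order and return the type of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, seg_start, seg_end, now):
            return rule["type"]
    return None


now = datetime(2024, 7, 1, tzinfo=timezone.utc)
day = timedelta(days=1)
for age in (10, 90, 400):
    print(age, first_match(RULES, now - age * day, now - (age - 1) * day, now))
```

In this model a 90-day-old segment falls through both period rules and lands on dropForever, which shows how much the terminal rule decides for everything the earlier rules do not claim.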

Rule Evaluation and JSON Specification

Druid evaluates retention rules sequentially per datasource, applying a strict first-match-wins logic. Consequently, rule ordering is non-negotiable. The Coordinator persists these rules in the relational metadata store, re-evaluates them against every used segment during its periodic run, and issues the resulting load and drop instructions to Historical nodes. A robust production specification typically layers a rolling load window for recent partitions with strict time-bound expiration for historical data.

[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropBeforeByPeriod",
    "period": "P6M"
  },
  {
    "type": "dropForever"
  }
]

The loadByPeriod directive keeps the most recent month available across the designated hardware tiers. dropBeforeByPeriod schedules segments older than six months for unloading, while dropForever serves as a terminal catch-all to prevent metadata accumulation from malformed intervals or orphaned partitions. Because dropForever matches every segment that reaches it, segments between one and six months old are also dropped under this chain; widen the load period if that window must remain queryable. Rule evaluation semantics are deeply coupled with the underlying segment state machine, as documented in Apache Druid Segment Architecture & Lifecycle Fundamentals. All period strings must strictly conform to ISO 8601 duration syntax to ensure deterministic parsing across cluster nodes.
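A pre-flight check for period strings can catch malformed values before a spec is submitted. The sketch below assumes that only plain integer ISO 8601 durations are in use (for example P1M, P2W, PT12H); the regular expression and the `invalid_periods` helper are illustrative and deliberately stricter than a full ISO 8601 parser.

```python
import re

# Assumption: integer components only, in the order Y, M, W, D, then T with H, M, S.
_ISO_PERIOD = re.compile(
    r"^P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?$"
)


def invalid_periods(rules):
    """Return (index, period) pairs for rules whose 'period' fails the check."""
    bad = []
    for index, rule in enumerate(rules):
        period = rule.get("period")
        if period is not None and not _ISO_PERIOD.match(str(period)):
            bad.append((index, period))
    return bad


spec = [
    {"type": "loadByPeriod", "period": "P1M"},
    {"type": "dropBeforeByPeriod", "period": "6M"},  # missing the leading "P"
    {"type": "dropForever"},
]
print(invalid_periods(spec))
```

Rules without a period field, such as dropForever, are skipped instead of being reported.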

Failure Modes and Diagnostic Patterns

Retention misconfigurations manifest through predictable, high-impact failure modes. The most prevalent is premature segment eviction, typically triggered by inverted rule ordering in which a drop rule precedes a load rule covering an overlapping interval, for example dropBeforeByPeriod P1M placed ahead of loadByPeriod P6M. This forces Historicals to unload segments that Brokers still route queries against, resulting in SegmentMissingException errors and degraded SLAs.
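Inverted ordering of this kind can be flagged before a spec reaches the Coordinator. The following lint is a minimal sketch, not a Druid feature: `approx_days` and `lint_rule_order` are hypothetical helpers, only date-based periods (Y, M, W, D) are supported, and durations are approximated in days.

```python
import re

_PART = re.compile(r"(\d+)([YMWD])")
_DAYS = {"Y": 365, "M": 30, "W": 7, "D": 1}  # rough calendar approximations


def approx_days(period):
    """Approximate a date-only ISO 8601 period such as 'P6M' in days."""
    if not re.fullmatch(r"P(\d+[YMWD])+", period):
        raise ValueError(f"unsupported period: {period}")
    return sum(int(count) * _DAYS[unit] for count, unit in _PART.findall(period))


def lint_rule_order(rules):
    """Return warnings for load rules shadowed by earlier drops and for unreachable rules."""
    warnings = []
    drop_cutoffs = []  # (index, days) of each dropBeforeByPeriod seen so far
    for i, rule in enumerate(rules):
        kind = rule["type"]
        if kind == "dropBeforeByPeriod":
            drop_cutoffs.append((i, approx_days(rule["period"])))
        elif kind == "loadByPeriod":
            window = approx_days(rule["period"])
            for j, cutoff in drop_cutoffs:
                if cutoff < window:
                    # The earlier drop claims part of the window this rule should load.
                    warnings.append(f"rule {i} (loadByPeriod) is partly shadowed by rule {j}")
        if kind.endswith("Forever") and i < len(rules) - 1:
            warnings.append(f"rules after index {i} are unreachable ({kind} matches everything)")
            break
    return warnings


inverted = [
    {"type": "dropBeforeByPeriod", "period": "P1M"},
    {"type": "loadByPeriod", "period": "P6M"},
    {"type": "dropForever"},
]
print(lint_rule_order(inverted))
```

A drop rule ahead of a load rule is not flagged when its cutoff lies beyond the load window, since the two rules then claim disjoint intervals.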

Coordinator Sync Lag emerges when state propagation fails due to ZooKeeper session timeouts or network partitions. Operators will observe divergence between segment/used counts in the metadata database and actual disk utilization on Historical nodes. Validate synchronization health using the following diagnostic pipeline:

# Quote the URL so the shell does not try to expand it; count used segments per datasource.
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/metadata/segments" \
  | jq 'group_by(.dataSource) | map({dataSource: .[0].dataSource, count: length})'

Orphaned segments in deep storage accumulate when automatic kill is disabled (druid.coordinator.kill.on=false) or when the Coordinator/Overlord lacks IAM permissions for the target object store bucket. Cross-reference access logs against the cluster's Security Boundaries for Segment Access to verify that the Coordinator's service account holds s3:DeleteObject or equivalent GCS/Azure permissions.

Python Orchestration and API Automation

Manual rule updates introduce configuration drift and human error. Production pipelines should enforce retention policies via the Coordinator REST API, wrapped in idempotent Python automation. The following snippet demonstrates a production-ready implementation using requests with exponential backoff, payload validation, and strict error handling.

import time
from typing import Dict, List

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class DruidRetentionManager:
    """Applies retention rules for one datasource via the Coordinator API and verifies them."""

    def __init__(self, coordinator_url: str, datasource: str, timeout: int = 15):
        self.base_url = coordinator_url.rstrip("/")
        self.datasource = datasource
        self.timeout = timeout
        self.session = requests.Session()
        retry_strategy = Retry(
            total=3,
            backoff_factor=1.5,
            status_forcelist=[429, 500, 502, 503, 504],
            # urllib3 does not retry POST on status codes by default. Posting the
            # full rule list replaces the previous one, so a retry is safe here.
            allowed_methods=["GET", "POST"],  # requires urllib3 >= 1.26
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)

    def apply_rules(self, rules: List[Dict]) -> None:
        """Replace the datasource's rule chain, then confirm the Coordinator stored it."""
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        headers = {"Content-Type": "application/json"}

        # Reject empty payloads and rules without a 'type' before calling the API.
        if not rules or not all("type" in r for r in rules):
            raise ValueError("Invalid retention rule payload: missing 'type' field")

        response = self.session.post(
            endpoint, json=rules, headers=headers, timeout=self.timeout
        )
        response.raise_for_status()
        self._verify_sync(rules)

    def _verify_sync(self, expected_rules: List[Dict], retries: int = 5) -> None:
        """Poll the rules endpoint until it returns the expected chain or retries run out."""
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        for attempt in range(retries):
            resp = self.session.get(endpoint, timeout=self.timeout)
            resp.raise_for_status()
            current_rules = resp.json()
            # Strict equality: any field the server adds or rewrites counts as a mismatch.
            if current_rules == expected_rules:
                return
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # back off, but do not sleep after the last attempt
        raise RuntimeError("Retention rule sync verification timed out")

This orchestration pattern relies only on the Coordinator's rules endpoint: a POST replaces the datasource's rule list and a GET confirms what was stored. The retry logic mitigates transient Coordinator unavailability during rolling upgrades or heavy compaction workloads.
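Strict list equality during verification can report a mismatch if the Coordinator returns rules with fields the client never sent, such as server-side defaults. Whether that happens depends on the Druid version, so the comparison below is a hedged alternative, not a required change: `rules_match` is a hypothetical helper that checks order and the fields the caller actually specified, and the extra field in the example is illustrative.

```python
def rules_match(expected, current):
    """True if both chains have the same length and order, and every field set in an
    expected rule has the same value in the corresponding current rule."""
    if len(expected) != len(current):
        return False
    return all(
        all(key in cur and cur[key] == value for key, value in exp.items())
        for exp, cur in zip(expected, current)
    )


expected = [
    {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"},
]
# Illustrative response: same chain, plus a field added on the server side.
current = [
    {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"_default_tier": 2},
     "useDefaultTierForNull": True},
    {"type": "dropForever"},
]
print(rules_match(expected, current))
```

The trade-off is that fields removed from the expected spec are no longer detected as drift, so a periodic audit may still prefer strict equality against a spec exported from the cluster itself.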

Recovery Patterns and State Enforcement

When retention drift occurs, operators must execute a structured recovery sequence to restore cluster equilibrium without triggering cascading unloads.

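  1. Rule Rollback & Freeze: Immediately revert to a known-good retention spec using the API. To halt active eviction while diagnostics run, temporarily place a loadForever rule at the head of the chain; because the first match wins, no drop rule is reached. Note that includeFuture only controls whether period-based rules match future intervals and does not pause drops.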
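  2. Historical Segment Audit: Query the /druid/coordinator/v1/metadata/segments endpoint with includeUnused=true to identify segments marked unused but still present on disk. GET /druid/coordinator/v1/loadstatus only reports load progress; if Brokers report missing segments, mark the affected interval used again via POST /druid/coordinator/v1/datasources/<datasource>/markUsed so that the Coordinator reloads it.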
  3. Deep Storage Reconciliation: If automatic kill is disabled or skipped, manually reclaim deep storage by submitting a kill task to the Overlord for the affected interval (auto-kill requires druid.coordinator.kill.on=true):
  curl -X POST http://<overlord-host>:8090/druid/indexer/v1/task \
    -H "Content-Type: application/json" \
    -d '{"type": "kill", "dataSource": "<datasource>", "interval": "2020-01-01/2020-07-01"}'
  4. Metadata Compaction: Run OPTIMIZE TABLE (MySQL) or VACUUM (PostgreSQL) on the Druid metadata database to reclaim space from deleted segment records. This prevents query planner degradation during segment discovery.
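For the deep storage reconciliation step, the kill task payload can be generated instead of typed by hand, which avoids inverted or malformed intervals. This is a minimal sketch: `build_kill_task` is a hypothetical helper, "events" is a placeholder datasource name, and submitting the payload to the Overlord is left to the curl command shown in the list.

```python
import json
from datetime import date


def build_kill_task(datasource, start, end):
    """Build a kill task payload for the interval start/end (end is exclusive)."""
    if not datasource:
        raise ValueError("datasource must be non-empty")
    if start >= end:
        raise ValueError("interval start must precede interval end")
    return {
        "type": "kill",
        "dataSource": datasource,
        "interval": f"{start.isoformat()}/{end.isoformat()}",
    }


payload = build_kill_task("events", date(2020, 1, 1), date(2020, 7, 1))
print(json.dumps(payload))
```

A kill task permanently deletes unused segments in the interval from both metadata and deep storage, so the interval is worth validating before submission.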

Implementing automated retention audits via cron or Kubernetes CronJobs ensures that policy drift is detected before it impacts query routing or storage SLAs. By coupling deterministic rule evaluation with programmatic enforcement, OLAP platforms maintain predictable lifecycle management across petabyte-scale deployments.
