Configuring Segment Retention Policies
Segment retention in Apache Druid operates as a deterministic, rule-based lifecycle mechanism orchestrated exclusively by the Coordinator process. Retention policies govern the state transitions of segments from ACTIVE to DROPPED, trigger asynchronous deep storage cleanup, and directly influence cluster resource allocation. Misaligned retention configurations frequently cascade into query routing failures, uncontrolled storage expenditure, and metadata database bloat. This reference provides production-grade retention orchestration patterns, failure diagnostics, and automated recovery workflows tailored for OLAP data engineers and DevOps teams managing high-throughput ingestion pipelines.
How Rules Are Evaluated
Retention rules are evaluated per datasource with first-match-wins semantics, so ordering is decisive. The diagram traces a segment interval through a typical load → drop-before → drop-forever rule chain.
Rule Evaluation and JSON Specification
Druid evaluates retention rules sequentially per datasource, applying strict first-match-wins logic, so rule ordering determines the outcome. The Coordinator persists these rules in the relational metadata store and, on each periodic run, applies them to decide which segments Historical nodes should load or drop. A robust production specification typically layers rolling loading for recent partitions with strict time-bound expiration for historical data.
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropBeforeByPeriod",
    "period": "P6M"
  },
  {
    "type": "dropForever"
  }
]
The loadByPeriod directive keeps the most recent month available across the designated hardware tiers. dropBeforeByPeriod marks segments older than six months for unloading, and dropForever is the terminal catch-all that stops any remaining interval from falling through to the cluster default rules. Note that under first-match-wins, segments between one and six months old match neither of the first two rules and are therefore dropped by dropForever as well, so this chain keeps only the most recent month loaded; to retain six months, widen the loadByPeriod period to P6M. Rule evaluation semantics are deeply coupled with the underlying segment state machine, as documented in Apache Druid Segment Architecture & Lifecycle Fundamentals. All period strings must conform to ISO 8601 duration syntax (for example P1M or P6M) so that they parse identically on every node.
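The first-match-wins behaviour of the chain above can be traced with a small simulation. This is a minimal sketch, not Druid's implementation: `first_matching_rule` and `period_to_timedelta` are hypothetical helpers, months are approximated as 30 days (Druid applies calendar-aware periods), and only the rule types shown in this section are modelled.

```python
import re
from datetime import datetime, timedelta

# Approximate unit lengths in days. Druid applies calendar-aware ISO 8601
# periods; this sketch trades that accuracy for brevity.
_UNIT_DAYS = {"D": 1, "W": 7, "M": 30, "Y": 365}


def period_to_timedelta(period: str) -> timedelta:
    """Convert a simple period such as 'P1M' or 'P7D' to a timedelta."""
    match = re.fullmatch(r"P(\d+)([DWMY])", period)
    if match is None:
        raise ValueError(f"period not supported by this sketch: {period!r}")
    return timedelta(days=int(match.group(1)) * _UNIT_DAYS[match.group(2)])


def first_matching_rule(rules, seg_start, seg_end, now):
    """Return the type of the first rule that matches the segment interval."""
    for rule in rules:
        kind = rule["type"]
        if kind in ("loadForever", "dropForever"):
            return kind
        cutoff = now - period_to_timedelta(rule["period"])
        if kind == "loadByPeriod":
            # Matches intervals overlapping [cutoff, now), or [cutoff, +inf)
            # when includeFuture is set (assumed to default to true).
            overlaps_window = seg_end > cutoff
            within_upper = rule.get("includeFuture", True) or seg_start < now
            if overlaps_window and within_upper:
                return kind
        elif kind == "dropBeforeByPeriod":
            # Matches intervals that end at or before the cutoff.
            if seg_end <= cutoff:
                return kind
    return None  # no match; the cluster default rules would apply next


RULES = [
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True},
    {"type": "dropBeforeByPeriod", "period": "P6M"},
    {"type": "dropForever"},
]

now = datetime(2024, 7, 1)
for age_days in (10, 90, 300):
    start = now - timedelta(days=age_days)
    end = start + timedelta(days=1)
    print(age_days, first_matching_rule(RULES, start, end, now))
```

Running the loop shows which rule claims a one-day segment at each age; the 90-day-old segment is the interesting case, because it skips both period rules and lands on dropForever.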
Failure Modes and Diagnostic Patterns
Retention misconfigurations manifest through predictable, high-impact failure modes. The most prevalent is premature segment eviction, typically triggered by inverted rule ordering in which a drop rule such as dropBeforeByPeriod precedes a loadByPeriod rule whose window it overlaps. Because the first match wins, Historicals unload segments that Brokers still route queries against, resulting in SegmentMissingException errors and degraded SLAs.
Coordinator Sync Lag emerges when state propagation fails due to ZooKeeper session timeouts or network partitions. Operators will observe divergence between segment/used counts in the metadata database and actual disk utilization on Historical nodes. Validate synchronization health using the following diagnostic pipeline:
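# Segment records carry dataSource rather than a state field, so group by datasource.
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/metadata/segments?used=true" \
  | jq 'group_by(.dataSource) | map({dataSource: .[0].dataSource, count: length})'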
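The same per-datasource tally can be computed in Python once the response has been fetched, which makes it easier to compare against disk usage reported by Historicals. The sketch below is illustrative only: `summarize_segments` is a hypothetical helper, the sample records are invented, and it assumes each segment record carries `dataSource` and `size` fields.

```python
from collections import defaultdict


def summarize_segments(segments):
    """Group segment records by datasource, counting segments and bytes."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for seg in segments:
        entry = summary[seg["dataSource"]]
        entry["count"] += 1
        entry["bytes"] += seg.get("size", 0)  # a missing size counts as 0
    return dict(summary)


# Hypothetical sample shaped like a segment metadata response.
sample = [
    {"dataSource": "events", "interval": "2024-06-01/2024-06-02", "size": 1200},
    {"dataSource": "events", "interval": "2024-06-02/2024-06-03", "size": 800},
    {"dataSource": "clicks", "interval": "2024-06-01/2024-06-02", "size": 500},
]

for name, stats in summarize_segments(sample).items():
    print(name, stats["count"], stats["bytes"])
```

In practice the `sample` list would be replaced by the parsed JSON body of the Coordinator response.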
Orphaned segments in deep storage accumulate when automatic kill is disabled (druid.coordinator.kill.on=false) or when the Coordinator/Overlord lacks IAM permissions for the target object store bucket. Cross-reference access logs against the cluster's Security Boundaries for Segment Access to verify that the Coordinator's service account holds s3:DeleteObject or equivalent GCS/Azure permissions.
Python Orchestration and API Automation
Manual rule updates introduce configuration drift and human error. Production pipelines should enforce retention policies via the Coordinator REST API, wrapped in idempotent Python automation. The following snippet demonstrates a production-ready implementation using requests with exponential backoff, payload validation, and strict error handling.
import time
from typing import Dict, List

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class DruidRetentionManager:
    def __init__(self, coordinator_url: str, datasource: str, timeout: int = 15):
        self.base_url = coordinator_url.rstrip("/")
        self.datasource = datasource
        self.timeout = timeout
        self.session = requests.Session()
        retry_strategy = Retry(
            total=3,
            backoff_factor=1.5,
            status_forcelist=[429, 500, 502, 503, 504],
            # urllib3 does not retry POST by default. Posting a full rule set
            # replaces the previous one, so retrying it is safe here.
            allowed_methods=["GET", "POST"],  # requires urllib3 >= 1.26
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)

    def apply_rules(self, rules: List[Dict]) -> None:
        """POST the full rule set for the datasource, then confirm it was stored."""
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        headers = {"Content-Type": "application/json"}
        if not rules or not all("type" in r for r in rules):
            raise ValueError("Invalid retention rule payload: missing 'type' field")
        response = self.session.post(
            endpoint, json=rules, headers=headers, timeout=self.timeout
        )
        response.raise_for_status()
        self._verify_sync(rules)

    def _verify_sync(self, expected_rules: List[Dict], retries: int = 5) -> None:
        """Poll the rules endpoint until it returns exactly the posted rules."""
        endpoint = f"{self.base_url}/druid/coordinator/v1/rules/{self.datasource}"
        for attempt in range(retries):
            resp = self.session.get(endpoint, timeout=self.timeout)
            resp.raise_for_status()
            # Strict equality: this assumes the Coordinator echoes the rules
            # back without adding default fields.
            if resp.json() == expected_rules:
                return
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s between polls
        raise RuntimeError("Retention rule sync verification timed out")
This orchestration pattern relies only on the Coordinator's rules endpoint (/druid/coordinator/v1/rules/{dataSource}), which replaces the datasource's full rule set on each POST. The retry logic mitigates transient Coordinator unavailability during rolling upgrades or heavy compaction workloads.
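The payload check in the class above only confirms that each rule has a type. A slightly stricter pre-flight check might look like the following sketch, which is an assumption-laden illustration rather than part of any Druid client: `validate_rules`, `PERIOD_RULES`, `TERMINAL_RULES`, and `ISO_PERIOD` are hypothetical names, and rule types outside the two sets are passed through unchecked.

```python
import re

# Rule types covered by this sketch; other types are not inspected.
PERIOD_RULES = {"loadByPeriod", "dropByPeriod", "dropBeforeByPeriod"}
TERMINAL_RULES = {"loadForever", "dropForever"}

# ISO 8601 duration such as P1M, P6M, P7D or PT12H (no fractional values).
ISO_PERIOD = re.compile(
    r"P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?"
)


def validate_rules(rules):
    """Return a list of human-readable problems; an empty list means OK."""
    if not rules:
        return ["rule list is empty"]
    problems = []
    for index, rule in enumerate(rules):
        kind = rule.get("type")
        if kind is None:
            problems.append(f"rule {index}: missing 'type'")
        elif kind in PERIOD_RULES:
            period = str(rule.get("period", ""))
            if not ISO_PERIOD.fullmatch(period):
                problems.append(f"rule {index}: invalid period {period!r}")
        elif kind in TERMINAL_RULES and index != len(rules) - 1:
            # A forever rule matches everything, so later rules never run.
            problems.append(f"rule {index}: {kind} makes later rules unreachable")
    return problems


good = [
    {"type": "loadByPeriod", "period": "P1M"},
    {"type": "dropBeforeByPeriod", "period": "P6M"},
    {"type": "dropForever"},
]
bad = [{"type": "dropForever"}, {"type": "loadByPeriod", "period": "1M"}]

print(validate_rules(good))
print(validate_rules(bad))
```

Calling such a check before `apply_rules` would catch inverted ordering and malformed periods locally, before anything reaches the Coordinator.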
Recovery Patterns and State Enforcement
When retention drift occurs, operators must execute a structured recovery sequence to restore cluster equilibrium without triggering cascading unloads.
- Rule Rollback & Freeze: Immediately revert to a known-good retention spec using the API. To halt active eviction while diagnostics run, temporarily place a loadForever rule at the top of the chain; includeFuture only narrows period-based rules and does not pause drops.
- Historical Segment Audit: Query the /druid/coordinator/v1/metadata/segments endpoint with includeUnused=true to identify segments marked DROPPED but still present on disk. If Brokers report missing segments, mark the affected interval used again via POST /druid/coordinator/v1/datasources/<datasource>/markUsed, then watch reload progress with GET /druid/coordinator/v1/loadstatus, which reports status but does not itself trigger a reload.
- Deep Storage Reconciliation: If automatic kill is disabled or skipped, manually reclaim deep storage by submitting a kill task to the Overlord for the affected interval (auto-kill requires druid.coordinator.kill.on=true):
curl -X POST http://<overlord-host>:8090/druid/indexer/v1/task \
-H "Content-Type: application/json" \
-d '{"type": "kill", "dataSource": "<datasource>", "interval": "2020-01-01/2020-07-01"}'
- Metadata Compaction: Run OPTIMIZE TABLE (MySQL) or VACUUM (PostgreSQL) on the Druid metadata database to reclaim space from deleted segment records. This prevents query planner degradation during segment discovery.
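The request bodies used in the recovery steps above can be built and checked in Python before anything is sent. This is a sketch under stated assumptions: `kill_task_payload` and `mark_used_request` are hypothetical helpers, the host and datasource names are placeholders, and the markUsed path should be confirmed against the API reference for your Druid version.

```python
def kill_task_payload(datasource: str, interval: str) -> dict:
    """Build the kill task body submitted to the Overlord task endpoint."""
    start, sep, end = interval.partition("/")
    if not (start and sep and end):
        raise ValueError(f"interval must look like 'start/end': {interval!r}")
    return {"type": "kill", "dataSource": datasource, "interval": interval}


def mark_used_request(coordinator_url: str, datasource: str, interval: str):
    """Return (url, body) for marking an interval's segments used again."""
    base = coordinator_url.rstrip("/")
    url = f"{base}/druid/coordinator/v1/datasources/{datasource}/markUsed"
    return url, {"interval": interval}


payload = kill_task_payload("events", "2020-01-01/2020-07-01")
url, body = mark_used_request(
    "http://coordinator:8081/", "events", "2024-06-01/2024-07-01"
)
print(payload)
print(url, body)

# Sending is left to the caller, for example:
#   requests.post(f"{overlord_url}/druid/indexer/v1/task", json=payload, timeout=15)
#   requests.post(url, json=body, timeout=15)
```

Keeping payload construction separate from the HTTP call lets the interval be reviewed, or logged for audit, before a destructive kill task is submitted.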
Implementing automated retention audits via cron or Kubernetes CronJobs ensures that policy drift is detected before it impacts query routing or storage SLAs. By coupling deterministic rule evaluation with programmatic enforcement, OLAP platforms maintain predictable lifecycle management across petabyte-scale deployments.
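A scheduled audit of this kind reduces to comparing the rule list held in version control with the one the Coordinator currently returns. The following is a minimal sketch of that comparison: `detect_drift` is a hypothetical helper and the rule lists are illustrative. Comparison is positional because rule order changes behaviour under first-match-wins.

```python
def detect_drift(expected, current):
    """Compare two rule lists position by position and report differences."""
    drift = []
    for index in range(max(len(expected), len(current))):
        want = expected[index] if index < len(expected) else None
        have = current[index] if index < len(current) else None
        if want != have:  # dict comparison ignores key order
            drift.append({"position": index, "expected": want, "actual": have})
    return drift


expected = [
    {"type": "loadByPeriod", "period": "P1M", "includeFuture": True},
    {"type": "dropForever"},
]
# Hypothetical live state: someone widened the load window by hand.
current = [
    {"type": "loadByPeriod", "period": "P3M", "includeFuture": True},
    {"type": "dropForever"},
]

for item in detect_drift(expected, current):
    print(item["position"], item["expected"], item["actual"])
```

A cron job or Kubernetes CronJob could exit non-zero whenever the returned list is non-empty, so that the scheduler's own alerting surfaces the drift.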