Understanding Druid Segment Granularity: Temporal Partitioning, Sizing & Orchestration

Apache Druid's query performance, storage efficiency, and compaction behaviour are all governed by how it partitions time-series data into immutable, columnar time-chunk units called segments, and segmentGranularity is the primary lever that sets those temporal boundaries. For OLAP data engineers, analytics platform developers, and Python pipeline builders, this single field decides the physical size of segment files, the number of segments the Coordinator must track, indexing-task heap pressure, and deep-storage object counts. This page sits under Apache Druid Segment Architecture & Lifecycle Fundamentals, which frames how segments are built, distributed, and retired; here we focus specifically on the temporal partitioning decision and how to compute, validate, and automate it rather than hardcode it once and forget.

Mechanics & Internals

segmentGranularity lives inside the granularitySpec block of dataSchema and deterministically slices incoming rows by their __time value into discrete, immutable time-chunk segments that are persisted to deep storage. It accepts the standard period buckets — HOUR, DAY, WEEK, MONTH, QUARTER, YEAR — as well as arbitrary ISO-8601 periods (for example PT6H or P1D) when the named constants are too coarse or too fine.

segmentGranularity = DAY slices events into one immutable segment family per UTC day, each persisted to deep storage. Inside a single file, queryGranularity = HOUR sets the finest retained timestamp — the two knobs are independent.

Druid enforces partitioning strictly along the __time dimension, and the mapping is UTC-aligned and boundary-locked. A DAY setting anchors every segment at midnight UTC and covers a half-open interval that ends at the next midnight, for example [2026-07-01T00:00:00.000Z, 2026-07-02T00:00:00.000Z) with an exclusive end, while HOUR produces 24 discrete time chunks per calendar day. Each bucket materialises as a self-contained segment carrying its own metadata descriptor, columnar value stores and Roaring bitmap indexes, so the granularity you pick multiplies directly through every downstream encoding cost. The precise timestamp-to-interval arithmetic, including how late-arriving events and timezone-shifted logs are bucketed, is worked through in how Druid segments map to time intervals.
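The UTC flooring described above can be sketched in a few lines of Python. This is a minimal illustration rather than Druid's own implementation: `segment_interval` is a hypothetical helper, it handles only HOUR and DAY, and it assumes timezone-aware input.

```python
from datetime import datetime, timedelta, timezone


def segment_interval(ts, granularity):
    """Return the half-open [start, end) UTC bucket a timestamp falls into.

    Hypothetical helper for illustration; only HOUR and DAY are handled.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc_ts = ts.astimezone(timezone.utc)
    if granularity == "HOUR":
        start = utc_ts.replace(minute=0, second=0, microsecond=0)
        return start, start + timedelta(hours=1)
    if granularity == "DAY":
        start = utc_ts.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)
    raise ValueError(f"unsupported granularity: {granularity}")


# An event logged at 23:30 local time in UTC-5 is 04:30 UTC on the next day,
# so it lands in the following day's DAY bucket.
event = datetime(2026, 7, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
day_start, day_end = segment_interval(event, "DAY")
print(day_start.isoformat(), day_end.isoformat())
```

The example event is the kind of timezone-shifted log line that tends to surprise people: its local calendar date and its segment date differ by one day.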

segmentGranularity must not be confused with the sibling queryGranularity field. segmentGranularity sets the physical file boundary; queryGranularity sets the finest timestamp resolution retained inside a segment (rows are truncated to it, enabling rollup). A common production pattern is segmentGranularity: DAY with queryGranularity: HOUR or MINUTE: one file per day, but per-hour aggregation buckets within it. The two are independent knobs, and conflating them is a frequent source of either oversized files or lost temporal precision.
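To make the distinction concrete, the sketch below groups a handful of made-up events the way segmentGranularity: DAY with queryGranularity: HOUR would: the day decides which time chunk a row belongs to, and the truncated hour decides which rows collapse together. `rollup_by_hour` and the sample events are hypothetical, and the count stands in for whatever aggregators metricsSpec defines.

```python
from collections import Counter
from datetime import datetime, timezone


def rollup_by_hour(events):
    """Group (timestamp, country) events into per-day chunks of hourly rows.

    Returns {day_start: Counter({(hour_start, country): event_count})}.
    Illustrative only: real rollup also applies the metricsSpec aggregators.
    """
    chunks = {}
    for ts, country in events:
        utc_ts = ts.astimezone(timezone.utc)
        # queryGranularity: HOUR truncates the stored timestamp to the hour.
        hour_start = utc_ts.replace(minute=0, second=0, microsecond=0)
        # segmentGranularity: DAY decides which time chunk holds the row.
        day_start = hour_start.replace(hour=0)
        chunks.setdefault(day_start, Counter())[(hour_start, country)] += 1
    return chunks


def at(*args):
    """Shorthand for a UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


events = [
    (at(2026, 7, 1, 10, 5), "US"),
    (at(2026, 7, 1, 10, 40), "US"),
    (at(2026, 7, 1, 11, 2), "US"),
    (at(2026, 7, 2, 0, 10), "DE"),
]
chunks = rollup_by_hour(events)
for day, rows in sorted(chunks.items()):
    print(day.date(), "stored rows:", len(rows))
```

Four input events become three stored rows across two day chunks; coarsening queryGranularity would shrink the row count further without moving any chunk boundary.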

The consequence chain is what makes this a high-leverage decision. Finer granularity means more segments per interval, which inflates the segment count the Coordinator holds in its metadata maps and the segment set the Broker must resolve, prune, and scatter/gather across at query time. Because segments are immutable, the granularity chosen at ingestion is effectively frozen until a compaction pass re-partitions the interval — you cannot mutate a segment's boundary in place. That immutability is what lets Druid atomically swap higher-version segments into an interval for corrections, but it also means a granularity mistake compounds silently until it is compacted away. How the resulting segment set is discovered and dispatched is detailed in query routing and segment discovery.

Validated Configuration Spec

segmentGranularity is declared in granularitySpec and works in concert with partitionsSpec in tuningConfig — granularity sets the time boundary, targetRowsPerSegment (or maxRowsPerSegment) controls the secondary split within each time chunk. The batch (index_parallel) spec below is copy-ready against a recent stable Druid release; every field that touches temporal partitioning is annotated.

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "events_granularity",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": {
        "dimensions": [
          { "type": "string", "name": "country", "createBitmapIndex": true },
          { "type": "string", "name": "device", "createBitmapIndex": true },
          { "type": "long", "name": "status_code" }
        ]
      },
      "metricsSpec": [
        { "type": "count", "name": "events" },
        { "type": "longSum", "name": "bytes_sent", "fieldName": "bytes" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR",
        "rollup": true,
        "intervals": ["2026-07-01/2026-07-08"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "s3", "prefixes": ["s3://analytics-bucket/raw/"] },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsInMemory": 1000000,
      "forceGuaranteedRollup": true,
      "partitionsSpec": {
        "type": "hashed",
        "targetRowsPerSegment": 5000000
      }
    }
  }
}

Field-by-field, the temporal-partitioning keys are:

granularitySpec.type — uniform applies one segmentGranularity across the whole ingestion. Use arbitrary only when you must hand-pin non-uniform intervals; almost all pipelines want uniform.
granularitySpec.segmentGranularity — the physical time-chunk boundary. DAY here yields one segment family per UTC day. Accepts named periods (HOUR…YEAR) or ISO periods (PT6H, P1D, P1M).
granularitySpec.queryGranularity — the finest retained timestamp inside each segment. HOUR truncates __time to the hour, which (with rollup: true) collapses rows that share the same hour and dimension tuple.
granularitySpec.rollup — with true, pre-aggregation runs before encoding, shrinking segments; its effectiveness depends directly on how coarse queryGranularity is.
granularitySpec.intervals — for batch, the explicit set of intervals to build. Rows whose __time falls outside these intervals are dropped, not routed to a neighbour — the single most common cause of silent data loss on backfills.
tuningConfig.partitionsSpec.targetRowsPerSegment — the secondary (within-time-chunk) split. If one day's rows exceed this, Druid produces multiple numbered shards for that day; this is how you keep each physical file inside the target size band regardless of granularity.
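Because rows outside granularitySpec.intervals are dropped rather than rerouted, a pre-flight check on a sample of timestamps is cheap insurance. The sketch below is one possible shape for it, under stated assumptions: `parse_interval` and `rows_outside_intervals` are hypothetical helpers, interval strings are date-only or offset-free ISO strings interpreted as UTC, and the sample timestamps are invented.

```python
from datetime import datetime, timezone


def parse_interval(interval):
    """Parse 'start/end' where both sides are offset-free ISO strings (assumed UTC)."""
    start_s, end_s = interval.split("/")
    start = datetime.fromisoformat(start_s).replace(tzinfo=timezone.utc)
    end = datetime.fromisoformat(end_s).replace(tzinfo=timezone.utc)
    return start, end


def rows_outside_intervals(timestamps, intervals):
    """Return the timestamps not covered by any half-open [start, end) interval."""
    parsed = [parse_interval(i) for i in intervals]
    return [ts for ts in timestamps if not any(s <= ts < e for s, e in parsed)]


sample = [
    datetime(2026, 6, 30, 23, 0, tzinfo=timezone.utc),   # before the window
    datetime(2026, 7, 7, 23, 59, tzinfo=timezone.utc),   # last minute inside
    datetime(2026, 7, 8, 0, 0, tzinfo=timezone.utc),     # exclusive end: outside
]
dropped = rows_outside_intervals(sample, ["2026-07-01/2026-07-08"])
print(f"{len(dropped)} of {len(sample)} sampled rows would be dropped")
```

Note the third sample: a timestamp exactly on the interval end is outside the window, which is easy to miss when a backfill range is written as inclusive dates.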

For streaming ingestion the same field appears in the supervisor spec's granularitySpec. Pair it with lateMessageRejectionPeriod / earlyMessageRejectionPeriod, which are set in the supervisor's ioConfig, to bound how far outside the current time window events may still land in a segment.

Sizing Heuristics & Formulas

Granularity is really a segment-count and segment-size problem in disguise. Start from the target on-disk size and the compressed bytes per row to get the rows a single physical segment should hold:

$$ \text{targetRowsPerSegment} \approx \frac{\text{targetBytes}}{\text{avgCompressedBytesPerRow}} $$

For a 600 MB target and a measured 200 compressed bytes per row:

$$ \text{targetRowsPerSegment} \approx \frac{600 \times 1{,}048{,}576}{200} \approx 3{,}145{,}000 $$

The granularity you should choose then falls out of your event velocity. Given an average ingest rate of $R$ rows per hour, the number of rows landing in one time chunk of duration $G$ hours is $R \times G$, so the number of shards that time chunk splits into is:

$$ \text{shardsPerChunk} \approx \frac{R \times G}{\text{targetRowsPerSegment}} $$

The operational goal is shardsPerChunk at or slightly above 1: a granularity coarse enough that each chunk fills roughly one well-sized segment, but not so coarse that a single chunk explodes into dozens of shards or so fine that most chunks are near-empty. Rearranging gives the ideal chunk width directly:

$$ G_{\text{ideal}} \approx \frac{\text{targetRowsPerSegment}}{R} $$

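A datasource ingesting $R \approx 400{,}000$ rows/hour against a 3.1M-row target yields $G_{\text{ideal}} \approx 7.9$ hours. Round down to PT6H, where each chunk holds ~2.4M rows (about three quarters of a target segment), rather than DAY, which would pack ~9.6M rows into each chunk and force a split into three or four shards. If only the named granularities are available, HOUR is the coarsest that does not overshoot, at the cost of under-filled chunks of ~400,000 rows. Conversely, a low-velocity datasource at $R \approx 5{,}000$ rows/hour wants $G_{\text{ideal}} \approx 629$ hours ≈ 26 days, so MONTH is appropriate; DAY would leave each segment with only ~120,000 rows, roughly 4% of the target.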
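The arithmetic above is easy to script, which also makes it easy to re-run when the measured row rate changes. The helper names below are hypothetical and simply transcribe the three formulas; the inputs are the figures already used in this section.

```python
MIB = 1_048_576


def target_rows_per_segment(target_bytes, avg_compressed_bytes_per_row):
    """Rows one physical segment should hold for a given on-disk size."""
    return target_bytes / avg_compressed_bytes_per_row


def ideal_chunk_hours(target_rows, rows_per_hour):
    """Chunk width G (hours) at which one chunk fills one segment."""
    return target_rows / rows_per_hour


def shards_per_chunk(rows_per_hour, chunk_hours, target_rows):
    """Expected number of shards a chunk of the given width splits into."""
    return rows_per_hour * chunk_hours / target_rows


target_rows = target_rows_per_segment(600 * MIB, 200)  # 3,145,728 rows
for label, rate in (("high-velocity", 400_000), ("low-velocity", 5_000)):
    g_ideal = ideal_chunk_hours(target_rows, rate)
    day_shards = shards_per_chunk(rate, 24, target_rows)
    print(f"{label}: G_ideal = {g_ideal:.1f} h, shards per DAY chunk = {day_shards:.2f}")
```

A shards-per-chunk value well above 1 argues for a finer granularity, and a value far below 1 argues for a coarser one.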

Finally, keep an eye on the total segment count the Druid cluster must track, since it drives Coordinator and Broker overhead:

$$ \text{segmentCount} \approx \frac{\text{retentionHours}}{G} \times \text{shardsPerChunk} \times \text{replicas} $$

Aim to keep published segments in the 300 MB–700 MB range: larger segments take longer to load and put more memory pressure on Historicals, while smaller ones multiply metadata and Broker coordination cost. These relationships feed straight into the applied tuning covered in segment size optimization strategies.
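The segment-count estimate can be compared across candidate granularities before anything is ingested. The sketch below is an approximation under stated assumptions: `estimate_segment_count` is a hypothetical helper, it rounds shards up to whole segments (the formula above leaves them fractional), and the 90-day retention and replication factor of 2 are invented inputs.

```python
import math


def estimate_segment_count(retention_hours, chunk_hours, rows_per_hour,
                           target_rows, replicas):
    """Approximate loaded segment copies for one datasource."""
    chunks = math.ceil(retention_hours / chunk_hours)
    # A chunk always produces at least one segment, and shards are whole files.
    shards = max(1, math.ceil(rows_per_hour * chunk_hours / target_rows))
    return chunks * shards * replicas


retention_hours = 90 * 24   # hypothetical 90-day retention
target_rows = 3_145_728     # 600 MiB at 200 compressed bytes per row
for name, hours in (("HOUR", 1), ("PT6H", 6), ("DAY", 24)):
    count = estimate_segment_count(retention_hours, hours, 400_000, target_rows, 2)
    print(f"{name}: ~{count} segment copies")
```

For this row rate, HOUR multiplies the count several times over relative to PT6H, while DAY lands at a similar count to PT6H but with each chunk split into several shards.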

Python Orchestration Snippet

Granularity only stays correct if a pipeline derives it from live telemetry and audits the result. The orchestrator below computes an ideal granularity from a measured row rate, submits the ingestion task, polls to a terminal state with exponential backoff, then audits the produced segment sizes so a mis-sized granularity is caught at ingestion time rather than in production. It uses only the standard library plus requests.

import time
import logging
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("granularity_pipeline")

# Named granularities mapped to their width in hours, ordered coarse-to-fine.
_GRANULARITIES = [
    ("YEAR", 8760), ("QUARTER", 2190), ("MONTH", 730),
    ("WEEK", 168), ("DAY", 24), ("HOUR", 1),
]


def choose_segment_granularity(rows_per_hour, target_rows_per_segment):
    """Pick the coarsest named granularity whose chunk stays near one segment."""
    if rows_per_hour <= 0:
        return "DAY"
    ideal_hours = target_rows_per_segment / rows_per_hour
    # Coarsest granularity that does not overshoot the ideal chunk width.
    for name, hours in _GRANULARITIES:
        if hours <= ideal_hours:
            return name
    return "HOUR"


class DruidGranularityPipeline:
    def __init__(self, overlord_url, coordinator_url, auth=None):
        self.overlord = overlord_url.rstrip("/")
        self.coordinator = coordinator_url.rstrip("/")
        self.session = requests.Session()
        self.session.auth = auth

    def submit(self, ingestion_spec):
        resp = self.session.post(
            f"{self.overlord}/druid/indexer/v1/task",
            json=ingestion_spec,
            timeout=30,
        )
        resp.raise_for_status()
        task_id = resp.json()["task"]
        logger.info("Submitted ingestion task %s", task_id)
        return task_id

    def poll_until_terminal(self, task_id, base_delay=5, max_delay=60, max_wait=3600):
        start = time.time()
        delay = base_delay
        while time.time() - start < max_wait:
            resp = self.session.get(
                f"{self.overlord}/druid/indexer/v1/task/{task_id}/status",
                timeout=10,
            )
            resp.raise_for_status()
            status = resp.json().get("status", {}).get("status")
            if status in ("SUCCESS", "FAILED", "INTERRUPTED"):
                logger.info("Task %s terminal state: %s", task_id, status)
                return status
            logger.debug("Task %s pending (%s); sleeping %ss", task_id, status, delay)
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
        raise TimeoutError(f"Task {task_id} exceeded {max_wait}s")

    def audit_segment_sizes(self, datasource, low_mb=300, high_mb=700):
        resp = self.session.get(
            f"{self.coordinator}/druid/coordinator/v1/datasources/"
            f"{datasource}/segments?full=true",
            timeout=30,
        )
        resp.raise_for_status()
        sizes_mb = [s["size"] / 1048576 for s in resp.json()]
        if not sizes_mb:
            raise RuntimeError(f"No segments found for {datasource}")
        out_of_band = [round(m, 1) for m in sizes_mb if m < low_mb or m > high_mb]
        logger.info(
            "%s: %d segments, avg %.1f MB, %d outside %d-%d MB band",
            datasource, len(sizes_mb), sum(sizes_mb) / len(sizes_mb),
            len(out_of_band), low_mb, high_mb,
        )
        return {"count": len(sizes_mb), "out_of_band": out_of_band}


# Usage
# pipe = DruidGranularityPipeline("http://overlord:8090", "http://coordinator:8081")
# gran = choose_segment_granularity(rows_per_hour=400_000,
#                                   target_rows_per_segment=3_145_000)  # -> "HOUR"
# spec["spec"]["dataSchema"]["granularitySpec"]["segmentGranularity"] = gran
# task_id = pipe.submit(spec)
# if pipe.poll_until_terminal(task_id) == "SUCCESS":
#     report = pipe.audit_segment_sizes("events_granularity")
#     if report["out_of_band"]:
#         raise SystemExit(f"Granularity drift detected: {report['out_of_band']}")

Because the audit fails loudly when segments fall outside the target band, this pattern slots directly into a CI/CD gate. For the broader templating of these specs across environments, see dynamic ingestion spec generation, and for structural validation before submission see schema validation for Druid specs.

Failure Modes & Diagnostics

Granularity problems rarely announce themselves as "granularity" — they show up as segment-count explosions, oversized files, or dropped rows. The shell workflows below isolate the actual cause against the Coordinator and Overlord REST APIs.

Too-fine granularity: segment-count explosion. Symptom: rising Coordinator CPU, slow segment discovery, many under-sized files. Count segments and inspect their size distribution:

# Total segment count and size distribution for a datasource
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/datasources/events_granularity/segments?full=true" \
  | jq '[.[] | .size / 1048576] | {segments: length, avg_mb: (add / length | floor), min_mb: (min | floor), max_mb: (max | floor)}'

Root cause is a segmentGranularity finer than the event velocity warrants, producing many tiny segments. Remediation: recompute G_ideal from the measured row rate and run a compaction pass that re-partitions the interval to a coarser granularity.

Too-coarse granularity: oversized segments / OOM. Symptom: indexing-task OOM, slow Historical load, single time chunks splitting into many numbered shards. Check the JVM heap of the running peon and the shard count per interval:

# Shard count per interval (partitionNum spread) for the datasource
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/datasources/events_granularity/segments" \
  | jq -r '.[]' | sed -E 's/_[0-9]+$//' | sort | uniq -c | sort -rn | head

# Indexing peon heap / GC pressure while a chunk is built
jstat -gcutil <peon_pid> 1000 5

Root cause is a chunk holding far more rows than one target segment. Remediation: choose a finer segmentGranularity or lower targetRowsPerSegment so each chunk fills roughly one well-sized segment.
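The same shard count can be computed inside a Python pipeline from the list of segment IDs the Coordinator returns. This is a sketch under an assumption about ID layout: it expects IDs of the form dataSource_start_end_version with an optional trailing _partitionNum, and the IDs in the example are invented. `shards_per_time_chunk` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches a trailing "_<partitionNum>"; the first shard may carry no suffix.
_PARTITION_SUFFIX = re.compile(r"_\d+$")


def shards_per_time_chunk(segment_ids):
    """Count segment IDs per chunk after stripping the partition suffix."""
    return Counter(_PARTITION_SUFFIX.sub("", sid) for sid in segment_ids)


# Invented IDs: one day split into three shards, the next day unsplit.
day1 = ("events_granularity_2026-07-01T00:00:00.000Z_"
        "2026-07-02T00:00:00.000Z_2026-07-09T08:00:00.000Z")
day2 = ("events_granularity_2026-07-02T00:00:00.000Z_"
        "2026-07-03T00:00:00.000Z_2026-07-09T08:00:00.000Z")
counts = shards_per_time_chunk([day1, day1 + "_1", day1 + "_2", day2])
for chunk, shards in counts.most_common():
    print(shards, chunk)
```

Chunks that consistently show many shards are the candidates for a finer segmentGranularity.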

Dropped rows from interval misalignment. Symptom: a SUCCESS task but fewer rows than expected on backfills. Confirm the declared intervals and compare row counts:

# What intervals actually carry segments?
curl -s "http://<coordinator-host>:8081/druid/coordinator/v1/datasources/events_granularity/intervals" | jq

Root cause is __time values outside the granularitySpec.intervals window being silently dropped, or a timezone assumption that shifts events across a UTC boundary. Remediation: widen intervals to cover the full backfill range; the boundary arithmetic is detailed in how Druid segments map to time intervals.
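A backfill job can automate the comparison between the range it meant to load and the intervals that ended up with segments. The sketch below assumes DAY granularity and assumes the intervals endpoint returns strings shaped like 2026-07-01T00:00:00.000Z/2026-07-02T00:00:00.000Z; `missing_day_chunks` is a hypothetical helper and the served list is invented.

```python
from datetime import date, datetime, timedelta

# Assumed timestamp layout of each side of a served interval string.
_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def missing_day_chunks(first_day, last_day_exclusive, served_intervals):
    """Return the days in [first_day, last_day_exclusive) with no served interval."""
    served_days = {
        datetime.strptime(interval.split("/")[0], _FMT).date()
        for interval in served_intervals
    }
    missing = []
    day = first_day
    while day < last_day_exclusive:
        if day not in served_days:
            missing.append(day)
        day += timedelta(days=1)
    return missing


served = [
    "2026-07-03T00:00:00.000Z/2026-07-04T00:00:00.000Z",
    "2026-07-01T00:00:00.000Z/2026-07-02T00:00:00.000Z",
]
gaps = missing_day_chunks(date(2026, 7, 1), date(2026, 7, 4), served)
print("days without segments:", [d.isoformat() for d in gaps])
```

A gap is a prompt to investigate rather than proof of data loss, since a day with no source rows also produces no segment.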

Compaction fighting the ingestion granularity. Symptom: repeated full re-index cycles, CPU and I/O spikes. Inspect the running/pending compaction tasks and their target granularity:

curl -s "http://<overlord-host>:8090/druid/indexer/v1/tasks?type=compact" \
  | jq '.[] | {id: .id, status: .status, created: .createdTime}'

Root cause is a compaction segmentGranularity mismatched with the ingestion granularity, forcing every interval to be rebuilt. Remediation: align the compaction config's granularity with the ingestion boundary — the tuning surface is covered in compaction threshold tuning.
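The alignment check can be run as a plain dictionary comparison before either config is deployed. This is a sketch under assumptions: it treats the compaction config as a dict with an optional granularitySpec.segmentGranularity key, handles only string-form granularities, and treats an unset compaction granularity as "no conflict". `compaction_granularity_mismatch` is a hypothetical helper.

```python
def compaction_granularity_mismatch(ingestion_spec, compaction_config):
    """Return (ingestion, compaction) granularities if they differ, else None."""
    ingest = ingestion_spec["spec"]["dataSchema"]["granularitySpec"]["segmentGranularity"]
    compact = (compaction_config.get("granularitySpec") or {}).get("segmentGranularity")
    if compact is None:
        # Assumed to mean compaction does not re-partition by time.
        return None
    if str(ingest).upper() == str(compact).upper():
        return None
    return ingest, compact


ingestion_spec = {
    "spec": {"dataSchema": {"granularitySpec": {"segmentGranularity": "DAY"}}}
}
aligned = {"dataSource": "events_granularity",
           "granularitySpec": {"segmentGranularity": "day"}}
conflicting = {"dataSource": "events_granularity",
               "granularitySpec": {"segmentGranularity": "HOUR"}}

print(compaction_granularity_mismatch(ingestion_spec, aligned))
print(compaction_granularity_mismatch(ingestion_spec, conflicting))
```

A deliberate coarsening (for example compacting old HOUR data to DAY) would also be reported here, so the result is best treated as something to confirm, not an automatic failure.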

Automation Checklist

Derive segmentGranularity from a measured row rate and target segment size, not a hardcoded constant, and re-evaluate it when velocity changes.
Assert that every batch spec sets granularitySpec.intervals covering the full backfill range so no rows are silently dropped.
Keep segmentGranularity and queryGranularity distinct in review — verify the physical boundary and the in-segment resolution are each intentional.
Pair granularity with targetRowsPerSegment so each time chunk fills roughly one 300–700 MB segment.
Configure lateMessageRejectionPeriod / earlyMessageRejectionPeriod on streaming supervisors to bound out-of-window events.
Run the post-ingestion size and count audit and fail the gate when segments fall outside the target band or the count breaches the per-datasource threshold.
Align compaction segmentGranularity with the ingestion boundary so compaction consolidates rather than fully re-indexes.
Export segment/count and segment/size to Prometheus and alert on segment-count growth and average-size drift per datasource.

Related

Apache Druid Segment Architecture & Lifecycle Fundamentals: the parent overview of how segments are built, distributed, and retired.
How Druid Segments Map to Time Intervals: the exact timestamp-to-interval arithmetic and how late/timezone-shifted events are bucketed.
Columnar Storage Formats in Druid: how the rows in each time chunk are encoded, indexed, and compressed.
Query Routing and Segment Discovery: how the resulting segment set is pruned and dispatched at query time.
Segment Size Optimization Strategies: applies these sizing formulas to keep files inside the target band over the segment lifecycle.
Automated Compaction Task Scheduling: re-partitions intervals when the original granularity drifts from demand.
