When building the observability layer for Hebo Platform, we evaluated several databases typically used for telemetry and analytics.
One name inevitably comes up in those discussions: ClickHouse.
ClickHouse is extremely fast and widely used across observability platforms. Many modern systems — from log analytics to product telemetry — are built on top of it.
But after evaluating several options, we decided to build Hebo’s observability stack on GreptimeDB instead. The reason comes down to a fundamental difference in what these databases were designed for.
The Name Tells Half the Story
GreptimeDB was designed from the beginning as a time-series database. Timestamps drive the storage layout: data is partitioned by time ranges, time columns are automatically indexed, and the write path is optimized for the append-heavy, monotonically increasing pattern that telemetry produces. Compaction, TTL, and downsampling are built into the engine rather than bolted on.
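As a concrete sketch of what "timestamps drive the storage layout" means, GreptimeDB's DDL marks the time column explicitly and takes retention as a table option. The table name, columns, and TTL value below are illustrative; check the exact syntax against the GreptimeDB docs:

```sql
-- Hypothetical telemetry table. The TIME INDEX constraint marks the
-- timestamp column that drives partitioning and indexing.
CREATE TABLE IF NOT EXISTS inference_latency (
  ts         TIMESTAMP TIME INDEX,  -- time column, indexed automatically
  model      STRING,
  latency_ms DOUBLE,
  PRIMARY KEY (model)               -- tag column for series identity
) WITH (ttl = '30d');               -- expired data is dropped by the engine
```

Retention and downsampling are declared to the engine rather than implemented as external cleanup jobs.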
Compare that to ClickHouse, which started as a general-purpose analytical database optimized for large-scale OLAP queries. It stores data in a columnar format and excels at aggregations across massive datasets, but time is just another dimension — not a first-class concern in the storage engine or query planner.
Observability data is fundamentally time-series data. Every trace, span, metric, and log entry is anchored in time — and when your entire workload revolves around time-based ingestion and range queries, a storage engine built for that access pattern has a real advantage.
We are ingesting traces, token usage, inference latency, model metadata, and many other signals generated by AI systems. This workload looks much more like time-series telemetry than traditional analytics, and GreptimeDB’s architecture reflects that directly.
Object Storage Native (Without Cost Explosion)
Observability data grows quickly. A system that records traces, logs, and AI telemetry can accumulate terabytes of data in a relatively short time.
Most databases handle this by eventually tiering cold data to object storage, but that usually means operating two separate systems — the primary DB on block storage, and a separate cold tier in S3 — with synchronization logic, duplicated bytes, and two sets of infrastructure to maintain.
GreptimeDB’s architecture is different. It uses a disaggregated storage model where the storage layer is separated from the compute layer from the start. Under the hood, GreptimeDB runs a WAL for durability, an in-memory memtable for recent writes, and flushes SST files (sorted string tables) directly to S3-compatible object storage. A local cache layer keeps hot segments fast. There’s no second system to operate and no duplicated data — the object store is the database storage.
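The write path described above can be sketched in a few lines of Python. This is a toy model, not GreptimeDB's implementation: a plain dict stands in for the S3 bucket, and file formats, compaction, and the cache layer are simplified away:

```python
import json

class MiniLsm:
    """Toy sketch of a WAL + memtable + SST-flush write path.
    A dict stands in for the S3-compatible object store."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log for durability
        self.memtable = {}       # recent writes, held in memory
        self.object_store = {}   # "S3": immutable sorted files
        self.flush_threshold = flush_threshold
        self._sst_seq = 0

    def put(self, ts, value):
        self.wal.append((ts, value))       # durability first
        self.memtable[ts] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by timestamp and write an immutable SST "file" to the store.
        sst = sorted(self.memtable.items())
        key = f"sst/{self._sst_seq:06d}.json"
        self.object_store[key] = json.dumps(sst)
        self._sst_seq += 1
        self.memtable.clear()
        self.wal.clear()                   # safe: data now lives in object storage

    def get(self, ts):
        if ts in self.memtable:            # hot path: recent writes
            return self.memtable[ts]
        for blob in self.object_store.values():   # cold path: flushed SSTs
            for k, v in json.loads(blob):
                if k == ts:
                    return v
        return None

db = MiniLsm()
for i in range(4):
    db.put(i, f"span-{i}")
```

The point of the sketch is the ownership model: once flushed, the object store holds the only copy, so there is no second cold tier to synchronize.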
This is the same architectural direction that modern cloud-native databases like Snowflake and Neon have taken, and it’s the right model for workloads with large retention requirements. For observability data that accumulates continuously and is queried infrequently beyond a recent window, it maps well.
All of that comes with strong performance: in the Billion JSON-Document Challenge, GreptimeDB outperformed every other database tested, including ClickHouse.
A Simpler Observability Architecture
The other major reason we chose GreptimeDB is that it dramatically simplifies the ingestion pipeline.
If you look at the Langfuse v3 infrastructure evolution post, you’ll see a representative serious observability stack: Kafka for buffering, ingestion workers, transformation pipelines, ClickHouse for storage, and S3 for cold data. That architecture is well-reasoned and handles Langfuse’s scale. It also means operating five or six separate systems.
GreptimeDB collapses most of that because it exposes a native OTLP endpoint — the same protocol your SDK already speaks. Traces are written directly over HTTP, with no intermediate queue or processing workers.
Standard observability pipeline:
SDK → queue → workers → transformation → database
Our pipeline with GreptimeDB:
SDK → OpenTelemetry → GreptimeDB
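To make the direct path concrete, here is a stdlib-only sketch that builds an OTLP/HTTP request by hand using OTLP's JSON encoding. In practice an OpenTelemetry SDK exporter does this for you; the endpoint path and database header follow GreptimeDB's docs but should be treated as assumptions to verify, and the service and model names are made up:

```python
import json, time, urllib.request

def otlp_trace_payload(model: str, duration_ns: int) -> dict:
    """Minimal OTLP/HTTP JSON payload: one resource, one scope, one span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "hebo-gateway"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "hebo.observability"},
                "spans": [{
                    "traceId": "0af7651916cd43dd8448eb211c80319c",
                    "spanId": "b7ad6b7169203331",
                    "name": "chat gpt-4o",
                    "startTimeUnixNano": str(now - duration_ns),
                    "endTimeUnixNano": str(now),
                    "attributes": [
                        {"key": "gen_ai.operation.name", "value": {"stringValue": "chat"}},
                        {"key": "gen_ai.request.model", "value": {"stringValue": model}},
                    ],
                }],
            }],
        }]
    }

payload = otlp_trace_payload("gpt-4o", 250_000_000)

# Assumed GreptimeDB OTLP endpoint; verify the path and headers against the docs.
req = urllib.request.Request(
    "http://localhost:4000/v1/otlp/v1/traces",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "x-greptime-db-name": "public"},
)
# urllib.request.urlopen(req)  # uncomment with a running GreptimeDB
```

Everything between the SDK and the database is gone; the span goes straight from process memory to a single HTTP POST.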
This works because GreptimeDB handles the separation of concerns internally. It’s built around three roles: a stateless frontend that handles query and ingestion protocols, datanodes that store the actual data, and metasrv that coordinates routing across the cluster. These scale independently of each other, so you don’t need to build that horizontal scalability yourself with Kafka and workers — the database provides it. The result is fewer moving parts, less operational overhead, and a system that’s much easier to reason about.
Built for OpenTelemetry (and Gen-AI Observability)
At Hebo, OpenTelemetry is our primary signal format. GreptimeDB has native OTLP support and maps trace and span data directly to its columnar storage model.
This integrates cleanly with the emerging Gen-AI semantic conventions, which define a standard schema for LLM observability. Fields like:
- gen_ai.operation.name — the type of operation (chat, completion, embedding, etc.)
- gen_ai.request.model / gen_ai.response.model — the model requested and the model that actually served the response
- gen_ai.usage.input_tokens / gen_ai.usage.output_tokens — token counts for cost and quota tracking
- gen_ai.input.messages / gen_ai.output.messages — the full message input and output
- gen_ai.server.request.duration — inference latency per request
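As a small illustration of how these fields combine in practice, here is a client-side sketch that rolls up token usage per model from span attributes. The span data is made up; in production this aggregation happens in the database:

```python
from collections import defaultdict

# Sample span attributes following the Gen-AI semantic conventions (made-up values).
spans = [
    {"gen_ai.response.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 1200, "gen_ai.usage.output_tokens": 300},
    {"gen_ai.response.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 800, "gen_ai.usage.output_tokens": 150},
    {"gen_ai.response.model": "claude-sonnet",
     "gen_ai.usage.input_tokens": 500, "gen_ai.usage.output_tokens": 250},
]

# Sum input/output tokens per serving model, e.g. for cost tracking.
totals = defaultdict(lambda: {"input": 0, "output": 0})
for span in spans:
    model = span["gen_ai.response.model"]
    totals[model]["input"] += span["gen_ai.usage.input_tokens"]
    totals[model]["output"] += span["gen_ai.usage.output_tokens"]

# totals["gpt-4o"] → {"input": 2000, "output": 450}
```

Because the field names are standardized, the same rollup works across any instrumentation that follows the conventions.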
One particularly useful property here is that GreptimeDB uses a dynamic schema. When a new span attribute arrives that hasn't been seen before, the column is created automatically — no migrations, no schema management. As the Gen-AI semantic conventions evolve and new fields get added, they just appear without any changes on our side.
All of this lands as queryable SQL columns in the opentelemetry_traces table. Span attributes are flattened directly into columns, so you can query trace data directly, with no custom schema design or ETL step required:

```sql
SELECT "span_attributes.gen_ai.request.model",
       AVG("span_attributes.gen_ai.server.request.duration")
FROM opentelemetry_traces
GROUP BY "span_attributes.gen_ai.request.model";
```
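Such queries can also be run programmatically over GreptimeDB's HTTP SQL API. The endpoint path and form parameter below are taken from the docs but should be treated as assumptions to verify; the sketch only builds the request, leaving the actual call commented out:

```python
import urllib.parse, urllib.request

def sql_request(host: str, sql: str, db: str = "public") -> urllib.request.Request:
    """Build a POST against GreptimeDB's HTTP SQL API (assumed path /v1/sql)."""
    return urllib.request.Request(
        f"http://{host}/v1/sql?db={db}",
        data=urllib.parse.urlencode({"sql": sql}).encode(),
        method="POST",
    )

query = (
    'SELECT "span_attributes.gen_ai.request.model", '
    'AVG("span_attributes.gen_ai.server.request.duration") '
    "FROM opentelemetry_traces "
    'GROUP BY "span_attributes.gen_ai.request.model"'
)
req = sql_request("localhost:4000", query)
# urllib.request.urlopen(req)  # returns JSON rows when GreptimeDB is running
```

The same endpoint serves dashboards, ad-hoc analysis, and automated reporting, so no separate query service is needed.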
Great Developer Experience from Laptop to Cluster
GreptimeDB transitions smoothly from development to production.
Locally, it runs as a single Docker container in standalone mode — all three roles bundled into one process. That’s sufficient for development and works fine for smaller production deployments too.
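For reference, standalone mode is typically started like this; the image name, ports, and flags reflect the current install guide but should be verified against it:

```shell
# Single process: frontend, datanode, and metasrv bundled together.
docker run -p 4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb_data:/greptimedb_data" \
  --name greptime \
  greptime/greptimedb standalone start \
  --http-addr 0.0.0.0:4000
```

One container gives you the HTTP, OTLP, and SQL endpoints locally, which is all a development loop needs.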
When you need to scale, Helm charts deploy cluster mode, where the three roles described above split into independently scalable components. Heavy query load? Scale the frontends. More ingestion volume? Add datanodes. You don’t have to scale everything together.
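Scaling a role independently then comes down to a replica count in the chart values. The keys below are illustrative and should be checked against the greptimedb-cluster Helm chart:

```yaml
# Hypothetical values.yaml fragment for the greptimedb-cluster chart.
frontend:
  replicas: 3   # scale for query and ingestion-protocol load
datanode:
  replicas: 5   # scale for storage and write volume
meta:
  replicas: 3   # coordination layer; small and stable
```

Adjusting one number scales one role, which is exactly the independence the architecture promises.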
The open-source version covers all of this. The enterprise edition adds features like read replicas for read/write separation under high query concurrency, and there’s GreptimeCloud if you’d rather not operate the infrastructure yourself. The architecture is the same across all tiers, so there’s no lock-in to a particular deployment model.
An Extremely Responsive Team
An impressive aspect of working with GreptimeDB has been the responsiveness of the team.
Their Slack community is active, and when we ran into a Unicode encoding issue during querying, the team responded quickly. Their CTO opened a GitHub issue to track it directly.
For a younger project, that kind of engagement matters a lot. We’re also looking forward to the upcoming JSON v2 data type, which will make working with structured observability metadata — tool call arguments, model parameters, custom attributes — significantly cleaner.
What’s Next
Observability is the first use case, but not the last. We’re planning to use GreptimeDB for the upcoming Hebo /conversations API as well — storing and querying large-scale AI interaction histories. The same architecture that handles telemetry applies directly to interaction logs at scale.
If you’re building AI infrastructure, observability platforms, or LLM tooling, GreptimeDB is worth a serious look. The combination of a time-series-native storage engine, built-in object storage, and native OTLP ingestion is a strong fit for this class of workload.
And if you want to see it in action, try the Hebo observability dashboard — traces, token usage, inference latency, and Gen-AI signals, all in one place.