Skip to content

Fix: ClickHouse Not Working — MergeTree ORDER BY, INSERT Batching, LowCardinality, and Replication

FixDevs ·

Quick Answer

How to fix ClickHouse errors — table engine choice, ORDER BY for primary key, INSERT too many small parts, LowCardinality types, Nullable performance, ZooKeeper for Replicated tables, and HTTP vs Native client.

The Error

You create a table without ORDER BY and the CREATE fails:

DB::Exception: Storage MergeTree requires either ORDER BY or PRIMARY KEY.

Or INSERT performance is terrible — 100 rows/second instead of millions:

# Inserting one row at a time:
for row in rows:
    client.execute("INSERT INTO events VALUES", [row])
# Each call creates a small part. ClickHouse merges them in background but slows down.

Or SELECT count(*) over a column is slow despite a billion-row table:

SELECT count() FROM events WHERE user_country = 'JP';
-- Scans 100 GB instead of using the index.

Or queries fail with replication errors:

DB::Exception: ZooKeeper session has expired. Please try reconnecting.

Why This Happens

ClickHouse is a columnar OLAP database optimized for analytics. Most issues map to:

  • MergeTree engine variants. MergeTree, ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, ReplicatedMergeTree. Each has different semantics and tradeoffs. Picking wrong gives correct-looking output but wrong behavior at scale.
  • INSERT model. ClickHouse expects batched inserts. Row-by-row inserts create one “part” per call, and parts merge in background. Too many parts → slow merges, errors, OOM.
  • Primary key isn’t enforced. ORDER BY is the “primary key” (used for sparse index), but doesn’t dedupe — MergeTree allows duplicates. Use ReplacingMergeTree for dedup-on-merge.
  • LowCardinality dictionary encoding. Columns with few distinct values benefit from LowCardinality(String) — 10-100x storage reduction and faster filters. Without it, every row stores the full string.

Fix 1: Always Specify ORDER BY

CREATE TABLE events (
    event_time DateTime,
    user_id UInt64,
    event_type LowCardinality(String),
    country LowCardinality(String),
    revenue Float64
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);

ORDER BY does two things:

  1. Physical sort. Data is stored in this order on disk.
  2. Sparse index. ClickHouse skips parts that don’t match the ORDER BY prefix.

For queries like WHERE event_time > now() - INTERVAL 1 HOUR AND user_id = 42, both predicates use the ORDER BY index — fast.

Choose ORDER BY by:

  • Most-selective filter first. If event_time is always in your WHERE, put it first.
  • Bounded cardinality. Avoid high-cardinality columns first (e.g. UUIDs) — they don’t compress or skip well.
  • 3-5 columns max. Too many keys = wide index, slower writes.

For wider table tuning:

CREATE TABLE events (...) ENGINE = MergeTree()
ORDER BY (event_time, user_id)
PARTITION BY toYYYYMM(event_time)  -- Optional: separate parts per month
TTL event_time + INTERVAL 90 DAY DELETE;  -- Auto-delete old rows

PARTITION BY lets you drop entire partitions in O(1) (ALTER TABLE events DROP PARTITION 202401) — much faster than DELETE.

Pro Tip: Don’t over-partition. Each partition is a directory; thousands of small partitions hurts performance. Aim for partitions of 1-100M rows.

Fix 2: Batch INSERTs

ClickHouse expects batches. The recommended pattern:

# Bad — one INSERT per row:
for event in events:
    client.execute("INSERT INTO events VALUES", [event])

# Good — one INSERT per batch:
client.execute("INSERT INTO events VALUES", events)  # All at once

# Better for very large batches — async insert:
client.execute("INSERT INTO events SETTINGS async_insert=1, wait_for_async_insert=1 VALUES", events)

For streaming inserts where you can’t accumulate large batches client-side, use async_insert:

INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now(), 1, 'click', 'JP', 0.0);

ClickHouse buffers async inserts server-side and flushes them in batches (default 200ms or 10 MB). Per-row cost drops dramatically.

For very high write throughput, use Buffer engine or write to a Kafka table function — ClickHouse pulls from Kafka in batches automatically.

Common Mistake: Setting async_insert = 1 without wait_for_async_insert. Inserts return immediately, before they’re flushed. If the server crashes between buffer and flush, data is lost. Use wait_for_async_insert = 1 for durability.

Fix 3: Use LowCardinality for String Enums

-- BAD: every row stores the full string
CREATE TABLE events (event_type String, country String, ...);

-- GOOD: dictionary-encoded
CREATE TABLE events (event_type LowCardinality(String), country LowCardinality(String), ...);

LowCardinality(String) stores strings as int dictionary keys. For fields with ≤ ~10K distinct values, this is 5-100x more compact and faster to filter.

Rule of thumb: use LowCardinality when you have:

  • Country codes
  • Event type enums
  • HTTP status codes
  • Service names
  • Region/zone identifiers
  • Any “category” string

Don’t use for:

  • User IDs (high cardinality)
  • URLs (often unique)
  • Free-form text

Pro Tip: You can convert later: ALTER TABLE events MODIFY COLUMN event_type LowCardinality(String). But ClickHouse rewrites the column — expensive on big tables. Set it correctly at table create.

Fix 4: Avoid Nullable Unless You Need It

-- Slow: Nullable adds overhead (separate bitmap)
CREATE TABLE events (revenue Nullable(Float64), ...);

-- Fast: use a sentinel value if NULL semantics aren't critical
CREATE TABLE events (revenue Float64 DEFAULT 0, ...);

Nullable(T) columns have ~2x storage overhead and slower filters. For numeric columns where 0 or -1 can serve as “no value,” skip Nullable.

For strings, use empty string instead of NULL:

CREATE TABLE events (referer String DEFAULT '', ...);

If you genuinely need NULL semantics (distinct from “empty”), use Nullable but be deliberate.

Fix 5: Match Engine to Use Case

EngineUse case
MergeTreeGeneral-purpose. Allows duplicates.
ReplacingMergeTree(version_col)Deduplicate on background merge.
SummingMergeTree(sum_col)Sum-aggregate on background merge.
AggregatingMergeTreeCustom aggregates via aggregate functions.
CollapsingMergeTree(sign)Mark rows as deleted (-1 sign).
VersionedCollapsingMergeTreeVersioned collapsing.

For event sourcing where you want the “latest version of each entity”:

CREATE TABLE users (
    user_id UInt64,
    name String,
    updated_at DateTime
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

ClickHouse keeps the row with the highest updated_at for each user_id after background merges. Note: merges are async — you may see duplicates briefly. Use FINAL modifier (SELECT * FROM users FINAL) to force dedup at query time (slower).

For pre-aggregated rollups:

CREATE MATERIALIZED VIEW events_daily
ENGINE = SummingMergeTree()
ORDER BY (date, country, event_type)
AS SELECT
    toDate(event_time) AS date,
    country,
    event_type,
    count() AS event_count,
    sum(revenue) AS total_revenue
FROM events
GROUP BY date, country, event_type;

Each new INSERT to events triggers a small INSERT into events_daily. Background merges sum the matching rows. Queries against events_daily are massively faster than aggregating raw events.

Common Mistake: Querying a ReplacingMergeTree table without FINAL and expecting dedup. Without FINAL (or aggregate argMax), you see all versions.

Fix 6: Replication via ZooKeeper / Keeper

For HA, use ReplicatedMergeTree:

CREATE TABLE events ON CLUSTER my_cluster (...) ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events',  -- ZK path
    '{replica}'                            -- Replica name
)
ORDER BY (event_time, user_id);

{shard} and {replica} are macros defined per-server in config.

Each shard has multiple replicas. INSERTs to one replica propagate to others. SELECTs can hit any replica.

For multi-shard tables (sharded by hash), wrap in Distributed:

CREATE TABLE events_dist ON CLUSTER my_cluster AS events
ENGINE = Distributed(my_cluster, default, events, rand());

Inserts to events_dist route to shards by rand() (or your hash). Queries fan out across all shards.

Note: ClickHouse Keeper is a built-in ZooKeeper replacement. Newer deployments should use Keeper — fewer dependencies, better performance, ClickHouse-native config.

For monitoring replication lag:

SELECT * FROM system.replication_queue WHERE table = 'events';

A growing queue means a replica is falling behind — investigate before it becomes critical.

Fix 7: HTTP vs Native Protocol

ClickHouse exposes two protocols:

  • Native (port 9000): Binary, used by official clickhouse-client and most language libraries. Fastest.
  • HTTP (port 8123): Plain HTTP with JSON or TSV responses. Easier to debug; works from any HTTP client.

For Python with the native driver:

from clickhouse_driver import Client

client = Client(host="localhost", port=9000, user="default", password="")
result = client.execute("SELECT count() FROM events")

For HTTP-based:

import requests
r = requests.post(
    "http://localhost:8123",
    params={"query": "SELECT count() FROM events FORMAT JSON"},
)
print(r.json())

For Node:

import { createClient } from "@clickhouse/client";

const client = createClient({ url: "http://localhost:8123" });
const result = await client.query({
  query: "SELECT count() FROM events",
  format: "JSON",
});
console.log(await result.json());

The official JS client uses HTTP under the hood. There’s no widely-used native Node driver.

Pro Tip: For high-throughput INSERT, prefer native (Python). For occasional queries from a web app, HTTP is fine.

Fix 8: Common Query Patterns

Time-bucketed counts:

SELECT
    toStartOfHour(event_time) AS hour,
    count() AS events
FROM events
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;

Top-N per group:

SELECT country, event_type, count() AS c
FROM events
WHERE event_time >= today()
GROUP BY country, event_type
ORDER BY c DESC
LIMIT 10 BY country;

LIMIT N BY column is a ClickHouse extension — returns top N per group, more efficient than window functions.

Percentile:

SELECT
    quantile(0.99)(latency_ms) AS p99,
    quantile(0.95)(latency_ms) AS p95,
    quantile(0.50)(latency_ms) AS p50
FROM requests
WHERE event_time >= now() - INTERVAL 1 HOUR;

ClickHouse’s quantile is approximate by default — fast on huge tables. For exact, use quantileExact.

JOIN large tables:

-- Use the smaller table on the right:
SELECT users.name, events.event_type
FROM events
INNER JOIN users ON users.id = events.user_id
WHERE events.event_time >= today();

ClickHouse builds a hash table from the right side. Put the smaller table there.

For very large joins where neither side fits in memory:

SET join_algorithm = 'partial_merge';  -- Disk-based merge join

Still Not Working?

A few less-obvious failures:

  • Too many parts error. Background merges can’t keep up. Throttle inserts, increase batch size, or check disk I/O. system.parts shows part counts per table.
  • OOM during a large query. Set max_memory_usage per query. For aggregations, enable max_bytes_before_external_group_by to spill to disk.
  • DB::Exception: Code: 159. Timeout exceeded. Increase max_execution_time or break the query into smaller chunks.
  • Replication queue grows unbounded. Network split or one replica is down. Restart the lagging replica or remove it temporarily.
  • Code: 252. DB::Exception: Too many partitions. Each partition is a directory. Don’t partition by hour if you have years of data — partition by month or week instead.
  • SELECT returns duplicate rows after INSERT. ReplacingMergeTree hasn’t merged yet. Use FINAL or argMax if you need dedupe immediately.
  • Cannot bind to address. Port conflict. ClickHouse defaults to 8123 (HTTP) and 9000 (Native). Other tools (Portainer, MinIO) use 9000 too.
  • High cardinality LowCardinality field warning. If a column has >10K distinct values, LowCardinality adds overhead. ClickHouse can warn or error depending on settings — check low_cardinality_max_dictionary_size.

For related analytics and OLAP database issues, see DuckDB not working, Postgres slow query, MySQL slow query optimization, and Postgres index not used.

F

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.

Was this article helpful?

Related Articles