Fix: PostgreSQL Slow Query — Finding and Optimizing with EXPLAIN ANALYZE

Q: How do I fix "PostgreSQL Slow Query — Finding and Optimizing with EXPLAIN ANALYZE"?

How to diagnose and fix slow PostgreSQL queries — reading EXPLAIN ANALYZE output, adding the right indexes, fixing N+1 queries, optimizing joins, and using pg_stat_statements.

The Problem

A PostgreSQL query runs much slower than expected:

SELECT * FROM orders WHERE customer_id = 42;
-- Takes 8 seconds on a table with 10 million rows

Or an application endpoint is slow and tracing shows the database query is the bottleneck:

GET /api/orders/customer/42  — 8234ms
  DB: SELECT * FROM orders WHERE customer_id = 42  — 7891ms

Or a query was fast in development but is slow in production with real data volumes.

Why This Happens

Slow queries almost always trace back to one of three things: a missing index, a bad query plan, or excessive data transfer. Understanding which one you’re dealing with determines the fix.

Missing indexes are the most common cause. Without an index on the WHERE clause column, PostgreSQL must perform a sequential scan — reading every single row in the table to find matches. On a 10-million-row table, this means reading gigabytes of data to return a handful of rows. The database isn’t broken; it’s doing exactly what you asked. You just didn’t give it a shortcut. PostgreSQL won’t create indexes for you. Unlike some managed databases that suggest indexes, vanilla PostgreSQL relies entirely on the developer to identify which columns need indexing based on query patterns.

Bad query plans happen when PostgreSQL’s planner makes a suboptimal choice. The planner uses table statistics (row counts, value distributions, correlation between columns) to estimate costs and choose between sequential scans, index scans, hash joins, merge joins, and nested loops. If those statistics are stale — because ANALYZE hasn’t run recently or autovacuum is lagging — the planner works with wrong assumptions. It might choose a sequential scan on a large table because it thinks the table has 100 rows instead of 10 million. Or it might choose a nested loop join when a hash join would be orders of magnitude faster, because it underestimates the number of rows in the outer relation.

Excessive data transfer is subtler. Queries that return more columns than needed (SELECT * when you only need id and status), fetch more rows than necessary (missing LIMIT), or use OFFSET for deep pagination force PostgreSQL to process and transfer data the application will discard. Combined with network latency between the application and database, this wasted work adds up quickly. N+1 query patterns — where the application makes one query per row in a loop instead of a single JOIN — compound the problem by multiplying both query overhead and round-trip latency.

How Other Tools Handle This

Every database has query profiling tools, but they differ significantly in output format, available metrics, and optimization strategies. Knowing the equivalents helps when migrating between databases or diagnosing issues in polyglot architectures.

MySQL’s EXPLAIN serves the same purpose as PostgreSQL’s EXPLAIN ANALYZE but produces different output. MySQL shows a tabular format with one row per table in the query, listing the access type (ALL for full scan, ref for index lookup, range for range scan), the key used, and estimated rows. MySQL 8.0+ supports EXPLAIN FORMAT=JSON and EXPLAIN ANALYZE (added in 8.0.18), which outputs a tree format closer to PostgreSQL’s. One key difference: MySQL’s query optimizer is more aggressive about using indexes but less sophisticated about join reordering. PostgreSQL generally produces better plans for complex multi-table joins, while MySQL can outperform on simple lookup queries because its B-tree implementation is tightly integrated with InnoDB’s clustered index.

SQL Server uses execution plans viewed through SSMS (SQL Server Management Studio) or the SET STATISTICS PROFILE ON command. Plans are graphical by default — a visual tree of operators with tooltips showing estimated vs. actual rows, cost percentages, and warnings. SQL Server’s optimizer is highly advanced, with features like adaptive joins (switching between nested loop and hash join at runtime based on actual row counts) and automatic plan correction that PostgreSQL lacks. However, SQL Server’s plan cache can cause “parameter sniffing” issues — a plan compiled for one parameter value performs poorly for others — which rarely affects PostgreSQL because it re-plans parameterized queries more frequently.

MongoDB’s explain() works at the document level. The output shows whether the query used an index (IXSCAN) or a full collection scan (COLLSCAN), and for aggregation pipelines, it shows the stages executed. MongoDB’s optimizer is simpler than PostgreSQL’s — there are no join algorithms because MongoDB doesn’t support joins natively (the $lookup stage runs as a nested loop). The equivalent of pg_stat_statements is MongoDB’s profiler (db.setProfilingLevel(2)), which logs all queries above a configurable threshold to the system.profile collection.

For long-term monitoring, PostgreSQL’s pg_stat_statements extension is the gold standard for finding slow queries across an application. MySQL has the slow_query_log (file-based, requires post-processing with mysqldumpslow or pt-query-digest). AWS provides Performance Insights for RDS instances across all engines, which surfaces top SQL by load and wait events — useful when you can’t install extensions like pg_stat_statements on managed instances.

Step 1: Run EXPLAIN ANALYZE

EXPLAIN ANALYZE is the starting point for every slow query investigation. It shows the query plan and actual execution times:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Sample output:

Seq Scan on orders  (cost=0.00..250000.00 rows=150 width=128) (actual time=0.042..7823.443 rows=150 loops=1)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 9999850
Planning Time: 0.124 ms
Execution Time: 7823.512 ms

Key things to look for:

Node type	Meaning
`Seq Scan`	Full table scan — reads every row. Fast for small tables, very slow for large ones.
`Index Scan`	Uses an index to jump directly to matching rows. Usually fast.
`Index Only Scan`	All needed data is in the index — fastest.
`Hash Join`	Builds a hash table from one side, probes with the other. Good for large joins.
`Nested Loop`	For each row of outer, scans inner. Can be slow if inner is large.
`Bitmap Heap Scan`	Uses an index to find matching pages, then reads those pages.

The Seq Scan with 9,999,850 rows removed means PostgreSQL is reading the entire 10M row table to find 150 matching rows. An index would fix this.

Fix 1: Add a Missing Index

For equality lookups, range queries, and foreign key columns, an index dramatically reduces scan cost:

-- The slow query
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders... Execution Time: 7823.512 ms

-- Add an index on the filter column
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

-- Now run the same query
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using idx_orders_customer_id on orders...  Execution Time: 0.234 ms

CONCURRENTLY creates the index without locking the table — safe for production. Without it, the table is locked for writes during index creation.

Composite indexes — when queries filter on multiple columns:

-- Query filtering on status AND created_at
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';

-- Single-column indexes help but a composite index is more efficient
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC);

-- Column order matters: put the equality column first, range column second

Partial indexes — for queries that filter on a specific value:

-- If most queries only look at 'pending' orders
CREATE INDEX CONCURRENTLY idx_orders_pending
ON orders (customer_id, created_at)
WHERE status = 'pending';
-- Much smaller index — only indexes rows where status = 'pending'

Don’t add indexes blindly. Each index slows down INSERT, UPDATE, and DELETE operations and consumes disk space. Add indexes for columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses of slow queries.

Fix 2: Read EXPLAIN Output for Joins

Slow joins are another common cause of slow queries. Look for Nested Loop on large tables:

EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending';

Nested Loop  (cost=...) (actual time=0.124..45231.223 rows=50000 loops=1)
  ->  Seq Scan on orders o  (actual time=0.021..5231.234 rows=50000 loops=1)
        Filter: (status = 'pending')
  ->  Index Scan using customers_pkey on customers c  (actual time=0.001..0.001 rows=1 loops=50000)
        Index Cond: (id = o.customer_id)

The nested loop runs 50,000 times — once per pending order. Each lookup is fast (index scan), but 50,000 lookups add up.

Fix — ensure the join column is indexed (foreign key):

-- If orders.customer_id doesn't have an index, every inner loop is a seq scan
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

Fix — add index on the filter column:

CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
-- Now PostgreSQL can use an index scan instead of seq scan for the outer loop

Fix 3: Fix N+1 Query Problems

N+1 queries — making one query to get N records, then N more queries to get related data — are invisible to PostgreSQL but cause massive slowness in applications:

# N+1 in Python/SQLAlchemy — looks innocent, runs 1001 queries
customers = session.query(Customer).limit(100).all()
for customer in customers:
    orders = session.query(Order).filter_by(customer_id=customer.id).all()
    # 1 query for customers + 100 queries for orders = 101 queries total

Fix — use a JOIN to fetch all data in one query:

# SQLAlchemy — eager load with joinedload
from sqlalchemy.orm import joinedload

customers = (
    session.query(Customer)
    .options(joinedload(Customer.orders))
    .limit(100)
    .all()
)
# 1 query total — customers and orders fetched together

-- Direct SQL equivalent
SELECT c.*, o.*
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
LIMIT 100;

In TypeORM:

// N+1 — DON'T do this
const customers = await customerRepo.find({ take: 100 });
for (const customer of customers) {
  customer.orders = await orderRepo.find({ where: { customerId: customer.id } });
}

// Correct — eager load with relations
const customers = await customerRepo.find({
  take: 100,
  relations: { orders: true },  // JOIN in one query
});

Fix 4: Use pg_stat_statements to Find Slow Queries

Instead of profiling individual queries, use pg_stat_statements to find the worst-performing queries across your entire application:

-- Enable the extension (requires superuser, needs postgresql.conf change + restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- In postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all

-- Find the slowest queries by total time
SELECT
  round(total_exec_time::numeric, 2) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

This shows you which queries collectively consume the most time — the best targets for optimization. A query that runs 10ms but is called 100,000 times/hour (1000 total seconds) is a better optimization target than a 5-second query called once a day.

Reset statistics after optimizing:

SELECT pg_stat_statements_reset();

Fix 5: Optimize Queries with LIMIT and Pagination

Fetching all matching rows when you only need the first page is wasteful:

-- SLOW — fetches and sorts all 1M rows to return 20
SELECT * FROM posts ORDER BY created_at DESC;

-- FAST — fetches only 20 rows
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20;

-- Keyset pagination (faster than OFFSET for large pages)
-- Get next page after the last seen id
SELECT * FROM posts
WHERE created_at < $last_seen_created_at
ORDER BY created_at DESC
LIMIT 20;

Avoid OFFSET for large page numbers:

-- SLOW — must scan and discard 100,000 rows to reach page 5001
SELECT * FROM posts ORDER BY id OFFSET 100000 LIMIT 20;

-- FAST — use keyset pagination instead
SELECT * FROM posts WHERE id < $last_seen_id ORDER BY id DESC LIMIT 20;

Fix 6: Optimize with Covering Indexes (Index Only Scans)

An Index Only Scan — where all needed columns are in the index — is the fastest possible read:

-- Query only needs id, status, created_at
SELECT id, status, created_at FROM orders WHERE customer_id = 42;

-- Regular index — still needs to fetch table rows for status and created_at
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Covering index — includes all needed columns
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id)
INCLUDE (status, created_at);
-- Now: Index Only Scan — no table access needed

EXPLAIN ANALYZE SELECT id, status, created_at FROM orders WHERE customer_id = 42;
-- Index Only Scan using idx_orders_customer_covering on orders...
-- Heap Fetches: 0   ← No table rows accessed at all

Fix 7: Check for Missing Statistics and Vacuum

If PostgreSQL’s query planner makes bad decisions (choosing a seq scan when an index would be faster), it may have stale statistics:

-- Check table statistics age
SELECT schemaname, tablename, last_vacuum, last_analyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manually analyze (update statistics)
ANALYZE orders;

-- Vacuum to remove dead tuples and reclaim space
VACUUM ANALYZE orders;

If a table has many dead tuples (from frequent updates/deletes), seq scans slow down because PostgreSQL reads dead rows too. VACUUM removes dead tuples.

Set autovacuum more aggressively for high-churn tables:

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,   -- Vacuum when 1% of rows are dead (default 20%)
  autovacuum_analyze_scale_factor = 0.005  -- Analyze when 0.5% of rows change
);

Still Not Working?

Force an index to test if PostgreSQL is ignoring it:

-- Temporarily disable seq scan to force index use
SET enable_seqscan = OFF;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
SET enable_seqscan = ON;

-- If the query is faster with the index forced, PostgreSQL's statistics are wrong
-- Run ANALYZE orders; to fix the statistics

Check if the index is actually being used:

SELECT indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'orders';
-- idx_scan = 0 means the index has never been used

Identify table bloat — a table with many dead rows acts as if it’s larger than it is:

SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(tablename::regclass)) AS total_size,
  pg_size_pretty(pg_relation_size(tablename::regclass)) AS table_size,
  n_dead_tup AS dead_rows
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Check for implicit type casts that prevent index use. PostgreSQL will not use an index on customer_id (integer) if the query compares it to a string: WHERE customer_id = '42'. While PostgreSQL can implicitly cast in some cases, mismatched types on indexed columns — especially with varchar vs text, or timestamp vs timestamptz — can silently force a sequential scan. Use pg_typeof() to verify column types and ensure filter values match.

Look for correlated subqueries hiding in SELECT clauses. A subquery in the SELECT list runs once per row of the outer query, creating an invisible N+1 at the database level:

-- Hidden N+1 — subquery runs once per order row
SELECT o.id, o.total,
  (SELECT c.name FROM customers c WHERE c.id = o.customer_id) AS customer_name
FROM orders o;

-- Fix — use a JOIN instead
SELECT o.id, o.total, c.name AS customer_name
FROM orders o
JOIN customers c ON c.id = o.customer_id;

Check work_mem for large sort and hash operations. If a query involves Sort or Hash nodes that spill to disk (look for Sort Method: external merge in EXPLAIN output), increasing work_mem for that session can dramatically speed up the query. Set it per-session with SET work_mem = '256MB'; rather than globally, as a high global value multiplied by many concurrent connections can exhaust server memory.