Fix: BigQuery Not Working — Partitioning, Slots, Streaming Inserts, and Cost Surprises

Q: How do I fix "BigQuery Not Working — Partitioning, Slots, Streaming Inserts, and Cost Surprises"?

How to fix BigQuery errors — query scans full table without partition filter, slot contention with on-demand pricing, streaming insert quota exceeded, DML row limits, service account auth, and INFORMATION_SCHEMA for monitoring.

The Error

You query a partitioned table and it scans the entire 10 TB:

SELECT count() FROM `myproject.events.raw`
WHERE user_id = 42;
-- Bytes processed: 10.2 TB (full scan)

Or streaming inserts fail with quota:

Quota exceeded: Your project exceeded quota for streaming insert API.

Or MERGE/UPDATE/DELETE errors with too many partitions:

Cannot update more than 4000 partitions in a single MERGE statement.

Or the service account auth fails:

google.api_core.exceptions.PermissionDenied: 403 Permission denied: 
User does not have bigquery.tables.getData permission for table myproject.events.raw.

Why This Happens

BigQuery is a serverless data warehouse with unique pricing and operational characteristics:

Pay per scan. On-demand pricing charges by bytes scanned. A query without a partition filter scans the whole table — expensive and slow.
Partitioning and clustering reduce scans. A query with WHERE event_date = '2026-05-20' on a date-partitioned table scans one day, not the whole history. Cluster keys provide finer-grain pruning.
Streaming inserts have separate quotas. API requests/sec, bytes/sec — independent of batch loads.
DML has partition limits. A single UPDATE or DELETE can touch only N partitions (configurable, default 4000).
IAM is finer-grained than other DBs. Project access ≠ dataset access ≠ table access. Permissions like bigquery.tables.getData are separate from bigquery.jobs.create.

The pricing model is the single most surprising thing for engineers coming from row-store databases. BigQuery is columnar and serverless, so the question “how expensive is this query” is answered by “how many bytes did the columns referenced occupy in the scanned partitions.” A SELECT * against an unpartitioned table reads every column from every row, even if a filter eliminates most rows downstream — the filter happens after the scan, so the bill is the same. The fix is twofold: only SELECT the columns you need, and design partitions/clusters so the partition pruning eliminates whole files before scan. Once those two habits land, most cost surprises disappear.

The second surprise is the lifecycle of streaming data. Rows inserted via the legacy streaming API live in a write-optimized buffer for several minutes before merging into long-term storage. DML against that buffer is restricted (you cannot DELETE from it), exports skip it, and queries against _PARTITIONTIME IS NULL are the only way to see buffered rows. The Storage Write API replaces this with a streaming model that’s queryable in seconds and DML-compatible immediately, which is one of several reasons new code should prefer it over insert_rows_json. If you have streaming pipelines older than a couple of years, migration to Storage Write often pays for itself in reduced operational confusion alone.

Fix 1: Partition and Cluster Your Tables

CREATE TABLE `myproject.events.raw` (
  event_time TIMESTAMP,
  user_id INT64,
  event_type STRING,
  payload JSON
)
PARTITION BY DATE(event_time)
CLUSTER BY user_id, event_type
OPTIONS (
  partition_expiration_days = 365,
  require_partition_filter = TRUE
);

PARTITION BY DATE(event_time) creates one partition per day. Queries with WHERE event_time >= TIMESTAMP("2026-05-01") scan only matching partitions.

CLUSTER BY user_id, event_type sorts each partition by those columns. Predicates on cluster keys further reduce scan (BigQuery uses block-level pruning).

require_partition_filter = TRUE forces queries to include a partition filter — saves you from accidental full scans:

-- Fails: no partition filter
SELECT * FROM `myproject.events.raw` WHERE user_id = 42;
-- Error: Cannot query over table 'myproject.events.raw' without a filter over column(s)

-- Works:
SELECT * FROM `myproject.events.raw`
WHERE DATE(event_time) = "2026-05-20" AND user_id = 42;

For tables already created without partitioning, create a new partitioned table and INSERT INTO ... SELECT ... FROM old_table. You can’t add partitioning in place.

Pro Tip: For time-series, partition by event_time truncated to DAY or MONTH. For high-volume tables (billions of rows/day), use HOUR partitioning.

Fix 2: Use Standard SQL (Not Legacy)

BigQuery has two SQL dialects:

Standard SQL (recommended) — ANSI-compliant, modern features.
Legacy SQL — deprecated, but still around for backward compat.

In the BigQuery Console, set query options to Standard SQL. In code:

from google.cloud import bigquery

client = bigquery.Client()
job = client.query(
    "SELECT count() FROM `myproject.events.raw` WHERE DATE(event_time) = CURRENT_DATE()",
    job_config=bigquery.QueryJobConfig(
        use_legacy_sql=False,  # Standard SQL — usually the default in new SDKs
    ),
)
results = job.result()

Differences worth knowing:

Backtick table names: `project.dataset.table` (Standard) vs [project:dataset.table] (Legacy).
JOIN syntax: Standard supports JOIN ON; Legacy used different syntax.
Subqueries: Standard supports correlated subqueries.

If you see “Legacy SQL” anywhere in your code or scripts, migrate.

Fix 3: Stream Inserts in Batches

Streaming insert API has quotas:

~100 MB per project per second (per region).
Per-table: variable based on size and partitioning.

For high-throughput streaming, batch on the client side:

from google.cloud import bigquery

client = bigquery.Client()
table_ref = client.dataset("events").table("raw")

# Accumulate up to 500 rows per request:
batch = []
for event in event_stream:
    batch.append({
        "event_time": event.timestamp.isoformat(),
        "user_id": event.user_id,
        "event_type": event.type,
    })
    if len(batch) >= 500:
        errors = client.insert_rows_json(table_ref, batch)
        if errors:
            print("Errors:", errors)
        batch = []

# Flush remaining:
if batch:
    client.insert_rows_json(table_ref, batch)

For higher throughput than streaming allows, use the BigQuery Storage Write API:

from google.cloud.bigquery_storage_v1 import BigQueryWriteClient
from google.cloud.bigquery_storage_v1.types import WriteStream

write_client = BigQueryWriteClient()
# Storage Write API — higher throughput, exactly-once, lower cost.

Storage Write API is the modern replacement — better throughput, exactly-once semantics, lower per-row cost. For new code, prefer it over insert_rows_json.

For batch loads (millions of rows from CSV/Parquet/JSON in GCS):

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/events/*.parquet",
    table_ref,
    job_config=job_config,
)
load_job.result()  # Wait for completion

Batch loads are free (vs streaming inserts, which cost). For non-realtime data, always use batch loads.

Pro Tip: Streaming inserts have a 24-hour delay before data is available in some DML/export operations. Batch loads are queryable immediately.

Fix 4: DML Partition Limits

UPDATE/DELETE/MERGE can touch only a limited number of partitions per statement (4000 by default).

For large updates across many partitions, split into batches:

-- Bad: touches 5000 partitions
MERGE `events.raw` T
USING `events.updates` U
ON T.id = U.id
WHEN MATCHED THEN UPDATE SET T.status = U.status;
-- Error: too many partitions

-- Good: target one month at a time
DECLARE month DATE DEFAULT "2026-01-01";

WHILE month < "2027-01-01" DO
  MERGE `events.raw` T
  USING `events.updates` U
  ON T.id = U.id
    AND DATE(T.event_time) BETWEEN month AND DATE_ADD(month, INTERVAL 1 MONTH)
  WHEN MATCHED THEN UPDATE SET T.status = U.status;
  SET month = DATE_ADD(month, INTERVAL 1 MONTH);
END WHILE;

For very large DML, consider rewriting:

Replace partitions via INSERT OVERWRITE PARTITION (Standard SQL doesn’t have this directly; use MERGE WHEN NOT MATCHED THEN INSERT patterns).
Recreate the table from a SELECT — sometimes faster than incremental updates for huge changes.

Common Mistake: Running UPDATE table SET col = ... without a WHERE clause. Touches every partition; usually exceeds the limit. Always scope by partition column.

Fix 5: Service Account Authentication

For server-side code, use a service account with the right roles:

# Roles needed:
- roles/bigquery.dataViewer       # Read tables
- roles/bigquery.dataEditor       # Write to tables
- roles/bigquery.jobUser          # Submit queries
- roles/bigquery.user             # Read metadata + submit

In code:

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials, project="my-project")

For Workload Identity Federation (no JSON key files needed — recommended):

# In GKE / Cloud Run / GCE — Auto-detected:
gcloud auth application-default login

# Then in Python — uses Application Default Credentials:
client = bigquery.Client(project="my-project")

For tighter scoping, grant access at the dataset level instead of project:

GRANT `roles/bigquery.dataViewer`
ON SCHEMA `myproject.events`
TO "serviceAccount:[email protected]";

Common Mistake: Granting project-level roles/bigquery.admin to all your services. Scope to specific datasets — leaks if your service is compromised.

Fix 6: Monitor Query Cost Before Running

Before running an expensive query, get an estimate:

job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT * FROM `myproject.events.raw` WHERE user_id = 42",
    job_config=job_config,
)
print(f"This query will process {job.total_bytes_processed / 1e9:.1f} GB")
# Doesn't actually run; just estimates.

Or in the BigQuery Console: the query editor shows “This query will process X bytes” before you click Run.

For programmatic monitoring of past queries:

SELECT
  user_email,
  total_bytes_processed,
  total_slot_ms,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY total_bytes_processed DESC
LIMIT 20;

INFORMATION_SCHEMA.JOBS_BY_PROJECT is your view into query history — sliceable by user, by hour, by cost.

Pro Tip: Set a per-query byte limit via maximum_bytes_billed:

job_config = bigquery.QueryJobConfig(maximum_bytes_billed=100 * 1024**3)  # 100 GB
# Query fails if it would exceed 100 GB scanned.

This protects against accidental expensive queries.

Fix 7: Materialized Views and BI Engine

For dashboards that hit the same aggregation hundreds of times a day, use materialized views:

CREATE MATERIALIZED VIEW `myproject.events.daily_summary`
PARTITION BY date
CLUSTER BY country
AS
SELECT
  DATE(event_time) AS date,
  country,
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users
FROM `myproject.events.raw`
GROUP BY date, country, event_type;

BigQuery maintains the materialized view incrementally — new inserts to events.raw trigger small updates to the view. Queries against the view are much faster and cheaper than the equivalent against raw.

For low-latency dashboards, also enable BI Engine (in-memory analysis):

Console → BI Engine → Add reservation → 1 GB / 10 GB / 100 GB per region.
BI Engine caches hot data in memory; queries against cached data are millisecond-latency.

Cost: per GB-hour of reservation. For 10 GB at $0.0416/GB/hr, that’s ~$30/month — much cheaper than running the queries against on-demand pricing for high-traffic dashboards.

Fix 8: Slot Reservations vs On-Demand

On-demand pricing — pay per TB scanned, no slot management. Best for sporadic workloads.
Slot reservations — buy slots (capacity) per hour/month, queries share the reservation. Best for predictable workloads.

For slot reservations:

# 100 slots for $4/hour:
gcloud bigquery reservations create my-reservation \
  --location=US \
  --slots=100 \
  --autoscale-max-slots=200

Then assign to projects:

gcloud bigquery reservation-assignments create \
  --reservation=my-reservation \
  --assignee-id=my-project \
  --assignee-type=PROJECT \
  --job-type=QUERY

--autoscale-max-slots lets the reservation grow during peak load.

For workloads doing heavy daily aggregation, reservations are 30-70% cheaper than equivalent on-demand. For workloads with rare bursts, on-demand is simpler.

Note: BigQuery editions (Standard, Enterprise, Enterprise Plus) replaced “flat-rate” pricing in 2023. Standard is similar to old flat-rate; Enterprise adds features like CMEK and column-level security.

BigQuery vs Snowflake, Databricks SQL, Redshift, ClickHouse, and DuckDB Cloud

Picking a cloud data warehouse comes down to query model, pricing shape, and where your data lives.

BigQuery is fully serverless. There are no clusters to start, stop, or size; you submit a query, it runs against allocated slots, and you pay per byte scanned (on-demand) or per slot-hour (reservations). The strength is operational simplicity — no capacity planning — and the integration with GCP-native sources (Cloud Storage, Pub/Sub, Dataflow). The weakness is that workloads with unpredictable cost shape can produce surprise bills, which is why maximum_bytes_billed and partition filters matter.

Snowflake is the closest competitor. Virtual warehouses (XS, S, M…) are compute clusters you size and pause; storage is decoupled and billed separately. Snowflake’s killer features are zero-copy clones, time travel, and excellent multi-account data sharing (Snowflake Marketplace, secure data shares). Bills are more predictable than BigQuery’s on-demand because you control warehouse size, but you pay even when warehouses are idle if auto-suspend is misconfigured. Choose Snowflake when you want predictable per-cluster pricing, when data sharing across organizations matters, or when you’re multi-cloud and want a single SQL engine across AWS/Azure/GCP.

Databricks SQL is the SQL surface on top of the Databricks lakehouse. The differentiator is that the same engine (Photon) runs both BI queries and ML/Spark workloads against open Delta Lake tables, so data scientists and analysts share storage without ETL. If your stack already includes Spark notebooks for ML or transformation, Databricks SQL is the path of least resistance for analytics. If you’re SQL-only, BigQuery or Snowflake usually beat it on simplicity.

Amazon Redshift is the AWS-native warehouse. RA3 nodes decouple storage from compute, and Redshift Serverless gives an auto-scaling experience similar to BigQuery’s. Redshift is the right call when you’re deeply embedded in AWS (Glue, S3, Kinesis) and want pricing inside an existing AWS commit. The cost is that Redshift’s SQL flavor and operational quirks lag behind BigQuery and Snowflake on niceties — query optimization, materialized view refresh semantics, and JSON handling all feel a half-generation behind.

ClickHouse Cloud is the OLAP outlier. It is open-source ClickHouse offered as a managed service, with single-digit-millisecond latency on aggregations over billions of rows. Use it for user-facing analytics dashboards (think “show this report to every logged-in user”) where BigQuery’s query latency (seconds) is too high. Trade-off: ClickHouse’s SQL is its own dialect, the join model is less flexible, and it’s not a drop-in replacement for a general-purpose warehouse.

MotherDuck (DuckDB Cloud) is the up-and-comer. DuckDB runs the engine, you point it at Parquet in S3/GCS, and you can run hybrid queries that combine local DuckDB tables with cloud data. The sweet spot is small-to-medium warehouses where the dataset fits in tens of GB and you want laptop-style ergonomics with cloud-scale storage. For multi-TB warehouses, it is not yet a replacement for BigQuery or Snowflake.

Practical guidance: BigQuery for serverless on GCP, Snowflake for predictable multi-cloud, Databricks SQL when ML and analytics share a lakehouse, Redshift inside AWS commit deals, ClickHouse for low-latency analytics products, MotherDuck for small-warehouse ergonomics.

Still Not Working?

A few less-obvious failures:

Resources exceeded during query execution. Query needs more memory than available. Reduce JOIN cardinality, add filters, or move to slot reservations.
Slot contention — queries queue. INFORMATION_SCHEMA.JOBS_BY_PROJECT shows state = "PENDING". Add reservation or reduce concurrent jobs.
bigquery.tables.getData denied. Need dataset-level grant, not just project-level. Check IAM at the dataset.
Streaming insert appears successful but data is missing. Streaming buffer takes a few seconds to flush. Queries with WHERE _PARTITIONTIME IS NULL see the buffer; default queries see only flushed data.
Wildcard table query slow. FROM \events.*`scans all matched tables. Add_TABLE_SUFFIXfilter:WHERE _TABLE_SUFFIX BETWEEN ‘20260101’ AND ‘20260131’`.
JOIN explosion. Cartesian product blew up. Inspect EXPLAIN ... (via BigQuery’s query plan in the UI) for unexpected joins.
CTAS doesn’t preserve partitioning. CREATE TABLE X AS SELECT ... defaults to unpartitioned. Specify PARTITION BY and CLUSTER BY explicitly:

CREATE TABLE `new` PARTITION BY DATE(event_time) CLUSTER BY user_id AS
SELECT * FROM `old`;

UNIQUE constraint doesn’t exist. BigQuery doesn’t enforce uniqueness. Dedup via ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) and filter.
Query cache silently disabled by non-deterministic functions. Calling CURRENT_TIMESTAMP() or RAND() disables the cached-result optimization for the whole query. If you depend on cache hits for a dashboard, parameterize the timestamp instead of calling the function inline.
External Sheets/Drive tables fail intermittently. Federated queries against Google Sheets share a quota with Drive APIs and rate-limit at scale. For high-volume reads, materialize the sheet into a native BigQuery table on a schedule.
SAFE. functions silently mask errors. SAFE.CAST(x AS INT64) returns NULL on failure instead of erroring. Useful for cleaning dirty data, dangerous when you actually want to know that 20% of rows failed conversion — log a count of NULLs after every SAFE conversion in dev.

For related data warehouse and analytics issues, see DuckDB not working, ClickHouse not working, Postgres slow query, and Pandas SettingWithCopyWarning.