Fix: AWS SQS Not Working — Messages Not Received, Duplicate Processing, or DLQ Filling Up
Quick Answer
How to fix AWS SQS issues — visibility timeout, message not delivered, duplicate messages, Dead Letter Queue configuration, FIFO queue ordering, and Lambda trigger problems.
The Problem
Messages are sent to SQS but never received by the consumer:
// Producer sends successfully
await sqs.sendMessage({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123/my-queue',
  MessageBody: JSON.stringify({ orderId: '123' }),
}).promise();
// MessageId returned, but the consumer never sees the message
Or the same message is processed multiple times despite being deleted:
// Consumer processes and deletes the message
await sqs.deleteMessage({
  QueueUrl: queueUrl,
  ReceiptHandle: message.ReceiptHandle,
}).promise();
// But the same message appears again 30 seconds later
Or messages go to the Dead Letter Queue immediately without being processed:
DLQ is filling up, but the main queue consumer shows no errors
Or a Lambda function triggered by SQS processes messages out of order.
Why This Happens
SQS has several behaviors that differ from traditional message queues. Most production problems trace back to five root causes:
- Visibility timeout too short: when a consumer receives a message, it becomes invisible to other consumers for the VisibilityTimeout period. If processing takes longer, the message becomes visible again and is delivered a second time.
- Consumer not polling: ReceiveMessage only returns messages that are currently visible. If polling stops (consumer crash, rate limiting), messages queue up but aren’t processed.
- At-least-once delivery: standard SQS queues guarantee at-least-once delivery. Even with successful deletion, rare cases can deliver the message twice. Your processing must be idempotent.
- DLQ maxReceiveCount too low: if maxReceiveCount is 1, any processing failure sends the message to the DLQ immediately without retry.
- Wrong queue URL or IAM permissions: sending to the wrong URL or missing sqs:ReceiveMessage permissions causes silent failures.
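One way to tell these causes apart is the ApproximateReceiveCount system attribute, which SQS returns with a message when the receive call asks for it (for example with AttributeNames: ['All']). The following is a minimal sketch, assuming a message object shaped like an entry of the SDK's ReceiveMessage output; describeDelivery and its return fields are hypothetical names for illustration, not part of the SDK.

```javascript
// Hypothetical helper: summarizes how often SQS has handed out a message.
// `message` is assumed to look like an entry of ReceiveMessage's `Messages` array.
function describeDelivery(message, maxReceiveCount) {
  const raw = message.Attributes && message.Attributes.ApproximateReceiveCount;
  // The attribute arrives as a string; fall back to 1 when it was not requested.
  const receiveCount = Number.parseInt(raw ?? '1', 10);
  return {
    receiveCount,
    isRedelivery: receiveCount > 1,
    // Receives remaining after this one before the message moves to the DLQ.
    attemptsLeft: Math.max(maxReceiveCount - receiveCount, 0),
  };
}

const sample = { MessageId: 'abc', Attributes: { ApproximateReceiveCount: '3' } };
console.log(describeDelivery(sample, 3));
// { receiveCount: 3, isRedelivery: true, attemptsLeft: 0 }
```

Logging this per message makes a too-short visibility timeout visible (receive counts climb while processing succeeds) and shows when a low maxReceiveCount leaves no room for retries.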
The second layer of confusion is that SQS has evolved heavily, and the behavior you read about on Stack Overflow in 2018 is not the behavior of the queue you provisioned this week. Knowing which feature shipped with which release decides which fix actually applies to your setup.
SQS Version History — What Shipped When
SQS was one of the first AWS services (launched 2006), but the features developers reach for today are much newer. Many “SQS not working” threads online predate the feature you’re trying to use.
- November 2012: long polling. Short polling (ReceiveMessageWaitTimeSeconds = 0) is still the attribute's default, and older Terraform/CloudFormation templates often set it explicitly. Empty receives still bill, so audit older queues.
- November 2016: FIFO queues. Originally SQS only offered Standard queues with at-least-once delivery and no ordering guarantees. FIFO (.fifo suffix) added exactly-once processing and strict ordering within a MessageGroupId. Pre-2016 articles that say “SQS cannot guarantee order” are wrong if you’re on FIFO.
- April 2017: server-side encryption with KMS. SSE-SQS arrived later, in November 2021, as a managed option. If a queue still uses an old KMS key with restrictive policies, consumers can receive KMS.AccessDeniedException instead of messages.
- May 2021: high throughput for FIFO (general availability). Originally FIFO was capped at 300 TPS (3,000 with batching). High-throughput mode raised the per-API-action quota into the thousands of messages/sec, with later increases in some regions. If you migrated from Standard to FIFO and saw a throughput collapse, you may not have opted into high-throughput mode.
- November 2021: partial batch failures for Lambda triggers. Before this, a single failed record in a batch of 10 forced the entire batch to retry. Set ReportBatchItemFailures on the event source mapping and return batchItemFailures to retry only the failures.
- June 2023: Dead Letter Queue redrive API (StartMessageMoveTask). Before this, “redriving” a DLQ back to the source queue meant the DLQ Redrive page in the console or a homegrown Lambda. The new API is scriptable and rate-limited per call.
- November 2023: FIFO dead-letter queue redrive. The original redrive API only supported Standard queues at launch. FIFO support was added later.
The maximum message size is a hard service limit, and it stood at 256KB for most of the service's history (AWS has raised it over time, so check the current SQS quotas). If you need to send a larger payload, use the Extended Client Library pattern: store the body in S3 and put a pointer in the SQS message. Articles that suggest “compress the payload” miss that the cap is enforced by the service, not a guideline.
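The pointer pattern can be sketched without any SDK calls. The example below is an illustration under stated assumptions: buildMessageBody is a hypothetical helper, storeLargeBody is an injected function assumed to upload the text (for example to S3) and resolve to its key, the pointer shape is made up, and the 256KB constant is simply the figure quoted in this article.

```javascript
// Size cap used for illustration; confirm the current quota before relying on it.
const LIMIT_BYTES = 256 * 1024;

// Hypothetical helper: returns the MessageBody to send, offloading large
// payloads through the injected `storeLargeBody(text) => Promise<key>`.
async function buildMessageBody(payload, storeLargeBody, limitBytes = LIMIT_BYTES) {
  const json = JSON.stringify(payload);
  // SQS measures size in bytes, so count UTF-8 bytes rather than characters.
  if (Buffer.byteLength(json, 'utf8') <= limitBytes) return json;
  const key = await storeLargeBody(json);
  // Made-up pointer shape; the consumer has to recognize and resolve it.
  return JSON.stringify({ offloaded: true, s3Key: key });
}

// Usage with a fake store that only returns a key.
buildMessageBody({ blob: 'x'.repeat(300 * 1024) }, async () => 'bodies/order-123.json')
  .then((body) => console.log(body));
```

Message attributes count toward the same size cap, so a production version would leave headroom below the limit rather than comparing the body alone.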
Fix 1: Configure Visibility Timeout Correctly
The visibility timeout must be longer than your maximum processing time:
const { SQSClient, ReceiveMessageCommand, ChangeMessageVisibilityCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');
const sqs = new SQSClient({ region: 'us-east-1' });
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789/my-queue';
async function processMessages() {
  const response = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10, // Receive up to 10 messages at once
    WaitTimeSeconds: 20, // Long polling: wait up to 20s for messages
    VisibilityTimeout: 300, // 5 minutes, must be > max processing time
    AttributeNames: ['All'],
    MessageAttributeNames: ['All'],
  }));
  if (!response.Messages || response.Messages.length === 0) return;
  for (const message of response.Messages) {
    // If processing might take a long time, extend visibility timeout periodically
    const heartbeat = setInterval(() => {
      sqs.send(new ChangeMessageVisibilityCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle,
        VisibilityTimeout: 300, // Reset the clock
      })).catch((error) => console.error('Heartbeat failed:', error));
    }, 240_000); // Extend every 4 minutes
    try {
      await processMessage(JSON.parse(message.Body));
      // Delete only after successful processing
      await sqs.send(new DeleteMessageCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle,
      }));
    } catch (error) {
      console.error('Processing failed:', error);
      // Don't delete: let SQS retry (message becomes visible after timeout)
    } finally {
      // Stop the heartbeat on failure too, or it keeps hiding the message
      clearInterval(heartbeat);
    }
  }
}
Set queue visibility timeout in CDK/Terraform:
// AWS CDK
const queue = new sqs.Queue(this, 'MyQueue', {
  visibilityTimeout: Duration.minutes(5),
  receiveMessageWaitTime: Duration.seconds(20), // Long polling
  retentionPeriod: Duration.days(4),
});
# Terraform
resource "aws_sqs_queue" "my_queue" {
  name                       = "my-queue"
  visibility_timeout_seconds = 300    # 5 minutes
  receive_wait_time_seconds  = 20     # Long polling
  message_retention_seconds  = 345600 # 4 days
}
The maximum visibility timeout is 12 hours. If a single message genuinely needs more than that to process, the architecture needs rethinking: break the work into smaller stages with Step Functions or use a different pattern. Setting VisibilityTimeout higher than 43200 throws InvalidParameterValue.
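The 12-hour cap also applies to heartbeat extensions, because it is counted from the moment the message was received, not from the latest ChangeMessageVisibility call. A small sketch of how a heartbeat could clamp its request follows; nextVisibilityTimeout is a hypothetical helper, and tracking secondsSinceReceive is assumed to be the caller's job.

```javascript
const MAX_VISIBILITY_SECONDS = 43200; // 12 hours, the cap quoted above

// Hypothetical helper: how many seconds a heartbeat may request, given how
// long ago the message was received. Returns 0 once the budget is used up.
function nextVisibilityTimeout(secondsSinceReceive, desiredExtensionSeconds) {
  const remaining = MAX_VISIBILITY_SECONDS - secondsSinceReceive;
  if (remaining <= 0) return 0;
  return Math.min(desiredExtensionSeconds, remaining);
}

console.log(nextVisibilityTimeout(600, 300));   // 300
console.log(nextVisibilityTimeout(43000, 300)); // 200
console.log(nextVisibilityTimeout(43200, 300)); // 0
```

A result of 0 is the signal to stop extending and let the message return to the queue, or to hand the work to a different mechanism.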
Fix 2: Implement Long Polling
Short polling returns immediately even with no messages, wasting requests. Use long polling:
// SHORT POLLING (default, wasteful)
// WaitTimeSeconds = 0: returns immediately if no messages
const shortPollResponse = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: QUEUE_URL,
  WaitTimeSeconds: 0, // Returns instantly
}));
// LONG POLLING (recommended)
// WaitTimeSeconds = 1-20: waits up to N seconds for messages
const longPollResponse = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: QUEUE_URL,
  WaitTimeSeconds: 20, // Wait up to 20 seconds
  MaxNumberOfMessages: 10,
}));
Continuous polling loop:
async function startConsumer() {
  console.log('Consumer started');
  while (true) {
    try {
      const response = await sqs.send(new ReceiveMessageCommand({
        QueueUrl: QUEUE_URL,
        WaitTimeSeconds: 20,
        MaxNumberOfMessages: 10,
      }));
      if (response.Messages && response.Messages.length > 0) {
        await Promise.all(response.Messages.map(processAndDelete));
      }
    } catch (error) {
      console.error('Poll error:', error);
      await new Promise(r => setTimeout(r, 5000)); // Back off on error
    }
  }
}
async function processAndDelete(message) {
  try {
    await processMessage(JSON.parse(message.Body));
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle,
    }));
  } catch (error) {
    console.error(`Failed to process message ${message.MessageId}:`, error);
    // Message becomes visible again after VisibilityTimeout
  }
}
Fix 3: Configure Dead Letter Queue Properly
A DLQ captures messages that fail processing repeatedly:
// AWS CDK: proper DLQ setup
const dlq = new sqs.Queue(this, 'MyDLQ', {
  queueName: 'my-queue-dlq',
  retentionPeriod: Duration.days(14), // Keep failed messages 14 days for analysis
});
const mainQueue = new sqs.Queue(this, 'MyQueue', {
  queueName: 'my-queue',
  visibilityTimeout: Duration.minutes(5),
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3, // After 3 failed attempts, move to DLQ
  },
});
# Terraform: DLQ configuration
resource "aws_sqs_queue" "dlq" {
  name                      = "my-queue-dlq"
  message_retention_seconds = 1209600 # 14 days
}
resource "aws_sqs_queue" "main" {
  name                       = "my-queue"
  visibility_timeout_seconds = 300
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 3 # Move to DLQ after 3 failures
  })
}
Monitor and replay DLQ messages (2023+ redrive API):
# Check DLQ depth
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123/my-queue-dlq \
  --attribute-names ApproximateNumberOfMessages
# Redrive DLQ messages back to the main queue
aws sqs start-message-move-task \
  --source-arn arn:aws:sqs:us-east-1:123:my-queue-dlq \
  --destination-arn arn:aws:sqs:us-east-1:123:my-queue \
  --max-number-of-messages-per-second 10
If the AWS CLI rejects start-message-move-task as an unknown command, your CLI version predates the redrive API. Upgrade to a current AWS CLI v2 release or fall back to the console redrive UI.
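Because the move task is rate-limited, it helps to estimate how long a redrive will run before starting it. This is plain arithmetic, sketched with a hypothetical helper name; the depth would come from the ApproximateNumberOfMessages value read above, which is itself approximate.

```javascript
// Hypothetical helper: rough duration of a redrive at a fixed rate.
function estimateRedriveSeconds(dlqDepth, maxMessagesPerSecond) {
  if (!(maxMessagesPerSecond > 0)) {
    throw new RangeError('maxMessagesPerSecond must be positive');
  }
  return Math.ceil(dlqDepth / maxMessagesPerSecond);
}

// 1,200 messages at the 10 messages/second used in the command above.
console.log(estimateRedriveSeconds(1200, 10)); // 120
```

If the estimate is longer than you can tolerate, raise the rate only as far as the main queue's consumers can absorb, or the redriven messages may fail again and land back in the DLQ.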
Fix 4: Make Processing Idempotent
Standard SQS delivers at-least-once — the same message may be delivered multiple times. Design processing to handle duplicates:
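const redis = require('redis');
const client = redis.createClient();
// node-redis v4+ needs an explicit connection before the first command
const ready = client.connect();
// Takes the raw SQS message (not the parsed body) so MessageId is available
async function processMessage(message) {
  await ready;
  const messageId = message.MessageId;
  const key = `processed:${messageId}`;
  // Claim the message ID (using Redis as idempotency store).
  // SET with NX returns 'OK' if the key was created, null if it already existed.
  const claimed = await client.set(key, '1', { NX: true, EX: 86400 }); // Expire after 24h
  if (!claimed) {
    console.log(`Skipping duplicate message: ${messageId}`);
    return;
  }
  try {
    // Process the message
    const body = JSON.parse(message.Body);
    await handleOrder(body.orderId);
  } catch (error) {
    // Release the claim so the SQS retry is not skipped as a duplicate
    await client.del(key);
    throw error;
  }
}
Database-level idempotency: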
-- Postgres: use INSERT ... ON CONFLICT DO NOTHING
INSERT INTO processed_messages (message_id, processed_at)
VALUES ($1, NOW())
ON CONFLICT (message_id) DO NOTHING;
-- Check if it was actually inserted
-- If 0 rows affected, this is a duplicate
Fix 5: Fix Lambda + SQS Integration
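When Lambda is triggered by SQS, there are specific behaviors to handle. The partial-batch-failure pattern below only takes effect when ReportBatchItemFailures is enabled on the event source mapping. Without that setting, Lambda ignores the returned batchItemFailures and treats a normally returning handler as a fully successful batch, so failed records are deleted instead of retried.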
// Lambda handler for SQS trigger
import { SQSHandler, SQSRecord } from 'aws-lambda';
export const handler: SQSHandler = async (event) => {
  // event.Records contains all messages in this batch
  const failures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (error) {
      console.error(`Failed to process ${record.messageId}:`, error);
      // Report partial batch failure: only failed messages go back to queue
      failures.push({ itemIdentifier: record.messageId });
    }
  }
  // Return failed message IDs for partial batch failure reporting
  if (failures.length > 0) {
    return { batchItemFailures: failures };
  }
};
async function processRecord(record: SQSRecord) {
  const body = JSON.parse(record.body);
  // Process...
}
Configure the Lambda event source mapping:
// CDK: Lambda + SQS event source
const processFunction = new lambda.Function(this, 'Processor', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  timeout: Duration.seconds(50), // Must stay below the queue's 5-minute VisibilityTimeout
});
processFunction.addEventSource(new lambdaEventSources.SqsEventSource(mainQueue, {
  batchSize: 10,
  maxBatchingWindow: Duration.seconds(30), // Wait up to 30s to fill a batch
  reportBatchItemFailures: true, // Enable partial batch failure handling
}));
Warning: Lambda’s timeout must be less than the SQS visibility timeout, and AWS recommends a visibility timeout of at least six times the function timeout. If the function is still running when the visibility timeout expires, the message becomes visible again and another invocation picks it up, causing duplicate processing.
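The timeout relationship is easy to check mechanically before deploying. The sketch below uses a hypothetical helper, checkSqsLambdaTimeouts, and assumes both values are supplied in seconds; the six-times factor is the guidance in AWS's Lambda documentation for SQS event sources rather than a hard requirement.

```javascript
// Hypothetical config check for a Lambda + SQS pairing. Values are in seconds.
function checkSqsLambdaTimeouts({ functionTimeout, visibilityTimeout }) {
  const problems = [];
  if (visibilityTimeout < functionTimeout) {
    // Messages can reappear while the first invocation is still running.
    problems.push('visibility timeout is shorter than the function timeout');
  } else if (visibilityTimeout < 6 * functionTimeout) {
    // Allowed, but leaves little room for retries and batching delays.
    problems.push('visibility timeout is below 6x the function timeout');
  }
  return problems;
}

console.log(checkSqsLambdaTimeouts({ functionTimeout: 50, visibilityTimeout: 300 }));  // []
console.log(checkSqsLambdaTimeouts({ functionTimeout: 300, visibilityTimeout: 300 }));
// [ 'visibility timeout is below 6x the function timeout' ]
```

Running a check like this in a unit test for the infrastructure code catches the mismatch earlier than duplicate processing in production would.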
Fix 6: Use FIFO Queues for Ordered Processing
Standard queues don’t guarantee order. Use FIFO queues when order matters:
// FIFO queue: name must end in .fifo
const fifoQueue = new sqs.Queue(this, 'OrdersQueue', {
  queueName: 'orders.fifo',
  fifo: true,
  contentBasedDeduplication: true, // Auto-dedup based on message body hash
});
// Sending to FIFO queue: requires MessageGroupId
await sqs.send(new SendMessageCommand({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123/orders.fifo',
  MessageBody: JSON.stringify({ orderId: '123', status: 'shipped' }),
  MessageGroupId: 'order-123', // Messages for the same order are delivered in sequence
  MessageDeduplicationId: 'order-123-shipped-v1', // Prevent duplicates
}));
FIFO limitations and the 2021 high-throughput option:
- Default cap: 300 transactions/second (3,000 with batching).
- High-throughput FIFO (generally available since May 2021): up to 9,000 messages/sec per API action in supported regions. Enable it via DeduplicationScope=messageGroup and FifoThroughputLimit=perMessageGroupId on the queue. Many CDK/Terraform examples still default to the older perQueue scope.
- Not available in every AWS region: check the regional service list before deploying.
- Ordering is per MessageGroupId, not per queue. If every message uses the same group ID, you serialize everything to one consumer at a time.
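The group ID choice in the last point can be made explicit in code. This is a minimal sketch, assuming order events with orderId and status fields as in the send example above; fifoSendParams is a hypothetical helper, and hashing the body for the deduplication ID is one possible strategy, not the only one.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: derives FIFO send parameters from an order event.
function fifoSendParams(event) {
  const body = JSON.stringify(event);
  return {
    MessageBody: body,
    // One group per order: events for the same order stay ordered,
    // while different orders can be consumed in parallel.
    MessageGroupId: `order-${event.orderId}`,
    // Identical content yields the same 64-character ID, so a retried send
    // inside the deduplication window is not enqueued a second time.
    MessageDeduplicationId: createHash('sha256').update(body).digest('hex'),
  };
}

const params = fifoSendParams({ orderId: '123', status: 'shipped' });
console.log(params.MessageGroupId); // order-123
```

The hash depends on JSON.stringify producing the same string for a retry, which holds when the retry re-sends the same object; events rebuilt with a different key order would need a canonical serialization first.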
Still Not Working?
IAM permissions: the consumer role needs sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:ChangeMessageVisibility. The producer needs sqs:SendMessage. Missing permissions return 403 AccessDenied errors, which can look like an empty queue if the consumer swallows exceptions:
{
  "Effect": "Allow",
  "Action": [
    "sqs:SendMessage",
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "sqs:ChangeMessageVisibility",
    "sqs:GetQueueAttributes"
  ],
  "Resource": "arn:aws:sqs:us-east-1:123456789:my-queue"
}
Cross-account or cross-region queues: SQS queue URLs are region-specific. If your producer is in us-east-1 but the queue is in eu-west-1, use the correct region in the SQSClient configuration, and ensure the queue policy allows cross-account access.
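A region mismatch can be caught at startup by reading the region out of the queue URL. The sketch below assumes the standard URL form https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>; regionFromQueueUrl is a hypothetical helper, and it returns null for anything else (custom endpoints, legacy hostnames) rather than guessing.

```javascript
// Hypothetical helper: extracts the region from a standard SQS queue URL.
function regionFromQueueUrl(queueUrl) {
  const { hostname } = new URL(queueUrl);
  const match = /^sqs\.([a-z0-9-]+)\.amazonaws\.com(\.cn)?$/.exec(hostname);
  return match ? match[1] : null;
}

const url = 'https://sqs.eu-west-1.amazonaws.com/123456789/my-queue';
const clientRegion = 'us-east-1'; // the region the SQS client was configured with
const queueRegion = regionFromQueueUrl(url);
if (queueRegion && queueRegion !== clientRegion) {
  console.warn(`Queue is in ${queueRegion} but the client targets ${clientRegion}`);
}
```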
ApproximateNumberOfMessages shows 0 but messages aren’t processing — messages may be in flight (currently invisible, being processed). Check ApproximateNumberOfMessagesNotVisible. If it’s high, your consumers are receiving messages but not deleting them (processing is stuck or failing silently).
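The two counters can be read together to get a first diagnosis. This is a rough sketch, assuming the Attributes map returned by GetQueueAttributes (its values are strings); diagnoseQueue and its labels are hypothetical, and the counts are approximate, so treat the result as a hint.

```javascript
// Hypothetical helper: turns two queue attributes into a coarse label.
function diagnoseQueue(attributes) {
  const visible = Number(attributes.ApproximateNumberOfMessages ?? 0);
  const inFlight = Number(attributes.ApproximateNumberOfMessagesNotVisible ?? 0);
  if (visible === 0 && inFlight === 0) return 'empty';
  // Nothing waiting but messages held: consumers receive without deleting.
  if (visible === 0) return 'in-flight-only';
  // Messages waiting and nobody holding any: no consumer is polling.
  if (inFlight === 0) return 'no-consumer';
  return 'working';
}

console.log(diagnoseQueue({
  ApproximateNumberOfMessages: '0',
  ApproximateNumberOfMessagesNotVisible: '42',
})); // in-flight-only
```

An 'in-flight-only' result that persists across several checks points at stuck or silently failing processing, while 'no-consumer' points at polling, permissions, or the wrong queue URL.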
Messages over 256KB rejected with MessageTooLong: the size cap is a hard service limit, so check the current quota for your queue. Use the SQS Extended Client to offload the body to S3 and put a pointer in the message, or split the payload into multiple messages with a correlation ID.
KMS errors after enabling encryption: if the queue uses a customer-managed KMS key, every consumer role needs kms:Decrypt on that key, and every producer needs kms:GenerateDataKey and kms:Decrypt. Missing KMS permissions surface as KMS.AccessDeniedException, not as the normal SQS 403.
FIFO queue throughput collapsed after migration from Standard — the default FIFO mode caps you at 300 TPS. Enable high-throughput FIFO with FifoThroughputLimit=perMessageGroupId and choose a MessageGroupId strategy that fans out (e.g. customer ID, not a single constant).
For related AWS issues, see Fix: AWS Lambda Timeout, Fix: AWS Lambda Cold Start Timeout, Fix: AWS IAM AccessDeniedException, and Fix: Celery Task Not Received.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.