Fix: AWS SQS Not Working — Messages Not Received, Duplicate Processing, or DLQ Filling Up

How to fix AWS SQS issues — visibility timeout, message not delivered, duplicate messages, Dead Letter Queue configuration, FIFO queue ordering, and Lambda trigger problems.

The Problem

Messages are sent to SQS but never received by the consumer:

// Producer sends successfully
await sqs.sendMessage({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123/my-queue',
  MessageBody: JSON.stringify({ orderId: '123' }),
}).promise();
// MessageId returned — but consumer never sees the message

Or the same message is processed multiple times despite being deleted:

// Consumer processes and deletes the message
await sqs.deleteMessage({
  QueueUrl: queueUrl,
  ReceiptHandle: message.ReceiptHandle,
}).promise();
// But the same message appears again 30 seconds later

Or messages go to the Dead Letter Queue immediately without being processed:

DLQ is filling up, but the main queue consumer shows no errors

Or a Lambda function triggered by SQS processes messages out of order.

Why This Happens

SQS has several behaviors that differ from traditional message queues. Most production problems trace back to five root causes:

Visibility timeout too short — when a consumer receives a message, it becomes invisible to other consumers for the VisibilityTimeout period. If processing takes longer, the message becomes visible again and is delivered a second time.
Consumer not polling — ReceiveMessage only returns messages that are currently visible. If polling stops (consumer crash, rate limiting), messages queue up but aren’t processed.
At-least-once delivery — standard SQS queues guarantee at-least-once delivery. Even with successful deletion, rare cases can deliver the message twice. Your processing must be idempotent.
DLQ maxReceiveCount too low — if maxReceiveCount is 1, any processing failure sends the message to the DLQ immediately without retry.
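Wrong queue URL or IAM permissions — sending to the wrong URL delivers messages to a queue nobody reads, and a missing sqs:ReceiveMessage permission raises AccessDenied errors that look like an empty queue when the consumer swallows them.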

The second layer of confusion is that SQS has evolved heavily, and the behavior you read about on Stack Overflow in 2018 is not the behavior of the queue you provisioned this week. Knowing which feature shipped with which release decides which fix actually applies to your setup.

SQS Version History — What Shipped When

SQS was one of the first AWS services (launched 2006), but the features developers reach for today are much newer. Many “SQS not working” threads online predate the feature you’re trying to use.

November 2012 — long polling. Queues still default to ReceiveMessageWaitTimeSeconds = 0 (short polling) unless the queue or the ReceiveMessage call sets a wait time, and many older Terraform/CloudFormation templates never do. Empty receives still bill, so audit older queues.
November 2016 — FIFO queues. Originally SQS only offered Standard queues with at-least-once delivery and no ordering guarantees. FIFO (.fifo suffix) added exactly-once processing and strict ordering within a MessageGroupId. Pre-2016 articles that say “SQS cannot guarantee order” no longer apply if you’re on FIFO.
April 2017 — server-side encryption with KMS. SSE-SQS arrived later, in November 2021, as a managed option. If a queue still uses an old KMS key with restrictive policies, consumers can receive KMS.AccessDeniedException instead of messages.
May 2021 — high throughput for FIFO. Originally FIFO was capped at 300 TPS (3,000 with batching). High-throughput mode raised the per-API-action quota well beyond that, with the exact figure depending on region. If you migrated from Standard to FIFO and saw a throughput collapse, you may not have opted into high-throughput mode.
November 2021 — partial batch responses for Lambda triggers. Before this, a single failed record in a batch of 10 forced the entire batch to retry. Set ReportBatchItemFailures on the event source mapping and return batchItemFailures to retry only the failures.
June 2023 — Dead Letter Queue redrive API (StartMessageMoveTask). Before this, “redriving” a DLQ back to the source queue meant the DLQ Redrive page in the console or a homegrown Lambda. The API is scriptable, and each task can cap its own rate with MaxNumberOfMessagesPerSecond.
November 2023 — FIFO dead-letter queue redrive. The original redrive API only supported Standard queues at launch. FIFO support was added later.

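The 256KB maximum message size is a hard service limit, not a guideline. It has not been constant over the life of the service: the limit was smaller in SQS’s first years, and AWS has more recently announced a higher maximum, so check the current quota before designing around the number. If you need to send a larger payload, use the Extended Client Library pattern: store the body in S3 and put a pointer in the SQS message. Compressing the payload only helps when the compressed body reliably fits under the limit.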
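The pointer pattern can be sketched without the library. This is a minimal illustration and not the Extended Client Library’s actual message format: `putObject`, the `s3Pointer` field name, and the key prefix are assumptions made for the example, and `maxBytes` is a parameter so the threshold can follow whatever limit applies to your queue.

```javascript
// Sketch of the "store the body elsewhere, send a pointer" pattern.
// putObject is a hypothetical async function (key, body) => void that would
// wrap an S3 upload; the s3Pointer field name is invented for this example.
async function buildMessageBody(payload, putObject, maxBytes = 256 * 1024) {
  const body = JSON.stringify(payload);

  // The limit is measured in bytes, not characters.
  if (Buffer.byteLength(body, 'utf8') <= maxBytes) {
    return body; // Small enough: send inline
  }

  const key = `sqs-payloads/${Date.now()}-${Math.random().toString(36).slice(2)}`;
  await putObject(key, body);
  return JSON.stringify({ s3Pointer: key }); // Consumer fetches the real body by key
}
```

The consumer side would check for the pointer field and download the object before processing. Message attributes also count toward the size limit, so a real implementation should leave headroom below `maxBytes`.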

Fix 1: Configure Visibility Timeout Correctly

The visibility timeout must be longer than your maximum processing time:

const { SQSClient, ReceiveMessageCommand, ChangeMessageVisibilityCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({ region: 'us-east-1' });
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789/my-queue';

async function processMessages() {
  const response = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,      // Receive up to 10 messages at once
    WaitTimeSeconds: 20,          // Long polling — wait up to 20s for messages
    VisibilityTimeout: 300,       // 5 minutes — must be > max processing time
    AttributeNames: ['All'],
    MessageAttributeNames: ['All'],
  }));

  if (!response.Messages || response.Messages.length === 0) return;

  for (const message of response.Messages) {
    // If processing might take a long time, extend visibility timeout periodically
    const heartbeat = setInterval(() => {
      sqs.send(new ChangeMessageVisibilityCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle,
        VisibilityTimeout: 300,  // Reset the clock
      })).catch((error) => console.error('Heartbeat failed:', error));
    }, 240_000);  // Extend every 4 minutes

    try {
      await processMessage(JSON.parse(message.Body));

      // Delete only after successful processing
      await sqs.send(new DeleteMessageCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle,
      }));
    } catch (error) {
      console.error('Processing failed:', error);
      // Don't delete — let SQS retry (message becomes visible after timeout)
    } finally {
      // Always stop the heartbeat; otherwise a failed message stays hidden forever
      clearInterval(heartbeat);
    }
  }
}

Set queue visibility timeout in CDK/Terraform:

// AWS CDK
const queue = new sqs.Queue(this, 'MyQueue', {
  visibilityTimeout: Duration.minutes(5),
  receiveMessageWaitTime: Duration.seconds(20),  // Long polling
  retentionPeriod: Duration.days(4),
});

# Terraform
resource "aws_sqs_queue" "my_queue" {
  name                       = "my-queue"
  visibility_timeout_seconds = 300   # 5 minutes
  receive_wait_time_seconds  = 20    # Long polling
  message_retention_seconds  = 345600  # 4 days
}

The maximum visibility timeout is 12 hours. If a single message genuinely needs more than that to process, your architecture is wrong — break the work into smaller stages with Step Functions or use a different pattern. Setting VisibilityTimeout higher than 43200 throws InvalidParameterValue.
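The 12-hour cap is counted from the moment the message was received, so repeated ChangeMessageVisibility calls cannot extend a message indefinitely. A heartbeat can guard against that with a small helper like the sketch below; the function name and the 300-second default are assumptions for illustration.

```javascript
const MAX_VISIBILITY_SECONDS = 43200; // 12 hours, counted from the receive

// Returns the VisibilityTimeout to pass to ChangeMessageVisibility, or null
// once the message has used up its 12-hour budget.
// elapsedSeconds: time since this consumer received the message.
function nextVisibilityTimeout(elapsedSeconds, desiredSeconds = 300) {
  const remaining = MAX_VISIBILITY_SECONDS - Math.ceil(elapsedSeconds);
  if (remaining <= 0) return null;
  return Math.min(desiredSeconds, remaining);
}
```

When the helper returns null, the heartbeat should stop and the consumer should treat the work as failed or hand it to a longer-running workflow.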

Fix 2: Implement Long Polling

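Short polling returns immediately and samples only a subset of SQS servers, so it can come back empty even when the queue holds messages, and every empty response is still a billed request. Use long polling: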

// SHORT POLLING (default, wasteful)
// WaitTimeSeconds = 0 — returns immediately if no messages
const shortPollResponse = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: QUEUE_URL,
  WaitTimeSeconds: 0,  // Returns instantly
}));

// LONG POLLING (recommended)
// WaitTimeSeconds = 1-20 — waits up to N seconds for messages
const longPollResponse = await sqs.send(new ReceiveMessageCommand({
  QueueUrl: QUEUE_URL,
  WaitTimeSeconds: 20,  // Wait up to 20 seconds
  MaxNumberOfMessages: 10,
}));

Continuous polling loop:

async function startConsumer() {
  console.log('Consumer started');

  while (true) {
    try {
      const response = await sqs.send(new ReceiveMessageCommand({
        QueueUrl: QUEUE_URL,
        WaitTimeSeconds: 20,
        MaxNumberOfMessages: 10,
      }));

      if (response.Messages && response.Messages.length > 0) {
        await Promise.all(response.Messages.map(processAndDelete));
      }
    } catch (error) {
      console.error('Poll error:', error);
      await new Promise(r => setTimeout(r, 5000));  // Back off on error
    }
  }
}

async function processAndDelete(message) {
  try {
    await processMessage(JSON.parse(message.Body));
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle,
    }));
  } catch (error) {
    console.error(`Failed to process message ${message.MessageId}:`, error);
    // Message becomes visible again after VisibilityTimeout
  }
}
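The loop above backs off for a fixed 5 seconds after a poll error. If errors persist (throttling, an expired credential), a growing delay with jitter keeps many consumers from retrying in lockstep. A minimal sketch, with the function name and default values chosen for illustration:

```javascript
// Exponential backoff with "full jitter": the delay is a random value between
// 0 and min(capMs, baseMs * 2^attempt). random is injectable for testing.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 60000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In the polling loop, keep a counter of consecutive failures, sleep for `backoffDelayMs(failures)` in the catch block, and reset the counter to 0 after any successful receive.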

Fix 3: Configure Dead Letter Queue Properly

A DLQ captures messages that fail processing repeatedly:

// AWS CDK — proper DLQ setup
const dlq = new sqs.Queue(this, 'MyDLQ', {
  queueName: 'my-queue-dlq',
  retentionPeriod: Duration.days(14),  // Keep failed messages 14 days for analysis
});

const mainQueue = new sqs.Queue(this, 'MyQueue', {
  queueName: 'my-queue',
  visibilityTimeout: Duration.minutes(5),
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3,  // After 3 failed attempts, move to DLQ
  },
});

# Terraform — DLQ configuration
resource "aws_sqs_queue" "dlq" {
  name                      = "my-queue-dlq"
  message_retention_seconds = 1209600  # 14 days
}

resource "aws_sqs_queue" "main" {
  name                       = "my-queue"
  visibility_timeout_seconds = 300

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 3  # Move to DLQ after 3 failures
  })
}

Monitor and replay DLQ messages with the redrive API (StartMessageMoveTask):

# Check DLQ depth
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123/my-queue-dlq \
  --attribute-names ApproximateNumberOfMessages

# Redrive DLQ messages back to the main queue
aws sqs start-message-move-task \
  --source-arn arn:aws:sqs:us-east-1:123:my-queue-dlq \
  --destination-arn arn:aws:sqs:us-east-1:123:my-queue \
  --max-number-of-messages-per-second 10

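If the CLI rejects start-message-move-task as an unknown command, your installed version predates the redrive API: upgrade to a current AWS CLI v2 release or fall back to the console redrive UI. If the service itself returns UnsupportedOperation, check that the source ARN is a queue configured as a dead-letter queue, because the API only moves messages out of DLQs.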
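When a DLQ fills up with no visible errors, it helps to log loudly on the delivery that is about to be a message’s last. The sketch below assumes the consumer requests the ApproximateReceiveCount system attribute in ReceiveMessage and knows the queue’s maxReceiveCount; the function name is hypothetical.

```javascript
// Returns true when this delivery is the last one before SQS moves the
// message to the DLQ. SQS moves a message once its receive count exceeds
// maxReceiveCount, so the delivery where the count equals it is the final try.
// Attribute values arrive as strings.
function isFinalAttempt(message, maxReceiveCount) {
  const raw = message.Attributes && message.Attributes.ApproximateReceiveCount;
  const receiveCount = Number.parseInt(raw, 10);
  if (Number.isNaN(receiveCount)) return false; // Attribute was not requested
  return receiveCount >= maxReceiveCount;
}
```

With maxReceiveCount set to 1, every first delivery is already the final attempt, which is why such queues send messages to the DLQ after a single failure.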

Fix 4: Make Processing Idempotent

Standard SQS delivers at-least-once — the same message may be delivered multiple times. Design processing to handle duplicates:

const redis = require('redis');
const client = redis.createClient();
const ready = client.connect();  // node-redis v4+: connect before issuing commands

async function processMessage(message) {
  await ready;
  const messageId = message.MessageId;
  const key = `processed:${messageId}`;

  // Claim the message ID (using Redis as idempotency store)
  const claimed = await client.set(
    key,
    '1',
    { NX: true, EX: 86400 }  // Only set if not exists, expire after 24h
  );

  if (!claimed) {
    console.log(`Skipping duplicate message: ${messageId}`);
    return;
  }

  try {
    // Process the message
    const body = JSON.parse(message.Body);
    await handleOrder(body.orderId);
  } catch (error) {
    // Release the claim so the SQS retry is not skipped as a duplicate
    await client.del(key);
    throw error;
  }
}

Database-level idempotency:

-- Postgres: use INSERT ... ON CONFLICT DO NOTHING
INSERT INTO processed_messages (message_id, processed_at)
VALUES ($1, NOW())
ON CONFLICT (message_id) DO NOTHING;

-- Check if it was actually inserted
-- If 0 rows affected, this is a duplicate
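From Node.js, the “rows affected” check can be wrapped in a helper like the one below. It is a sketch that assumes `query` behaves like node-postgres’s `client.query`, resolving to an object with a `rowCount` property; the helper name is hypothetical.

```javascript
// query is assumed to behave like node-postgres's client.query: it resolves to
// an object whose rowCount is the number of rows the statement affected.
async function claimMessage(query, messageId) {
  const result = await query(
    `INSERT INTO processed_messages (message_id, processed_at)
     VALUES ($1, NOW())
     ON CONFLICT (message_id) DO NOTHING`,
    [messageId]
  );
  return result.rowCount === 1; // 0 rows means an earlier delivery already claimed it
}
```

Running the insert in the same transaction as the business write means a failed handler rolls back the claim as well, so the SQS retry is processed rather than skipped.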

Fix 5: Fix Lambda + SQS Integration

When Lambda is triggered by SQS, there are specific behaviors to handle. The partial-batch-failure pattern below only takes effect when ReportBatchItemFailures is enabled on the event source mapping. Without it, Lambda ignores the returned batchItemFailures and a single failed record makes the whole batch retry.

// Lambda handler for SQS trigger
import { SQSHandler, SQSRecord } from 'aws-lambda';

export const handler: SQSHandler = async (event) => {
  // event.Records contains all messages in this batch
  const failures: { itemIdentifier: string }[] = [];

  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (error) {
      console.error(`Failed to process ${record.messageId}:`, error);

      // Report partial batch failure — only failed messages go back to queue
      failures.push({ itemIdentifier: record.messageId });
    }
  }

  // Return failed message IDs for partial batch failure reporting
  if (failures.length > 0) {
    return { batchItemFailures: failures };
  }
};

async function processRecord(record: SQSRecord) {
  const body = JSON.parse(record.body);
  // Process...
}

Configure the Lambda event source mapping:

// CDK — Lambda + SQS event source
const processFunction = new lambda.Function(this, 'Processor', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
  timeout: Duration.seconds(50),  // Queue visibility timeout (5 min) should be at least 6x this
});

processFunction.addEventSource(new lambdaEventSources.SqsEventSource(mainQueue, {
  batchSize: 10,
  maxBatchingWindow: Duration.seconds(30),  // Wait up to 30s to fill a batch
  reportBatchItemFailures: true,  // Enable partial batch failure handling
}));

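Warning: Lambda’s timeout must not exceed the SQS visibility timeout, and AWS recommends a visibility timeout of at least six times the function timeout to leave room for batching and retries. If the function is still running, or waiting to be retried after throttling, when the visibility timeout expires, the message becomes visible again and a second invocation picks it up, causing duplicate processing.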
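Partial batch responses interact with ordering. With a FIFO source queue, skipping a failed record and carrying on with the rest of the batch handles later messages before the failed one is retried. AWS’s guidance for FIFO is to stop at the first failure and report that record and everything after it. A minimal sketch, where `handleFifoBatch` and `processRecord` are hypothetical names:

```javascript
// For a FIFO source queue: stop at the first failure and report that record
// plus every record after it, so nothing is handled out of order.
// processRecord is a hypothetical async function supplied by the caller.
async function handleFifoBatch(records, processRecord) {
  const batchItemFailures = [];
  for (let i = 0; i < records.length; i++) {
    try {
      await processRecord(records[i]);
    } catch (error) {
      console.error(`Failed to process ${records[i].messageId}:`, error);
      // Return the failed record and all unprocessed ones to the queue
      for (const remaining of records.slice(i)) {
        batchItemFailures.push({ itemIdentifier: remaining.messageId });
      }
      break;
    }
  }
  return { batchItemFailures };
}
```

A Lambda handler would call it as `return handleFifoBatch(event.Records, processRecord);`. An empty `batchItemFailures` array tells Lambda the whole batch succeeded.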

Fix 6: Use FIFO Queues for Ordered Processing

Standard queues don’t guarantee order. Use FIFO queues when order matters:

// FIFO queue — name must end in .fifo
const fifoQueue = new sqs.Queue(this, 'OrdersQueue', {
  queueName: 'orders.fifo',
  fifo: true,
  contentBasedDeduplication: true,  // Auto-dedup based on message body hash
});

// Sending to FIFO queue — requires MessageGroupId
await sqs.send(new SendMessageCommand({
  QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123/orders.fifo',
  MessageBody: JSON.stringify({ orderId: '123', status: 'shipped' }),
  MessageGroupId: 'order-123',         // All messages for same order in order
  MessageDeduplicationId: 'order-123-shipped-v1',  // Prevent duplicates
}));

FIFO limitations and the 2021 high-throughput option:

Default cap: 300 transactions/second (3,000 with batching).
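High-throughput FIFO (2021): raises the per-API-action quota into the thousands of messages per second, with the exact figure depending on the region. Enable it via DeduplicationScope=messageGroup and FifoThroughputLimit=perMessageGroupId on the queue. Queues created without these attributes keep the defaults (DeduplicationScope=queue, FifoThroughputLimit=perQueue), and many CDK/Terraform examples never set them.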
Not available in every AWS region — check the regional service list before deploying.
Ordering is per MessageGroupId, not per queue. If every message uses the same group ID, you serialize everything to one consumer at a time.
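The group ID and deduplication ID choices can be kept in one place. The sketch below assumes order events shaped like `{ orderId, status }`; the function name and field names are illustrative. It derives the deduplication ID from a SHA-256 hash of the body, the same idea content-based deduplication uses, so identical events sent twice within the 5-minute deduplication window collapse into one.

```javascript
const { createHash } = require('node:crypto');

// Builds the FIFO-specific SendMessage parameters for an order event.
// event.orderId and event.status are example fields, not an SQS requirement.
function fifoSendParams(event) {
  const body = JSON.stringify(event);
  return {
    MessageBody: body,
    // One group per order: events for the same order stay in sequence,
    // while different orders can be processed in parallel.
    MessageGroupId: `order-${event.orderId}`,
    // Same body gives the same ID, so SQS drops the repeat within the
    // deduplication window. The hex digest is 64 characters long.
    MessageDeduplicationId: createHash('sha256').update(body).digest('hex'),
  };
}
```

Spread the result into the SendMessageCommand input together with QueueUrl. If two distinct events can share an identical body, add a unique field such as an event ID before hashing.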

Still Not Working?

IAM permissions — the consumer role needs sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:ChangeMessageVisibility. The producer needs sqs:SendMessage. Missing permissions return AccessDenied (HTTP 403) errors, and if the consumer’s polling loop catches and only logs them, the symptom looks like an empty queue:

{
  "Effect": "Allow",
  "Action": [
    "sqs:SendMessage",
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "sqs:ChangeMessageVisibility",
    "sqs:GetQueueAttributes"
  ],
  "Resource": "arn:aws:sqs:us-east-1:123456789:my-queue"
}

Cross-account or cross-region queues — SQS queue URLs are region-specific. If your producer is in us-east-1 but the queue is in eu-west-1, use the correct region in the SQSClient configuration, and ensure the queue policy allows cross-account access.

ApproximateNumberOfMessages shows 0 but messages aren’t processing — messages may be in flight (currently invisible, being processed). Check ApproximateNumberOfMessagesNotVisible. If it’s high, your consumers are receiving messages but not deleting them (processing is stuck or failing silently).
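The two counters can be read together to narrow down where messages are stuck. The sketch below takes the Attributes map returned by GetQueueAttributes, where every value is a string; the function name and the diagnostic labels are this example’s own wording.

```javascript
// attrs is the Attributes map returned by GetQueueAttributes; SQS returns
// every value as a string. The labels are this sketch's own wording.
function diagnoseQueue(attrs) {
  const visible = Number(attrs.ApproximateNumberOfMessages || 0);
  const inFlight = Number(attrs.ApproximateNumberOfMessagesNotVisible || 0);

  if (visible === 0 && inFlight === 0) return 'empty: check the producer and the queue URL';
  if (visible === 0 && inFlight > 0) return 'in flight: consumers receive but have not deleted';
  if (visible > 0 && inFlight === 0) return 'backlog: no consumer is polling';
  return 'busy: consumers are working through a backlog';
}
```

Both counters are approximate, so sample them a few times before drawing a conclusion.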

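Messages over 256KB rejected with MessageTooLong — the size limit is enforced by the service, and the queue’s MaximumMessageSize attribute can be set lower than the service maximum, so check that attribute first. Use the SQS Extended Client to offload the body to S3 and put a pointer in the message, or split the payload into multiple messages with a correlation ID.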

KMS errors after enabling encryption — if the queue uses a customer-managed KMS key, every consumer role needs kms:Decrypt on that key, and every producer needs both kms:GenerateDataKey and kms:Decrypt. Missing KMS permissions surface as KMS.AccessDeniedException, not as the normal SQS 403.

FIFO queue throughput collapsed after migration from Standard — the default FIFO mode caps you at 300 TPS. Enable high-throughput FIFO by setting DeduplicationScope=messageGroup together with FifoThroughputLimit=perMessageGroupId, and choose a MessageGroupId strategy that fans out (e.g. customer ID, not a single constant).
