Fix: AWS Bedrock Not Working — Model Access, IAM, Converse API, Streaming, and Cross-Region

FixDevs

Quick Answer

How to fix AWS Bedrock errors — AccessDeniedException for model access, bedrock vs bedrock-runtime client, Converse vs InvokeModel API, streaming with ConverseStream, regional availability, and Knowledge Bases setup.

The Error

You call Bedrock and AWS denies access:

AccessDeniedException: You don't have access to the model with the specified model ID.

Or the client throws when invoking:

import boto3
client = boto3.client("bedrock")
response = client.invoke_model(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", body=...)
# AttributeError: 'Bedrock' object has no attribute 'invoke_model'

Or Converse returns a model-not-found error:

ValidationException: The model ID anthropic.claude-3-5-sonnet-20241022-v2:0 
is not available in this region.

Or streaming hangs:

response = client.converse_stream(modelId="...", messages=...)
for event in response["stream"]:
    print(event)
# Hangs waiting for first event.

Why This Happens

Bedrock is AWS’s managed foundation model service. Most issues map to one of:

  • Model access is opt-in. Each model (Claude, Llama, Mistral, Titan) must be enabled in the Bedrock console per AWS account, per region. New accounts have none enabled by default.
  • Two clients: bedrock vs bedrock-runtime. bedrock is for management (list models, manage Knowledge Bases). bedrock-runtime is for actually calling models (invoke_model, converse).
  • Two APIs: InvokeModel vs Converse. InvokeModel is provider-specific (different JSON shape per model). Converse is unified across models. Use Converse unless you have a reason not to.
  • Regional availability. Not every model is in every region. us-east-1 and us-west-2 typically have the most. Some models require “cross-region inference” to access from other regions.
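As a rough triage aid, the four causes above can be keyed on the error code that comes back. The sketch below is illustrative only: the table name, the function name, and the hint wording are made up for this example, and it assumes the exception is either a botocore ClientError (which exposes the code under response["Error"]["Code"]) or a plain Python error.

```python
# Hypothetical triage table: maps the error signatures shown in this article
# to the cause the article attributes them to. Not an official AWS mapping.
LIKELY_CAUSES = {
    "AccessDeniedException": (
        "Model access is not enabled for this account and region, "
        "or the IAM policy lacks bedrock:InvokeModel."
    ),
    "ValidationException": (
        "The model ID may not be served in this region; "
        "try another region or an inference profile ID."
    ),
    "ThrottlingException": (
        "On-demand quota exceeded; back off, request a quota increase, "
        "or consider Provisioned Throughput."
    ),
    "AttributeError": (
        "Wrong boto3 client; model calls live on 'bedrock-runtime', not 'bedrock'."
    ),
}


def likely_cause(exc: Exception) -> str:
    """Return a short hint for an exception raised by a Bedrock call."""
    # botocore's ClientError carries the service error code in
    # exc.response["Error"]["Code"]; plain Python errors have no .response,
    # so fall back to the exception class name.
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code") or type(exc).__name__
    return LIKELY_CAUSES.get(
        code, f"Unrecognized error ({code}); check credentials and region first."
    )
```

Wrap a failing call in try/except and print likely_cause(exc) to get a first guess before working through the fixes below.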

Fix 1: Enable Model Access

In the AWS Console:

  1. Go to Bedrock in the AWS Console.
  2. Switch to the region you want (top-right region selector).
  3. Left sidebar → Model access.
  4. Click Modify model access (or Manage model access).
  5. Check the models you want (e.g. Claude 3.5 Sonnet, Llama 3.1, etc.).
  6. Submit. Some models require justification (1-2 sentence form); approval is usually instant or within minutes.

To verify via CLI:

aws bedrock list-foundation-models --region us-east-1 \
  --query 'modelSummaries[?contains(modelId, `claude`)]'

This lists the Claude models in the region's catalog. If a model appears here but invoking it raises AccessDeniedException, access has not been enabled for that specific model.
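The same catalog check can be done from Python. This is a minimal sketch: the function name models_in_catalog is made up for this example, and it takes the client as a parameter so it works with any object that has a list_foundation_models method. Like the CLI command, it only shows what is in the catalog, not what your account has been granted.

```python
def models_in_catalog(bedrock_client, needle: str) -> list:
    """Return model IDs in the client's region whose ID contains `needle`."""
    # list_foundation_models is a control-plane call, so the client must be
    # the "bedrock" client, not "bedrock-runtime".
    response = bedrock_client.list_foundation_models()
    return sorted(
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if needle.lower() in summary["modelId"].lower()
    )
```

Typical use would be models_in_catalog(boto3.client("bedrock", region_name="us-east-1"), "claude").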

Pro Tip: Enable every model you might use upfront in your primary region. Approval is usually free and fast, and enabling only some models leads to confusing AccessDeniedException errors later when you switch models.

Fix 2: Use the Right Client

import boto3

# WRONG — bedrock (management):
client = boto3.client("bedrock")
client.invoke_model(...)  # AttributeError

# RIGHT — bedrock-runtime (data plane):
client = boto3.client("bedrock-runtime", region_name="us-east-1")
client.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", ...)

Two clients, completely separate:

  • bedrock — list models, manage Provisioned Throughput, manage Guardrails, manage Knowledge Bases.
  • bedrock-runtime — invoke models, converse, stream.
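One way to avoid picking the wrong client is to look the service name up by operation. The table below is a sketch covering only the operations mentioned in this article; SERVICE_FOR_OPERATION and service_for are hypothetical names, not part of boto3.

```python
# boto3 service name that exposes each operation used in this article.
# Not exhaustive; extend it as needed.
SERVICE_FOR_OPERATION = {
    "list_foundation_models": "bedrock",
    "list_inference_profiles": "bedrock",
    "create_provisioned_model_throughput": "bedrock",
    "invoke_model": "bedrock-runtime",
    "converse": "bedrock-runtime",
    "converse_stream": "bedrock-runtime",
    "retrieve": "bedrock-agent-runtime",
    "retrieve_and_generate": "bedrock-agent-runtime",
}


def service_for(operation: str) -> str:
    """Return the boto3 service name to pass to boto3.client()."""
    try:
        return SERVICE_FOR_OPERATION[operation]
    except KeyError:
        raise ValueError(f"Unknown operation: {operation!r}") from None
```

With this, boto3.client(service_for("converse"), region_name="us-east-1") always yields a client that actually has the method you are about to call.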

For JS:

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });
const response = await client.send(new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  messages: [{ role: "user", content: [{ text: "Hello" }] }],
}));

@aws-sdk/client-bedrock-runtime is a separate npm package from @aws-sdk/client-bedrock. Install the runtime package to call models.

Fix 3: Use the Converse API (Unified Across Models)

The unified API works the same way regardless of which model you target:

import boto3
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {"role": "user", "content": [{"text": "What's the capital of France?"}]},
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7,
        "topP": 0.9,
    },
)

print(response["output"]["message"]["content"][0]["text"])

For multi-turn conversations:

messages = [
    {"role": "user", "content": [{"text": "What's 2+2?"}]},
    {"role": "assistant", "content": [{"text": "4."}]},
    {"role": "user", "content": [{"text": "And 3+3?"}]},
]

response = client.converse(modelId=model_id, messages=messages)
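Converse is stateless, so the caller has to resend the full history on every turn. A small helper can keep that bookkeeping in one place. This is a sketch under assumptions: ask is a made-up name, and the client is passed in so the function can be exercised without AWS.

```python
def ask(client, model_id: str, history: list, user_text: str) -> str:
    """Send one user turn, record both sides in `history`, return the reply text."""
    history.append({"role": "user", "content": [{"text": user_text}]})
    try:
        response = client.converse(modelId=model_id, messages=history)
    except Exception:
        # Roll back the unanswered user turn so the history stays alternating.
        history.pop()
        raise
    assistant_message = response["output"]["message"]
    history.append(assistant_message)
    # A reply can contain several content blocks; join only the text ones.
    return "".join(block.get("text", "") for block in assistant_message["content"])
```

Start with history = [] and call ask(client, model_id, history, "What's 2+2?") followed by ask(client, model_id, history, "And 3+3?"); the second call sees the first exchange.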

For system prompts:

response = client.converse(
    modelId=model_id,
    system=[{"text": "You are a concise assistant. Answer in one sentence."}],
    messages=[{"role": "user", "content": [{"text": "Tell me about Python."}]}],
)

For tool use:

response = client.converse(
    modelId=model_id,
    messages=messages,
    toolConfig={
        "tools": [
            {
                "toolSpec": {
                    "name": "get_weather",
                    "description": "Get current weather",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        }
                    },
                }
            }
        ],
    },
)

# Check for tool use:
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        tool_name = block["toolUse"]["name"]
        tool_input = block["toolUse"]["input"]
        # Execute tool, send result back in next turn.
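The "send result back in next turn" step is where most tool-use code goes wrong, so here is one way it might look. The sketch assumes handlers is a dict mapping tool names to plain Python functions that return a JSON-serializable dict; run_with_tools and max_rounds are names chosen for this example.

```python
def run_with_tools(client, model_id, messages, tool_config, handlers, max_rounds=5):
    """Call converse until the model stops requesting tools; return the last response."""
    for _ in range(max_rounds):
        response = client.converse(
            modelId=model_id, messages=messages, toolConfig=tool_config
        )
        message = response["output"]["message"]
        messages.append(message)
        if response.get("stopReason") != "tool_use":
            return response

        # Answer every toolUse block in a single user message.
        results = []
        for block in message["content"]:
            if "toolUse" not in block:
                continue
            call = block["toolUse"]
            try:
                output = handlers[call["name"]](**call["input"])
                result = {
                    "toolUseId": call["toolUseId"],
                    "content": [{"json": output}],  # output must be a dict
                    "status": "success",
                }
            except Exception as exc:
                # Report the failure to the model instead of crashing the loop.
                result = {
                    "toolUseId": call["toolUseId"],
                    "content": [{"text": str(exc)}],
                    "status": "error",
                }
            results.append({"toolResult": result})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Model kept requesting tools after max_rounds")
```

The toolUseId in each result must match the ID from the model's request, and toolConfig is passed again on the follow-up call because the history now contains tool blocks.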

Pro Tip: Use Converse over invoke_model. The latter is provider-specific (different JSON for Claude vs Llama vs Titan); Converse is the same regardless of model. Easier to switch models.

Fix 4: Streaming With ConverseStream

For token-by-token streaming:

response = client.converse_stream(
    modelId=model_id,
    messages=[{"role": "user", "content": [{"text": "Write a poem about Python."}]}],
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        # A delta carries "text" for text blocks and "toolUse" for tool calls.
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            print(delta["text"], end="", flush=True)
    elif "messageStop" in event:
        print(f"\nStop reason: {event['messageStop']['stopReason']}")

The event types in the stream:

  • messageStart — beginning of the assistant’s message.
  • contentBlockStart — beginning of a content block (text, toolUse).
  • contentBlockDelta — incremental update (text tokens).
  • contentBlockStop — end of a content block.
  • messageStop — end of the message with stopReason (end_turn, max_tokens, tool_use, etc.).
  • metadata — token usage stats and latency.
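To show how these event types fit together, the sketch below folds a stream into a single result. collect_stream is a hypothetical helper; it accepts any iterable of event dicts, so response["stream"] from converse_stream should work as input, as does a plain list in tests.

```python
def collect_stream(events) -> dict:
    """Accumulate text, stop reason, and token usage from ConverseStream events."""
    text_parts = []
    stop_reason = None
    usage = {}
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            # Tool-use deltas have no "text" key, so skip them here.
            if "text" in delta:
                text_parts.append(delta["text"])
        elif "messageStop" in event:
            stop_reason = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            # The metadata event arrives last and carries token counts.
            usage = event["metadata"].get("usage", {})
    return {"text": "".join(text_parts), "stopReason": stop_reason, "usage": usage}
```

This gives up the low latency of printing as you go, so it is mainly useful for logging or tests; for interactive output, print inside the loop as shown above.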

For Node:

import { BedrockRuntimeClient, ConverseStreamCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });
const response = await client.send(new ConverseStreamCommand({
  modelId,
  messages: [{ role: "user", content: [{ text: "Write a poem about JavaScript." }] }],
}));

for await (const event of response.stream) {
  // Tool-use deltas have no text field, so guard before writing.
  const text = event.contentBlockDelta?.delta?.text;
  if (text) {
    process.stdout.write(text);
  }
}

response.stream is an async iterable.

Common Mistake: Awaiting individual stream events. The for await loop already resolves each event as it arrives, so the loop variable is a plain object and there is nothing further to await inside the loop body.

Fix 5: IAM Permissions

Your IAM role or user needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream. Converse and ConverseStream have no IAM actions of their own: Converse is authorized by bedrock:InvokeModel, and ConverseStream by bedrock:InvokeModelWithResponseStream.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    }
  ]
}

For all models in a region:

"Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"

The empty account ID in the ARN (bedrock:us-east-1::) is because foundation models are AWS-owned.

For cross-region inference (calling a model in a different region):

"Resource": [
  "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:*:*:inference-profile/*"
]

The inference-profile/* resource is for cross-region routing — Bedrock load-balances across regions, requiring access to the profile resource too.

Common Mistake: Granting bedrock:* to everyone. Scope to specific actions for read-only or invocation-only access. bedrock:InvokeModel doesn’t let you modify models; bedrock:* does.
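If you manage several models or regions, generating the policy keeps the ARNs consistent. The sketch below is one possible approach; invoke_policy and its parameters are hypothetical. It deliberately grants only bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream, on the understanding that these are the actions Converse and ConverseStream are authorized against.

```python
import json


def invoke_policy(model_ids, regions, include_inference_profiles=False) -> dict:
    """Build an invocation-only IAM policy document for the given models and regions."""
    # Foundation-model ARNs have an empty account ID (note the "::").
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for region in regions
        for model_id in model_ids
    ]
    if include_inference_profiles:
        # Needed when calling through a cross-region inference profile.
        resources.append("arn:aws:bedrock:*:*:inference-profile/*")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


def policy_json(policy: dict) -> str:
    """Serialize the policy for pasting into the console or a template."""
    return json.dumps(policy, indent=2)
```

Calling policy_json(invoke_policy(["anthropic.claude-3-5-sonnet-20241022-v2:0"], ["us-east-1", "us-west-2"], include_inference_profiles=True)) yields a document equivalent to the cross-region example above.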

Fix 6: Cross-Region Inference

For Claude on Bedrock in Asia or Europe, you often need cross-region inference. Use an inference profile ID:

response = client.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",  # "us." prefix
    messages=...,
)

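The us. prefix routes requests across US regions. Other geographic prefixes typically include eu. and apac. Pick the one that matches the region your client calls from, since a profile is generally only callable from source regions in its own geography.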

To find available inference profiles:

aws bedrock list-inference-profiles --region us-east-1

Cross-region inference improves availability (multiple regions can serve) but adds slight latency (~50ms).
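Rather than guessing the prefix, you can look up which profiles wrap a given model. This sketch mirrors the CLI command in Python; find_inference_profiles is a made-up name, the client is passed in, and pagination (nextToken) is ignored for brevity.

```python
def find_inference_profiles(bedrock_client, model_id: str) -> list:
    """Return inference profile IDs in the client's region that end with `model_id`."""
    # list_inference_profiles is on the "bedrock" (control-plane) client.
    response = bedrock_client.list_inference_profiles()
    suffix = "." + model_id
    return sorted(
        profile["inferenceProfileId"]
        for profile in response.get("inferenceProfileSummaries", [])
        if profile["inferenceProfileId"].endswith(suffix)
    )
```

Pass the result's first entry as modelId to converse. An empty list suggests the model has no cross-region profile visible from that region.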

Pro Tip: Always test with inference profiles first if you’re in a region with limited model availability. They’re transparent — your code is the same except for the model ID prefix.

Fix 7: Knowledge Bases for RAG

For built-in RAG, Bedrock Knowledge Bases connect to S3, Confluence, Salesforce, etc., embed documents, and serve queries:

client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What does the company policy say about parental leave?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "ABC123XYZ",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        },
    },
)

print(response["output"]["text"])
print("Sources:", response.get("citations", []))

retrieve_and_generate does retrieval + LLM call in one API call. For separate retrieval (so you can post-process):

response = client.retrieve(
    knowledgeBaseId="ABC123XYZ",
    retrievalQuery={"text": "parental leave"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5},
    },
)

for result in response["retrievalResults"]:
    print(result["content"]["text"])
    print(result["score"])

Knowledge Base queries use a third client, bedrock-agent-runtime. It is separate from both bedrock and bedrock-runtime.
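The point of calling retrieve separately is to post-process the results before they reach the model. A minimal sketch of that step follows; top_passages, build_prompt, the 0.5 score threshold, and the prompt wording are all assumptions for illustration, not Bedrock defaults.

```python
def top_passages(retrieve_response: dict, min_score: float = 0.5, limit: int = 3) -> list:
    """Keep the highest-scoring passages from a retrieve() response."""
    results = [
        r for r in retrieve_response.get("retrievalResults", [])
        if r.get("score", 0.0) >= min_score
    ]
    results.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["content"]["text"] for r in results[:limit]]


def build_prompt(question: str, passages: list) -> str:
    """Assemble a grounded prompt with numbered context passages."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string can be sent as the user message in a regular converse call on the bedrock-runtime client.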

Fix 8: Cost and Provisioned Throughput

Bedrock pricing:

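  • On-demand: pay per input/output token with no commitment. Latency can vary because capacity is shared.
  • Provisioned Throughput: pay hourly under a one-month or six-month commitment for dedicated capacity. Throughput is bounded by the model units you buy rather than by on-demand quotas, and cost is predictable.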

For on-demand monitoring:

# Token usage in Converse response:
response = client.converse(modelId=..., messages=...)
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")
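The usage block can be turned into a cost estimate per call. In this sketch the per-1,000-token prices are parameters you supply from the current Bedrock pricing page; estimate_cost is a hypothetical helper and the numbers in the usage note are placeholders, not real prices.

```python
def estimate_cost(usage: dict, input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the on-demand cost of one Converse call from its usage block."""
    input_cost = usage["inputTokens"] / 1000 * input_price_per_1k
    output_cost = usage["outputTokens"] / 1000 * output_price_per_1k
    return input_cost + output_cost
```

For example, estimate_cost(response["usage"], input_price_per_1k=0.003, output_price_per_1k=0.015) with placeholder prices; summing the result across calls gives a running total to compare against a Provisioned Throughput quote.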

For Provisioned Throughput:

aws bedrock create-provisioned-model-throughput \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --provisioned-model-name my-prod \
  --model-units 1 \
  --commitment-duration OneMonth

Then invoke using the provisioned ARN as the modelId:

response = client.converse(
    modelId="arn:aws:bedrock:us-east-1:123456:provisioned-model/abc-123",
    messages=...,
)

Note: Provisioned Throughput is expensive ($10K+/month per unit for top models). Use only when you have sustained, predictable load that warrants the commitment. For variable load, on-demand is much cheaper.

Still Not Working?

A few less-obvious failures:

  • Could not load credentials. Standard AWS auth issue. Set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY or run on an EC2/Lambda role with Bedrock permissions.
  • Rate limited (ThrottlingException). On-demand has per-account quotas. Request quota increases via AWS Support or use Provisioned Throughput.
  • Model returns wildly different output than via Anthropic API directly. Some Bedrock models have additional safety filters or default system prompts. Compare prompt + response exactly between providers.
  • Streaming buffers entire response. Some proxies (CloudFront, ALB) buffer SSE. Use the SDK’s streaming API directly — don’t proxy.
  • Different model versions on Bedrock vs Anthropic. Bedrock model IDs include the version date (-20241022). Always pin the date — latest aliases can shift, breaking reproducibility.
  • Tool use returns weird formats. The Converse API normalizes tool format, but model behavior differs. Test tool prompts with the specific model you’ll deploy.
  • Bedrock LLM in VPC fails. Bedrock supports VPC Endpoints for private network access. Without one, your VPC-only EC2 can’t reach Bedrock.
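  • Image inputs (multimodal) fail. Each model has its own image limits (size, format, count). For Converse, the AWS documentation has listed up to 20 images per request, each at most 3.75 MB and 8,000 px per side; check the current limits for the model you deploy.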
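For the ThrottlingException case in the list above, a retry wrapper with exponential backoff is a common first step. The sketch below is one possible shape, not the SDK's own mechanism: call_with_backoff, its defaults, and the injectable sleep parameter are choices made for this example.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while Bedrock throttles."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # botocore ClientError exposes the code under response["Error"]["Code"].
            response = getattr(exc, "response", None) or {}
            code = response.get("Error", {}).get("Code")
            if code != "ThrottlingException" or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            # Add jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would look like call_with_backoff(lambda: client.converse(modelId=model_id, messages=messages)). botocore also has built-in retry modes, configurable via botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"}), which may be enough on their own.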

For related LLM / API issues, see OpenAI API not working, LiteLLM not working, LangChain Python not working, and AWS IAM permission denied.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
