Fix: AWS Bedrock Not Working — Model Access, IAM, Converse API, Streaming, and Cross-Region
Quick Answer
How to fix AWS Bedrock errors — AccessDeniedException for model access, bedrock vs bedrock-runtime client, Converse vs InvokeModel API, streaming with ConverseStream, regional availability, and Knowledge Bases setup.
The Error
You call Bedrock and AWS denies access:
AccessDeniedException: You don't have access to the model with
the specified model ID.
Or the client throws when invoking:
import boto3
client = boto3.client("bedrock")
response = client.invoke_model(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", body=...)
# AttributeError: 'Bedrock' object has no attribute 'invoke_model'
Or Converse returns a model-not-found error:
ValidationException: The model ID anthropic.claude-3-5-sonnet-20241022-v2:0
is not available in this region.
Or streaming hangs:
response = client.converse_stream(modelId="...", messages=...)
for event in response["stream"]:
    print(event)
# Hangs waiting for first event.
Why This Happens
Bedrock is AWS’s managed foundation model service. Most issues map to one of:
- Model access is opt-in. Each model (Claude, Llama, Mistral, Titan) must be enabled in the Bedrock console per AWS account, per region. New accounts have none enabled by default.
- Two clients: bedrock vs bedrock-runtime. bedrock is for management (listing models, Provisioned Throughput, Guardrails). bedrock-runtime is for actually calling models (invoke_model, converse). Knowledge Bases use their own pair of clients, bedrock-agent and bedrock-agent-runtime.
- Two APIs: InvokeModel vs Converse. InvokeModel is provider-specific (different JSON shape per model). Converse is unified across models. Use Converse unless you have a reason not to.
- Regional availability. Not every model is in every region. us-east-1 and us-west-2 typically have the most. Some models require “cross-region inference” to access from other regions.
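Each of these causes tends to produce a recognizable error, so you can use a rough triage helper. The sketch below maps an error code and message to the likely cause. The substrings it checks are assumptions: they come from the messages quoted in this article plus the usual IAM denial wording ("is not authorized to perform"). Adjust them to match what your SDK actually prints.

```python
def diagnose(error_code: str, message: str) -> str:
    """Return the most likely cause (from the list above) for a Bedrock error.

    The substrings matched here are assumptions based on typical messages;
    they are not an official or exhaustive list.
    """
    msg = message.lower()
    if error_code == "AttributeError":
        # e.g. "'Bedrock' object has no attribute 'invoke_model'"
        return "wrong client: call models through boto3.client('bedrock-runtime')"
    if error_code == "AccessDeniedException":
        if "not authorized to perform" in msg:
            # Usual IAM wording: the policy lacks the action or the resource.
            return "IAM: allow bedrock:InvokeModel on the model or inference profile ARN"
        # e.g. "You don't have access to the model with the specified model ID."
        return "model access: enable the model in the Bedrock console for this region"
    if error_code == "ValidationException" and "region" in msg:
        return "regional availability: switch region or use a cross-region inference profile"
    return "unknown: check credentials, quotas, and the full error message"


if __name__ == "__main__":
    print(diagnose("AccessDeniedException",
                   "You don't have access to the model with the specified model ID."))
```

With boto3, the code and message come from a botocore ClientError as err.response["Error"]["Code"] and err.response["Error"]["Message"]. For the wrong-client case, pass type(exc).__name__ and str(exc).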
Fix 1: Enable Model Access
In the AWS Console:
- Go to Bedrock in the AWS Console.
- Switch to the region you want (top-right region selector).
- Left sidebar → Model access.
- Click Modify model access (or Manage model access).
- Check the models you want (e.g. Claude 3.5 Sonnet, Llama 3.1, etc.).
- Submit. Some models require justification (1-2 sentence form); approval is usually instant or within minutes.
To verify via CLI:
aws bedrock list-foundation-models --region us-east-1 \
  --query 'modelSummaries[?contains(modelId, `claude`)]'
This lists the Claude models in the region. If a model is in the catalog but invoking it raises AccessDeniedException, you haven’t enabled access for that specific model.
Pro Tip: Enable every model you might use upfront in your primary region. Enabling access is free and usually fast. If you enable only some models, you will hit confusing AccessDeniedException errors later when you switch.
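Here is the same catalog check from Python, as a sketch. The filtering lives in a plain function so you can test it against a saved response; only claude_ids_in talks to AWS. A model showing up here only means it is in the regional catalog. Access still has to be enabled as described above.

```python
from typing import List


def model_ids_matching(response: dict, needle: str) -> List[str]:
    """Return catalog model IDs that contain `needle` (case-insensitive).

    `response` has the shape returned by the bedrock client's
    list_foundation_models(): a "modelSummaries" list of dicts with "modelId".
    """
    needle = needle.lower()
    return sorted(
        summary["modelId"]
        for summary in response.get("modelSummaries", [])
        if needle in summary["modelId"].lower()
    )


def claude_ids_in(region: str) -> List[str]:
    """Query the live catalog (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so model_ids_matching works without boto3

    client = boto3.client("bedrock", region_name=region)  # management client
    return model_ids_matching(client.list_foundation_models(), "claude")
```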
Fix 2: Use the Right Client
import boto3
# WRONG — bedrock (management):
client = boto3.client("bedrock")
client.invoke_model(...) # AttributeError
# RIGHT — bedrock-runtime (data plane):
client = boto3.client("bedrock-runtime", region_name="us-east-1")
client.converse(modelId="anthropic.claude-3-5-sonnet-20241022-v2:0", ...)
Two clients, completely separate:
- bedrock: list models, manage Provisioned Throughput, manage Guardrails.
- bedrock-runtime: invoke models, converse, stream.
Knowledge Bases have their own pair of clients: bedrock-agent for management and bedrock-agent-runtime for queries (see Fix 7).
For JS:
// Uses top-level await: run as an ES module (.mjs or "type": "module").
import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
const response = await client.send(new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  messages: [{ role: "user", content: [{ text: "Hello" }] }],
}));
console.log(response.output?.message?.content?.[0]?.text);
The @aws-sdk/client-bedrock-runtime package is separate from @aws-sdk/client-bedrock. Install the right one.
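If your code creates clients in several places, a small lookup table keeps the operation-to-client split in one spot. This is a hypothetical helper, not part of boto3. The table covers only operations used in this article, including retrieve and retrieve_and_generate from Fix 7, which live on a third client.

```python
# Which boto3 service name exposes which operation, per the client split above.
SERVICE_FOR_OPERATION = {
    "list_foundation_models": "bedrock",
    "list_inference_profiles": "bedrock",
    "create_provisioned_model_throughput": "bedrock",
    "invoke_model": "bedrock-runtime",
    "invoke_model_with_response_stream": "bedrock-runtime",
    "converse": "bedrock-runtime",
    "converse_stream": "bedrock-runtime",
    "retrieve": "bedrock-agent-runtime",
    "retrieve_and_generate": "bedrock-agent-runtime",
}


def service_for(operation: str) -> str:
    """Return the boto3 service name for an operation, or raise ValueError."""
    try:
        return SERVICE_FOR_OPERATION[operation]
    except KeyError:
        raise ValueError(f"no Bedrock client mapping for {operation!r}") from None


def client_for(operation: str, region: str = "us-east-1"):
    """Create the right client for `operation` (needs boto3 and credentials)."""
    import boto3  # lazy import keeps service_for usable without boto3

    return boto3.client(service_for(operation), region_name=region)
```

client_for("converse") returns a bedrock-runtime client, so the wrong-client AttributeError can't happen at that call site.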
Fix 3: Use the Converse API (Unified Across Models)
The unified API works the same way regardless of which model you target:
import boto3
client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[
        {"role": "user", "content": [{"text": "What's the capital of France?"}]},
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.7,
        "topP": 0.9,
    },
)
print(response["output"]["message"]["content"][0]["text"])
For multi-turn conversations:
model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"
messages = [
    {"role": "user", "content": [{"text": "What's 2+2?"}]},
    {"role": "assistant", "content": [{"text": "4."}]},
    {"role": "user", "content": [{"text": "And 3+3?"}]},
]
response = client.converse(modelId=model_id, messages=messages)
For system prompts:
response = client.converse(
    modelId=model_id,
    system=[{"text": "You are a concise assistant. Answer in one sentence."}],
    messages=[{"role": "user", "content": [{"text": "Tell me about Python."}]}],
)
For tool use:
response = client.converse(
    modelId=model_id,
    messages=messages,
    toolConfig={
        "tools": [
            {
                "toolSpec": {
                    "name": "get_weather",
                    "description": "Get current weather",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {"city": {"type": "string"}},
                            "required": ["city"],
                        }
                    },
                }
            }
        ],
    },
)
# Check for tool use:
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        tool_name = block["toolUse"]["name"]
        tool_input = block["toolUse"]["input"]
        # Execute tool, send result back in next turn.
Pro Tip: Prefer Converse over invoke_model. invoke_model is provider-specific (different JSON for Claude vs Llama vs Titan), while Converse is the same regardless of model, which makes switching models easier.
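The tool-use snippet stops at the comment about sending the result back. Here is a minimal sketch of that round trip. It also includes a helper that joins every text block of a reply, since a reply can hold several text blocks alongside toolUse blocks. The handlers mapping and the get_weather function in the test are hypothetical stand-ins for your own code.

```python
from typing import Callable, Dict, Optional


def reply_text(response: dict) -> str:
    """Concatenate every text block in a Converse reply, skipping toolUse blocks."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


def tool_result_message(response: dict,
                        handlers: Dict[str, Callable[..., dict]]) -> Optional[dict]:
    """Run each tool the model requested and build the user turn with results.

    `handlers` maps tool names to your own Python functions (hypothetical here).
    Returns None when the reply did not stop for tool use.
    """
    if response.get("stopReason") != "tool_use":
        return None
    results = []
    for block in response["output"]["message"]["content"]:
        if "toolUse" not in block:
            continue
        use = block["toolUse"]
        try:
            output = handlers[use["name"]](**use["input"])
            content, status = [{"json": output}], "success"
        except Exception as exc:  # report the failure to the model instead of crashing
            content, status = [{"text": f"Tool error: {exc}"}], "error"
        results.append({"toolResult": {
            "toolUseId": use["toolUseId"],  # must match the model's request
            "content": content,
            "status": status,
        }})
    return {"role": "user", "content": results}
```

In the conversation loop, append response["output"]["message"] to messages, then append the dict this function returns. Then call client.converse again with the same toolConfig, because Bedrock expects toolConfig whenever the conversation contains tool blocks. Repeat until stopReason is no longer tool_use, then read the answer with reply_text(response).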
Fix 4: Streaming With ConverseStream
For token-by-token streaming:
response = client.converse_stream(
    modelId=model_id,
    messages=[{"role": "user", "content": [{"text": "Write a poem about Python."}]}],
)
for event in response["stream"]:
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:  # tool-use deltas carry "toolUse" instead of "text"
            print(delta["text"], end="", flush=True)
    elif "messageStop" in event:
        print(f"\nStop reason: {event['messageStop']['stopReason']}")
The event types in the stream:
- messageStart: beginning of the assistant’s message.
- contentBlockStart: beginning of a content block (text, toolUse).
- contentBlockDelta: incremental update (text tokens, or fragments of a tool call’s JSON input).
- contentBlockStop: end of a content block.
- messageStop: end of the message, with stopReason (end_turn, max_tokens, tool_use, etc.).
- metadata: token usage stats and latency.
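To get the final text, stop reason, and token usage out of a stream in one pass, you can fold the events above into a summary. This is a minimal sketch. It assumes tool-call input arrives as JSON string fragments under delta["toolUse"]["input"], grouped by each event’s contentBlockIndex. Check that against the events your model actually sends.

```python
import json
from typing import Dict, Iterable, List


def collect_stream(events: Iterable[dict]) -> dict:
    """Fold ConverseStream events into text, tool inputs, stop reason, and usage."""
    text_parts: List[str] = []
    tool_fragments: Dict[int, List[str]] = {}
    summary = {"text": "", "tool_inputs": {}, "stop_reason": None, "usage": None}
    for event in events:
        if "contentBlockDelta" in event:
            block = event["contentBlockDelta"]
            delta = block["delta"]
            if "text" in delta:
                text_parts.append(delta["text"])
            elif "toolUse" in delta:
                # Tool input arrives as JSON string fragments per content block.
                tool_fragments.setdefault(block["contentBlockIndex"], []).append(
                    delta["toolUse"]["input"])
        elif "messageStop" in event:
            summary["stop_reason"] = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            summary["usage"] = event["metadata"].get("usage")
    summary["text"] = "".join(text_parts)
    summary["tool_inputs"] = {
        index: json.loads("".join(parts)) for index, parts in tool_fragments.items()
    }
    return summary
```

Call it as collect_stream(response["stream"]) on the converse_stream response. The summary is available once the stream closes.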
For Node:
import { BedrockRuntimeClient, ConverseStreamCommand } from "@aws-sdk/client-bedrock-runtime";
const client = new BedrockRuntimeClient({ region: "us-east-1" });
const modelId = "anthropic.claude-3-5-sonnet-20241022-v2:0";
const response = await client.send(new ConverseStreamCommand({
  modelId,
  messages: [{ role: "user", content: [{ text: "Write a poem about JavaScript." }] }],
}));
for await (const event of response.stream) {
  // Tool-use deltas have no text field, so guard before writing.
  const text = event.contentBlockDelta?.delta?.text;
  if (text) {
    process.stdout.write(text);
  }
}
response.stream is an async iterable.
Common Mistake: Awaiting individual stream events, or awaiting response.stream as if it held the full reply. Iterate it once with for await...of. Events arrive as the model generates them, and the loop ends when the stream closes.
Fix 5: IAM Permissions
Your IAM role or user needs bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream. Converse is authorized by bedrock:InvokeModel and ConverseStream by bedrock:InvokeModelWithResponseStream, so these two actions cover both APIs:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*:*:foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    }
  ]
}
For all models in a region:
"Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
The account ID in the ARN is empty (bedrock:us-east-1::) because foundation models are owned by AWS.
For cross-region inference (calling a model in a different region):
"Resource": [
  "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
  "arn:aws:bedrock:*:*:inference-profile/*"
]
The inference-profile/* resource is for cross-region routing. Bedrock sends each request to one of the regions covered by the profile, so the role needs access to the profile itself and to the foundation model in every region the profile can route to (list-inference-profiles in Fix 6 shows which ones).
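Common Mistake: Granting bedrock:* to everyone. Scope policies to the actions each role needs, such as read-only or invocation-only access. bedrock:InvokeModel only lets a caller run models, while bedrock:* also allows management actions such as creating Provisioned Throughput or deleting Guardrails.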
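When you manage several models or regions, generating the policy avoids copy-paste mistakes in ARNs. This sketch assumes you only need invocation rights. It follows the ARN patterns shown above and uses bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream, which are the permissions AWS documents for Converse and ConverseStream.

```python
import json
from typing import List


def invoke_policy(model_ids: List[str], regions: List[str],
                  include_profiles: bool = False) -> dict:
    """Build an invocation-only IAM policy for the given models and regions."""
    resources = [
        # Foundation models are AWS-owned, so the account field stays empty.
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for region in regions
        for model_id in model_ids
    ]
    if include_profiles:
        # Needed when calling through a cross-region inference profile ID.
        resources.append("arn:aws:bedrock:*:*:inference-profile/*")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",                    # also covers Converse
                "bedrock:InvokeModelWithResponseStream",  # also covers ConverseStream
            ],
            "Resource": resources,
        }],
    }


if __name__ == "__main__":
    policy = invoke_policy(["anthropic.claude-3-5-sonnet-20241022-v2:0"],
                           ["us-east-1", "us-west-2"], include_profiles=True)
    print(json.dumps(policy, indent=2))
```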
Fix 6: Cross-Region Inference
For Claude on Bedrock in Asia or Europe, you often need cross-region inference. Use an inference profile ID:
response = client.converse(
    modelId="us.anthropic.claude-3-5-sonnet-20241022-v2:0",  # "us." prefix
    messages=...,
)
The us. prefix routes through US regions. Other prefixes typically include eu. and apac. (Asia Pacific).
To find available inference profiles:
aws bedrock list-inference-profiles --region us-east-1
Cross-region inference improves availability (multiple regions can serve) but adds slight latency (~50ms).
Pro Tip: Always test with inference profiles first if you’re in a region with limited model availability. They’re transparent — your code is the same except for the model ID prefix.
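Here is the same lookup from Python, as a sketch. profile_ids_for filters a list_inference_profiles response for profiles that wrap a given base model ID. Matching on the "<prefix>.<model ID>" suffix is an assumption based on the us. naming shown above. The live helper reads only the first page of results.

```python
from typing import List


def profile_ids_for(response: dict, base_model_id: str) -> List[str]:
    """Return inference profile IDs of the form "<prefix>.<base_model_id>".

    `response` has the shape of the bedrock client's list_inference_profiles()
    result: an "inferenceProfileSummaries" list with "inferenceProfileId" keys.
    """
    suffix = "." + base_model_id
    return sorted(
        summary["inferenceProfileId"]
        for summary in response.get("inferenceProfileSummaries", [])
        if summary["inferenceProfileId"].endswith(suffix)
    )


def live_profile_ids(region: str, base_model_id: str) -> List[str]:
    """Query AWS (needs boto3 and credentials); reads only the first page."""
    import boto3  # lazy import so profile_ids_for works without boto3

    client = boto3.client("bedrock", region_name=region)  # management client
    return profile_ids_for(client.list_inference_profiles(), base_model_id)
```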
Fix 7: Knowledge Bases for RAG
For built-in RAG, Bedrock Knowledge Bases connect to S3, Confluence, Salesforce, etc., embed documents, and serve queries:
import boto3
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = client.retrieve_and_generate(
    input={"text": "What does the company policy say about parental leave?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "ABC123XYZ",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        },
    },
)
print(response["output"]["text"])
print("Sources:", response.get("citations", []))
retrieve_and_generate does retrieval and the LLM call in a single API call. For separate retrieval (so you can post-process):
response = client.retrieve(
    knowledgeBaseId="ABC123XYZ",
    retrievalQuery={"text": "parental leave"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5},
    },
)
for result in response["retrievalResults"]:
    print(result["content"]["text"])
    print(result["score"])
Both calls use a third client, bedrock-agent-runtime (yes, a third one), which is separate from bedrock-runtime.
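A common post-processing step is to filter and trim the retrieved chunks before building your own prompt. This minimal sketch uses the result fields printed above (content.text and score). The 0.5 score cutoff, the 4000-character budget, and the separator are hypothetical defaults; tune them for your data.

```python
from typing import List


def build_context(results: List[dict], min_score: float = 0.5,
                  max_chars: int = 4000) -> str:
    """Turn Knowledge Base retrieve() results into one prompt context string.

    Keeps the highest-scoring chunks at or above `min_score`, skips exact
    duplicates, and stops before the kept chunks (separators not counted)
    exceed `max_chars`.
    """
    seen = set()
    parts: List[str] = []
    total = 0
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        text = result["content"]["text"].strip()
        if result["score"] < min_score or text in seen:
            continue
        if total + len(text) > max_chars:
            break
        seen.add(text)
        parts.append(text)
        total += len(text)
    return "\n\n---\n\n".join(parts)
```

To get retrieve_and_generate-style answers with a prompt you control, pass build_context(response["retrievalResults"]) into the system prompt of a regular converse call.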
Fix 8: Cost and Provisioned Throughput
Bedrock pricing:
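- On-demand: pay per input/output token. No commitment. Latency can vary, and requests count against per-account quotas.
- Provisioned Throughput: pay hourly/monthly for guaranteed capacity, measured in model units. Throughput and cost are predictable because the capacity is reserved for you instead of drawn from shared on-demand quotas.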
For on-demand monitoring:
# Token usage in Converse response:
response = client.converse(modelId=..., messages=...)
usage = response["usage"]
print(f"Input: {usage['inputTokens']}, Output: {usage['outputTokens']}")
For Provisioned Throughput:
aws bedrock create-provisioned-model-throughput \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --provisioned-model-name my-prod \
  --model-units 1 \
  --commitment-duration OneMonth
Then invoke using the provisioned ARN as the modelId:
response = client.converse(
    modelId="arn:aws:bedrock:us-east-1:123456:provisioned-model/abc-123",
    messages=...,
)
Note: Provisioned Throughput is expensive ($10K+/month per unit for top models). Use it only when you have sustained, predictable load that warrants the commitment. For variable load, on-demand is much cheaper.
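To decide between on-demand and Provisioned Throughput, it helps to turn the usage numbers into money. In this sketch the prices are parameters because rates differ by model and region. Take them from the Bedrock pricing page; the numbers used below are made-up placeholders.

```python
def estimate_cost(usage: dict, input_price_per_1k: float,
                  output_price_per_1k: float) -> float:
    """Estimate the on-demand cost of one call from a Converse usage dict.

    Prices are per 1,000 tokens and must come from the current pricing page.
    """
    return (usage["inputTokens"] / 1000 * input_price_per_1k
            + usage["outputTokens"] / 1000 * output_price_per_1k)


def project_monthly(cost_per_call: float, calls_per_day: int, days: int = 30) -> float:
    """Scale a per-call estimate to a month for comparison with Provisioned Throughput."""
    return cost_per_call * calls_per_day * days
```

For example, estimate_cost(response["usage"], 0.003, 0.015) with those placeholder rates, followed by project_monthly(cost, calls_per_day=1000), gives a monthly figure you can compare against a Provisioned Throughput quote.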
Still Not Working?
A few less-obvious failures:
- Could not load credentials. A standard AWS auth issue. Set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY, or run on EC2 or Lambda with an IAM role that has Bedrock permissions.
- Rate limited (ThrottlingException). On-demand has per-account quotas. Request a quota increase via AWS Support or use Provisioned Throughput.
- Model returns wildly different output than the Anthropic API does directly. Some Bedrock models have additional safety filters or default system prompts. Compare the prompt and response exactly between providers.
- Streaming buffers the entire response. Some proxies (CloudFront, ALB) buffer SSE. Use the SDK’s streaming API directly; don’t proxy.
- Different model versions on Bedrock vs Anthropic. Bedrock model IDs include the version date (-20241022). Always pin the date: “latest” aliases can shift, breaking reproducibility.
- Tool use returns weird formats. The Converse API normalizes the tool format, but model behavior differs. Test tool prompts with the specific model you’ll deploy.
- Bedrock calls from a VPC fail. Bedrock supports VPC endpoints for private network access. Without one, an EC2 instance in a VPC with no internet access can’t reach Bedrock.
- Image inputs (multimodal) fail. Each model has its own image limits (size, format, count). Claude 3 accepts up to 5 images per message, max 5 MB each.
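For the image case, here is a sketch of a Converse user message with an image block, validated before sending. The accepted formats (png, jpeg, gif, webp) follow the Converse image block. The 5 MB default comes from the Claude 3 figure above and is an assumption to adjust per model. boto3 takes the raw bytes and handles the encoding itself, so no manual base64 step is needed.

```python
from pathlib import Path

# File extensions mapped to the format names the Converse image block uses.
IMAGE_FORMATS = {"png": "png", "jpg": "jpeg", "jpeg": "jpeg", "gif": "gif", "webp": "webp"}


def image_message(image_bytes: bytes, extension: str, prompt: str,
                  max_bytes: int = 5 * 1024 * 1024) -> dict:
    """Build a Converse user message with one image block and one text block."""
    fmt = IMAGE_FORMATS.get(extension.lower().lstrip("."))
    if fmt is None:
        raise ValueError(f"unsupported image type: {extension!r}")
    if len(image_bytes) > max_bytes:
        raise ValueError(f"image is {len(image_bytes)} bytes; limit is {max_bytes}")
    return {
        "role": "user",
        "content": [
            # boto3 expects raw bytes here and encodes them itself.
            {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }


def image_message_from_file(path: str, prompt: str) -> dict:
    """Read an image from disk and build the message from its file extension."""
    p = Path(path)
    return image_message(p.read_bytes(), p.suffix, prompt)
```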
For related LLM / API issues, see OpenAI API not working, LiteLLM not working, LangChain Python not working, and AWS IAM permission denied.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.