
Fix: LiteLLM Not Working — Model Name Format, API Keys, Streaming, and Fallback Errors

FixDevs ·

Quick Answer

How to fix LiteLLM errors — BadRequestError model not found, missing API key env vars, streaming chunk differences, fallback model not triggering, async drop_params, and proxy server 401.

The Error

You call litellm.completion with what looks like a valid model name and get this:

import litellm

response = litellm.completion(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hi"}],
)
# litellm.exceptions.BadRequestError:
# LLM Provider NOT provided. Pass in the LLM provider you are trying to call.
# You passed model=claude-3-5-sonnet
# Pass model as E.g. For 'Huggingface' inference endpoints pass in
# `completion(model='huggingface/starcoder',..)`

Or you set OPENAI_API_KEY, switch the model to Anthropic, and the call fails because LiteLLM does not reuse the OpenAI key for another provider:

AuthenticationError: Anthropic API key not provided. Set ANTHROPIC_API_KEY env var.

Or fallback doesn’t trigger when the primary model fails:

response = litellm.completion(
    model="gpt-4o",
    fallbacks=["claude-3-5-sonnet-20241022"],
    messages=[...],
)
# Primary fails. No fallback attempted. Error raised.

Or streaming chunks have a different shape across providers:

# OpenAI: chunk.choices[0].delta.content is str
# Anthropic via LiteLLM: chunk.choices[0].delta.content is str
# Gemini: sometimes None for the first chunk
# Bedrock: chunk shape differs entirely

Why This Happens

LiteLLM normalizes 100+ LLM providers behind one API, but the normalization isn’t perfect. Three sources of pain:

  • Provider prefix in the model name. LiteLLM uses the model string to route the request. gpt-4o implies OpenAI, but claude-3-5-sonnet doesn’t unambiguously imply Anthropic — you must write anthropic/claude-3-5-sonnet-20241022 so the router picks the right SDK.
  • Per-provider env vars. Each provider needs its own key: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY for Bedrock, etc. LiteLLM reads them lazily — a missing one only fails when you actually call that provider.
  • Provider-specific kwargs. OpenAI accepts response_format={"type": "json_object"} but Anthropic doesn’t. Pass it anyway and you get BadRequestError. LiteLLM offers drop_params=True to silently drop unsupported kwargs.

Fallbacks have a separate gotcha: the fallbacks parameter only triggers on errors LiteLLM classifies as retryable (rate limits, timeouts, server errors). A BadRequestError from a bad prompt is not retried — by design.

Fix 1: Use the provider/model Format

The safest pattern: always prefix the model with the provider. It removes ambiguity and survives the next time LiteLLM adds a new model that overlaps a name from another provider:

import litellm

# OpenAI
litellm.completion(model="openai/gpt-4o", messages=[...])
litellm.completion(model="openai/gpt-4o-mini", messages=[...])

# Anthropic
litellm.completion(model="anthropic/claude-3-5-sonnet-20241022", messages=[...])
litellm.completion(model="anthropic/claude-3-5-haiku-20241022", messages=[...])

# Gemini
litellm.completion(model="gemini/gemini-1.5-pro", messages=[...])

# Bedrock
litellm.completion(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", messages=[...])

# Ollama (local)
litellm.completion(model="ollama/llama3.1", api_base="http://localhost:11434", messages=[...])

# Together AI
litellm.completion(model="together_ai/meta-llama/Llama-3-70b-chat-hf", messages=[...])

Pro Tip: Look up exact provider prefixes at LiteLLM’s provider list. The pattern is {provider}/{model_id} where model_id is what the provider’s API expects — including the version date for Anthropic.
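To catch unprefixed model names before they reach LiteLLM, a small guard can run wherever model names enter your code (config loading, request handlers). The sketch below is not part of LiteLLM: KNOWN_PREFIXES and require_provider_prefix are hypothetical names, and the prefix set only covers the providers shown above, so extend it to match the ones you use.

```python
# Hypothetical helper, not a LiteLLM API: reject model strings that lack a provider prefix.
KNOWN_PREFIXES = frozenset(
    {"openai", "anthropic", "gemini", "bedrock", "ollama", "together_ai"}
)


def require_provider_prefix(model: str) -> str:
    """Return the model string unchanged if it looks like provider/model_id."""
    # Split on the first slash only: IDs such as
    # "together_ai/meta-llama/Llama-3-70b-chat-hf" contain further slashes.
    provider, sep, model_id = model.partition("/")
    if not sep or not model_id:
        raise ValueError(
            f"{model!r} has no provider prefix; expected 'provider/model_id'"
        )
    if provider not in KNOWN_PREFIXES:
        raise ValueError(f"unknown provider prefix {provider!r} in {model!r}")
    return model
```

Calling require_provider_prefix("claude-3-5-sonnet") raises ValueError at the point where the bad name was introduced, which is usually easier to trace than a BadRequestError raised deep inside a request.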

Fix 2: Set the Right Env Vars

LiteLLM doesn’t share keys between providers. Set each one explicitly:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GEMINI_API_KEY=AIza...
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION_NAME=us-east-1
export GROQ_API_KEY=gsk_...

Or pass keys per call (useful in multi-tenant code):

litellm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",
    messages=[...],
)

Verify which keys are loaded with:

import litellm
litellm.set_verbose = True  # Logs the resolved key source and provider routing

Common Mistake: Putting keys in .env but forgetting python-dotenv. LiteLLM reads os.environ, not .env files. Call load_dotenv() at app startup or use pydantic-settings.
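Because keys are read lazily, a missing one can go unnoticed until the first call to that provider. A startup check along these lines can surface it earlier. This is a sketch: REQUIRED_ENV and missing_env_vars are hypothetical names, and the mapping is hand-maintained from the variables listed above rather than read from LiteLLM. LiteLLM also appears to ship a validate_environment helper for a similar purpose; check the docs for your version before relying on it.

```python
import os
from typing import List, Mapping, Optional

# Hand-maintained mapping (an assumption, not LiteLLM's own table).
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
    "groq": ["GROQ_API_KEY"],
}


def missing_env_vars(
    model: str, environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """List the env vars the model's provider needs that are unset or empty."""
    env = os.environ if environ is None else environ
    provider = model.split("/", 1)[0]
    # Providers not in the mapping return an empty list: unknown, not verified.
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]
```

Run it once per configured model after load_dotenv() and fail fast if the returned list is non-empty.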

Fix 3: Configure Fallbacks Correctly

fallbacks is a list of model strings tried in order when the primary fails with a retryable error. The exception types LiteLLM treats as retryable are RateLimitError, Timeout, APIConnectionError, and ServiceUnavailableError.

from litellm import completion

response = completion(
    model="openai/gpt-4o",
    messages=[...],
    fallbacks=[
        "anthropic/claude-3-5-sonnet-20241022",
        "gemini/gemini-1.5-pro",
    ],
    num_retries=2,
)
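To make the retryable-versus-fatal distinction concrete, here is a hand-rolled version of the same idea. It is not how LiteLLM implements fallbacks internally. complete_with_fallbacks, call, and RetryableError are hypothetical stand-ins: in real code, call would wrap litellm.completion and retryable would hold the LiteLLM exception classes named above.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Stand-in for transient failures such as rate limits or timeouts."""


def complete_with_fallbacks(
    call: Callable[[str], T],
    models: Sequence[str],
    retryable: Tuple[Type[BaseException], ...] = (RetryableError,),
) -> T:
    """Try each model in order; move to the next only on a retryable error."""
    if not models:
        raise ValueError("models must contain at least one entry")
    last_error: Optional[BaseException] = None
    for model in models:
        try:
            return call(model)
        except retryable as exc:
            # Transient failure: remember it and try the next model.
            last_error = exc
        # Any other exception (for example a bad request) propagates
        # immediately, so no fallback is attempted for it.
    assert last_error is not None  # the loop ran at least once and never returned
    raise last_error
```

With this shape, a rate-limit error on the first model moves on to the second, while a bad-request error surfaces immediately. That matches the "primary fails, no fallback attempted" symptom when the failure is not a retryable one.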

For more control, use the Router with explicit fallback policies:

from litellm import Router

router = Router(
    model_list=[
        {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"}},
    ],
    fallbacks=[{"primary": ["backup"]}],
    context_window_fallbacks=[{"primary": ["backup"]}],  # Router fallbacks reference model_name aliases
)

response = router.completion(
    model="primary",
    messages=[...],
)

context_window_fallbacks is gold for long-context prompts — when the primary’s context limit is too small, LiteLLM automatically retries on a larger-context model.

Fix 4: Drop Provider-Incompatible Params

If you pass response_format, seed, or logprobs (OpenAI features) to Anthropic, you get a BadRequestError. Two fixes:

Drop unsupported params silently:

litellm.drop_params = True

# Or per call:
litellm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    response_format={"type": "json_object"},  # Dropped, not sent.
    messages=[...],
    drop_params=True,
)

Or branch on the provider:

def supports_json_mode(model: str) -> bool:
    # Allowlist by provider prefix; extend it only for providers you have verified.
    return model.startswith("openai/")

kwargs = {"messages": messages}
if supports_json_mode(model):
    kwargs["response_format"] = {"type": "json_object"}

litellm.completion(model=model, **kwargs)

Note: drop_params=True is convenient but silent. If you rely on JSON mode and switch to a provider that doesn’t support it, responses will start coming back as prose. Pair drop_params=True with an explicit “respond in JSON” instruction in the system prompt as a safety net.
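A middle ground between silent dropping and hand-written branches is to drop unsupported params yourself and log what was removed. In this sketch, split_supported_kwargs and ALWAYS_KEEP are hypothetical names, and the supported set is supplied by the caller. LiteLLM's get_supported_openai_params may be a suitable source for that set, but verify its behavior for your version.

```python
import logging
from typing import Any, Dict, Iterable, Tuple

logger = logging.getLogger(__name__)

# Arguments every call needs; never treat these as droppable.
ALWAYS_KEEP = frozenset({"model", "messages"})


def split_supported_kwargs(
    kwargs: Dict[str, Any], supported: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (kept, dropped) and log a warning naming every dropped param."""
    allowed = ALWAYS_KEEP | set(supported)
    kept = {k: v for k, v in kwargs.items() if k in allowed}
    dropped = {k: v for k, v in kwargs.items() if k not in allowed}
    if dropped:
        logger.warning("Dropping unsupported params: %s", sorted(dropped))
    return kept, dropped
```

The returned dropped dict lets the caller react, for example by adding a "respond in JSON" instruction to the system prompt when response_format was removed.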

Fix 5: Handle Streaming Across Providers

LiteLLM normalizes streaming chunks to the OpenAI shape, but the content of those chunks varies — some providers send chunks with delta.content=None to signal role/tool start. Filter for content:

response = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[...],
    stream=True,
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content:  # Skip None and empty
        print(content, end="", flush=True)

For async streaming:

import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="anthropic/claude-3-5-sonnet-20241022",
        messages=[...],
        stream=True,
    )
    async for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())

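Usage data, when the provider reports it, arrives on the final chunk; depending on the provider and LiteLLM version you may need to pass stream_options={"include_usage": True} to get it. Either read it from that chunk, or collect the chunks in a list while iterating and call litellm.stream_chunk_builder(chunks) afterwards to rebuild the full response object.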
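If you filter chunks in more than one place, it can help to centralize the None handling in one function. The sketch below uses stand-in objects instead of a live stream so it runs without any API key. collect_text and fake_chunk are hypothetical helpers, and the empty-choices case is a defensive assumption, not documented LiteLLM behavior.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def collect_text(chunks: Iterable[Any]) -> str:
    """Join the text from OpenAI-shaped stream chunks, skipping empty ones."""
    parts: List[str] = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None)
        if not choices:
            # Defensive: tolerate chunks that carry no choices at all.
            continue
        content = getattr(choices[0].delta, "content", None)
        if content:  # Skips both None and ""
            parts.append(content)
    return "".join(parts)


def fake_chunk(content=None, with_choice=True):
    """Build a stand-in object shaped like a streamed chunk (for local testing)."""
    delta = SimpleNamespace(content=content)
    choices = [SimpleNamespace(delta=delta)] if with_choice else []
    return SimpleNamespace(choices=choices)
```

Against a real stream, collect_text(response) would consume the iterator returned by litellm.completion(..., stream=True). Use it when you only need the final text, since it does not print incrementally.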

Fix 6: Track Cost and Token Usage

LiteLLM has built-in cost tracking for every supported provider. Read it off the response:

response = litellm.completion(model="openai/gpt-4o", messages=[...])
print("Cost:", response._hidden_params["response_cost"])
print("Tokens:", response.usage.total_tokens)

For aggregate tracking, register a success callback:

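def track(kwargs, completion_response, start_time, end_time):
    # response_cost may be missing or None when LiteLLM has no pricing for the model
    cost = completion_response._hidden_params.get("response_cost") or 0.0
    print(f"{kwargs['model']}: ${cost:.4f}")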

litellm.success_callback = [track]
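Printing per-call cost is fine for debugging, but aggregate tracking usually needs running totals. A minimal sketch, assuming the callback signature shown above and that the cost lives in _hidden_params["response_cost"] as in the earlier example; CostTracker is a hypothetical class, not something LiteLLM provides.

```python
import threading
from collections import defaultdict
from typing import Any, Dict


class CostTracker:
    """Accumulate response cost per model; safe to call from multiple threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._totals: Dict[str, float] = defaultdict(float)

    def __call__(
        self, kwargs: Dict[str, Any], completion_response: Any, start_time: Any, end_time: Any
    ) -> None:
        hidden = getattr(completion_response, "_hidden_params", None) or {}
        # Treat a missing or None cost as zero so totals never break.
        cost = hidden.get("response_cost") or 0.0
        with self._lock:
            self._totals[kwargs.get("model", "unknown")] += cost

    def totals(self) -> Dict[str, float]:
        """Return a snapshot copy of the per-model totals."""
        with self._lock:
            return dict(self._totals)
```

An instance could then be registered with litellm.success_callback = [tracker], assuming your LiteLLM version accepts arbitrary callables there, and tracker.totals() read from a metrics endpoint or at shutdown.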

If response_cost shows 0.0, LiteLLM doesn’t have pricing for that model. Add custom pricing:

litellm.register_model({
    "my-custom-model": {
        "max_tokens": 8192,
        "input_cost_per_token": 0.000001,
        "output_cost_per_token": 0.000002,
        "litellm_provider": "openai",
        "mode": "chat",
    }
})
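The per-token prices registered above turn into a request cost by simple arithmetic. The sketch below assumes cost is linear in prompt and completion tokens. LiteLLM's own calculation may account for more (cached tokens, for example), so treat estimate_cost as a hypothetical sanity check, not a replacement.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Linear cost estimate in the same currency unit as the per-token prices."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )
```

With the prices registered above, 1,000 prompt tokens and 500 completion tokens come to roughly 1000 × 0.000001 + 500 × 0.000002 = 0.002. If response_cost for your custom model is far from that kind of estimate, the registered pricing is the first thing to check.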

Fix 7: Proxy Server (litellm --model ...) Returns 401

The LiteLLM proxy server is a separate use case — you run it as a gateway and your apps point at http://localhost:4000. If clients get 401, check the master key:

# Set the master key the proxy will require from clients
export LITELLM_MASTER_KEY="sk-1234"

# Start the proxy
litellm --config config.yaml --port 4000

In your client:

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",  # The LiteLLM master key, not OpenAI's
    base_url="http://localhost:4000",
)
client.chat.completions.create(model="gpt-4o", messages=[...])

For per-team or per-user keys, create virtual keys via the proxy’s /key/generate endpoint or the UI. Don’t share the master key with end users — it has admin privileges.
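When a 401 persists, it can help to look at the exact request the client sends. The sketch below builds the request without sending it. It assumes the proxy expects the key as a Bearer token in the Authorization header and that the chat endpoint sits at /chat/completions under the base URL, as the OpenAI client setup above implies. build_chat_request is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request for the proxy."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            # The key the proxy checks: master key or a virtual key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen against a running proxy raises urllib.error.HTTPError with code 401 if the key is rejected. That separates a key problem from a problem in the OpenAI client configuration.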

Fix 8: Logging and Debugging

When something silently misbehaves, turn on verbose mode and watch the resolved request:

import litellm

litellm.set_verbose = True  # Print SDK-level routing and request details

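# Or route LiteLLM's logger through the standard logging module
# (debug output is verbose and can include request details, so avoid leaving it on in production)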
import logging
logging.basicConfig(level=logging.INFO)
logging.getLogger("LiteLLM").setLevel(logging.DEBUG)

You’ll see the exact provider, model ID, API base, and which env var was used. This is the fastest way to debug “why is my Anthropic call going to OpenAI.”

Still Not Working?

A few less-obvious failures:

  • json_repair import errors. These usually come from outdated optional dependencies. Run pip install -U "litellm[proxy]" or pip install "litellm[extra_proxy]" (quote the extras so shells like zsh don’t expand the brackets).
  • Anthropic prompt caching not triggering. Pass cache_control: {"type": "ephemeral"} on a message block — LiteLLM forwards it through. Cache hits require identical prefixes.
  • Tool calls with mismatched IDs across providers. Anthropic uses tool_use_id, OpenAI uses tool_call_id. LiteLLM normalizes the response, but if you hand-build the next turn’s messages you need to use the same id the model returned, not a regenerated one.
  • Embeddings work for one provider but not another. embedding() is a separate function. Use litellm.embedding(model="openai/text-embedding-3-small", input=[...]). Provider prefixes work the same way as completion.
  • litellm.completion blocks the event loop in FastAPI. You called the sync function from async code. Use litellm.acompletion and await.
  • Token counting is off by 5-10%. LiteLLM estimates tokens for providers that don’t return exact counts. For billing, treat counts as approximate and add a safety margin.
  • Proxy returns 429 even when the upstream provider isn’t rate-limited. Check the proxy’s tpm_limit / rpm_limit settings in config.yaml — they apply before the upstream call.
  • drop_params=True drops a param you actually need. Set litellm.drop_params = False globally and handle compatibility in your code — explicit beats implicit when output quality matters.
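For the tool-call ID item above, the safest habit is to copy the id straight from the assistant message instead of generating one. A sketch using the OpenAI-style message shape that LiteLLM returns; tool_result_messages is a hypothetical helper, and results is assumed to map each tool-call id to the value your tool produced.

```python
import json
from typing import Any, Dict, List


def tool_result_messages(
    assistant_message: Dict[str, Any], results: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Build one role="tool" message per tool call, reusing the returned ids."""
    messages: List[Dict[str, Any]] = []
    for call in assistant_message.get("tool_calls") or []:
        call_id = call["id"]  # Reuse exactly what the model returned.
        if call_id not in results:
            raise KeyError(f"no result supplied for tool call {call_id!r}")
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call_id,
                "content": json.dumps(results[call_id]),
            }
        )
    return messages
```

Append the assistant message itself to the history first, then the messages this function returns, so every tool result follows the call it answers.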

For related LLM SDK and proxy issues, see OpenAI API not working, Ollama not working, LangChain Python not working, and Instructor not working.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
