Fix: DSPy Not Working — LM Configuration, Signatures, Modules, Optimizers, and Cache Surprises
Quick Answer
How to fix DSPy errors — no LM configured, signature field types, ChainOfThought vs Predict, optimizer (MIPROv2) setup, retrieval module wiring, async usage, and cache invalidation between runs.
The Error
You write your first DSPy program and it complains about a missing LM:
import dspy
predict = dspy.Predict("question -> answer")
result = predict(question="What's the capital of France?")
# dspy.utils.callbacks.UserError: No LM is loaded. Please configure your LM using `dspy.configure(lm=...)`.
Or your signature throws an unexpected output type:
class QA(dspy.Signature):
    """Answer a question concisely."""
    question: str = dspy.InputField()
    answer: int = dspy.OutputField()  # Expecting int
qa = dspy.Predict(QA)
result = qa(question="How many planets are in our solar system?")
# pydantic.ValidationError: Input should be a valid integer, got 'eight'
Or the optimizer runs but the resulting program is no better than the baseline:
optimizer = dspy.MIPROv2(metric=my_metric, auto="light")
optimized = optimizer.compile(student=program, trainset=trainset)
# No improvement. Sometimes worse.
Or you change a prompt and DSPy keeps returning cached old results:
# Edit the signature docstring → behavior unchanged.
# Edit the LM model name → unchanged.
Why This Happens
DSPy compiles natural-language modules into prompts. Three layers cause most issues:
- Global LM configuration. Every dspy.Predict, dspy.ChainOfThought, or custom Module uses the LM set by dspy.configure(lm=...). Without it, the first call raises. You can also pass lm= per call to override.
- Signatures define the I/O contract. Strings like "question -> answer" are shorthand. For type validation or multiple fields, use a class-based dspy.Signature with typed InputField/OutputField. Mismatches between declared types and the model’s output cause Pydantic validation errors at runtime.
- Caching is by call args. DSPy caches LM responses by the resolved prompt and model name. Editing your Python code (signature docstring, module composition) sometimes changes the prompt, sometimes doesn’t. When it doesn’t, cached results stick.
The “optimizer didn’t help” issue is usually too few or bad-quality training examples, or a metric that doesn’t differentiate good from bad outputs.
Fix 1: Configure an LM Globally
import os
import dspy
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)
predict = dspy.Predict("question -> answer")
print(predict(question="What's the capital of France?").answer)
# Paris
DSPy uses LiteLLM under the hood, so any LiteLLM-supported provider works:
# Anthropic
dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20241022"))
# Ollama (local)
dspy.configure(lm=dspy.LM("ollama/llama3.1", api_base="http://localhost:11434"))
# Google Gemini
dspy.configure(lm=dspy.LM("gemini/gemini-1.5-pro"))
# With explicit generation settings and caching:
lm = dspy.LM(
    "openai/gpt-4o",
    max_tokens=1000,
    temperature=0.0,
    cache=True,
)
dspy.configure(lm=lm)
For multi-LM workflows (cheap model for retrieval, expensive for synthesis), use dspy.context:
cheap = dspy.LM("openai/gpt-4o-mini")
expensive = dspy.LM("openai/gpt-4o")
dspy.configure(lm=cheap)
with dspy.context(lm=expensive):
    result = synthesize(...)  # Uses expensive
result = retrieve(...)  # Back to cheap
Pro Tip: Lock the LM down for production reproducibility: pin the model version, set temperature=0.0, and pin the DSPy version in your lockfile.
Fix 2: Use Class-Based Signatures for Type Safety
The string shorthand is fine for prototyping. For anything real, use a class:
class AnswerWithCitation(dspy.Signature):
    """Answer the question with at least one citation URL."""
    question: str = dspy.InputField(desc="A user question")
    context: list[str] = dspy.InputField(desc="Retrieved passages")
    answer: str = dspy.OutputField(desc="A concise answer")
    citation: str = dspy.OutputField(desc="A URL supporting the answer")
qa = dspy.Predict(AnswerWithCitation)
result = qa(question="...", context=[...])
print(result.answer, result.citation)
Typed fields use Pydantic for validation. If the LM returns text that can’t be coerced to the declared type, DSPy raises a clear error (and retries, depending on the LM config).
For complex output structures, use Pydantic models:
from pydantic import BaseModel
class Result(BaseModel):
    summary: str
    confidence: float
    sources: list[str]
class Summarize(dspy.Signature):
    """Summarize the document."""
    document: str = dspy.InputField()
    result: Result = dspy.OutputField()
summarize = dspy.Predict(Summarize)
out = summarize(document="...")
print(out.result.summary, out.result.confidence)
Common Mistake: Declaring int or float outputs and getting validation errors. Models sometimes write "eight" instead of 8. Either accept str and convert in your code, or strengthen the description: desc="Return the number as a digit, not spelled out".
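If you take the "accept str and convert in your code" route, the conversion can be a small helper. The sketch below is one possible approach, not part of DSPy: to_int and WORD_TO_INT are hypothetical names, and the word table only covers zero through twelve.

```python
# Hypothetical helper (not a DSPy API): coerce an LM's answer string to an int.
WORD_TO_INT = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

def to_int(text: str) -> int:
    """Accept digits ("8", "1,000") or a small spelled-out number ("eight")."""
    cleaned = text.strip().lower().rstrip(".")
    if cleaned in WORD_TO_INT:
        return WORD_TO_INT[cleaned]
    # int() raises ValueError if the text is still not numeric.
    return int(cleaned.replace(",", ""))

print(to_int("eight"))   # 8
print(to_int("1,000"))   # 1000
```

Declare the output field as str and call the helper on result.answer; anything it cannot parse surfaces as a ValueError you can handle in your own code.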
Fix 3: Pick the Right Module
DSPy has several built-in modules:
- dspy.Predict: the simplest. Just runs the signature.
- dspy.ChainOfThought: adds a reasoning step before answering. Often improves accuracy on reasoning tasks.
- dspy.ProgramOfThought: generates and executes Python code for the answer. Best for numeric/symbolic problems.
- dspy.ReAct: interleaves reasoning with tool calls.
- dspy.MultiChainComparison: generates multiple chains and picks the best.
Switching is one line:
# Simple:
qa = dspy.Predict("question -> answer")
# With reasoning:
qa = dspy.ChainOfThought("question -> answer")
result = qa(question="...")
print(result.reasoning)  # The thinking step (named `rationale` in releases before 2.5)
print(result.answer)
For agentic patterns, wire tools into ReAct:
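import wikipedia  # assumes the third-party `wikipedia` package is installed
def search(query: str) -> str:
    """Search Wikipedia and return the matching page titles."""
    return "\n".join(wikipedia.search(query))
def calculate(expression: str) -> str:
    """Evaluate an arithmetic expression."""
    return str(eval(expression))  # eval is unsafe on untrusted input; sandbox it in production
agent = dspy.ReAct("question -> answer", tools=[search, calculate])
result = agent(question="Population of Tokyo squared")
ReAct lets the LM call your tools by name. The tool function’s docstring/signature becomes part of the prompt.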
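To see why a tool's name, signature, and docstring matter, it helps to look at what can be read off a plain Python function. The sketch below uses only the standard inspect module; describe_tool and word_count are hypothetical names, and the output format is an illustration, not the text DSPy actually puts in the prompt.

```python
import inspect

def describe_tool(fn) -> str:
    """Build a one-line description from a function's name, signature, and docstring."""
    signature = inspect.signature(fn)
    doc = inspect.getdoc(fn) or "No description provided."
    return f"{fn.__name__}{signature}: {doc}"

def word_count(text: str) -> int:
    """Count whitespace-separated words in the text."""
    return len(text.split())

print(describe_tool(word_count))
# word_count(text: str) -> int: Count whitespace-separated words in the text.
```

A tool with no docstring and untyped arguments gives the LM almost nothing to go on, so write both as if they were part of the prompt.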
Fix 4: Compose Modules in a Custom Program
For multi-step programs, subclass dspy.Module:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(AnswerWithCitation)
    def forward(self, question):
        passages = self.retrieve(question).passages
        prediction = self.generate_answer(question=question, context=passages)
        return prediction
rag = RAG()
result = rag(question="What is DSPy?")
The forward method is your control flow. Each sub-module (self.retrieve, self.generate_answer) becomes an optimization target.
Pro Tip: Keep forward deterministic where possible (no random calls, no side effects). Optimizers re-run forward many times with different prompts — non-determinism in your code makes the metric noisier.
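If a step inside forward really does need sampling (for example, picking a subset of passages), one way to keep it repeatable is a private, seeded generator. This is a minimal sketch under that assumption; sample_passages is a hypothetical helper, not a DSPy function.

```python
import random

def sample_passages(passages, k, seed=0):
    """Pick k passages reproducibly using a private, seeded generator."""
    rng = random.Random(seed)  # local RNG: does not touch the global random state
    return rng.sample(passages, k)

passages = ["p1", "p2", "p3", "p4", "p5"]
first = sample_passages(passages, k=2)
second = sample_passages(passages, k=2)
print(first == second)  # True: same seed, same selection
```

Because the selection no longer changes between runs, any change in the metric can be attributed to the prompt the optimizer is trying rather than to the sampling.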
Fix 5: Configure a Retriever for RAG
dspy.Retrieve needs a configured RM (retrieval model):
# ColBERTv2-hosted retriever (the canonical DSPy example):
colbert = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.configure(rm=colbert)
retrieve = dspy.Retrieve(k=5)
passages = retrieve("DSPy framework").passages
For your own retriever (BM25, Chroma, Pinecone, etc.), wrap it in a dspy.Retrieve subclass:
class ChromaRM(dspy.Retrieve):
    def __init__(self, collection, k=3):
        super().__init__(k=k)
        self.collection = collection
    def forward(self, query, k=None):
        results = self.collection.query(
            query_texts=[query],
            n_results=k or self.k,
        )
        return dspy.Prediction(passages=results["documents"][0])
dspy.configure(rm=ChromaRM(my_chroma_collection))
Fix 6: Optimize With MIPROv2
The optimizer compiles your program by searching for better prompts using examples:
trainset = [
    dspy.Example(question="...", answer="...").with_inputs("question"),
    # ... 20-50 examples ...
]
def my_metric(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()
optimizer = dspy.MIPROv2(metric=my_metric, auto="medium", num_threads=4)
optimized_rag = optimizer.compile(
    student=RAG(),
    trainset=trainset,
)
optimized_rag.save("optimized_rag.json")
auto="light", auto="medium", auto="heavy" are progressively more thorough (and more expensive). Start with light for fast iteration.
If the optimized program isn’t better:
- Trainset too small. Aim for at least 20 high-quality examples, ideally 50+.
- Metric is noisy. Test the metric manually first — give it a perfect pred and a bad pred, confirm it returns 1 and 0 respectively. Vague metrics confuse the optimizer.
- Student model is the limit. If gpt-4o-mini can’t do the task even with the best prompt, no amount of optimization will save it. Try a stronger LM as the student or as a teacher.
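The manual metric test suggested above does not need an LM at all. The sketch below reuses the my_metric function from this section and feeds it stand-in objects built with types.SimpleNamespace (an assumption made so it runs without DSPy; real runs pass dspy.Example and dspy.Prediction objects). The example strings are made up for illustration.

```python
from types import SimpleNamespace

def my_metric(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()

# Stand-ins for a labeled example and three candidate predictions.
example = SimpleNamespace(question="What is the capital of France?", answer="Paris")
good_pred = SimpleNamespace(answer="The capital is Paris.")
bad_pred = SimpleNamespace(answer="I am not sure.")
tricky_pred = SimpleNamespace(answer="It is not Paris; it is Lyon.")

print(my_metric(example, good_pred))    # True
print(my_metric(example, bad_pred))     # False
print(my_metric(example, tricky_pred))  # True: substring match rewards a wrong answer
```

The third case is the kind of blind spot worth finding before an optimization run: a substring metric scores any output that merely mentions the gold answer.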
Note: MIPROv2 calls the LM many times during optimization. Estimate cost before running on production data: a light run on 50 examples with gpt-4o might cost a few dollars.
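A back-of-envelope estimate is just calls times tokens times price. The sketch below shows that arithmetic; estimate_cost is a hypothetical helper, and every number in the usage line (call count, token counts, per-million-token prices) is a placeholder to replace with your own measurements and your provider's current pricing.

```python
def estimate_cost(num_calls, avg_input_tokens, avg_output_tokens,
                  input_price_per_million, output_price_per_million):
    """Rough cost in dollars: total tokens in each direction times its price."""
    input_cost = num_calls * avg_input_tokens * input_price_per_million / 1_000_000
    output_cost = num_calls * avg_output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost

# Placeholder values only, not real measurements or real prices.
cost = estimate_cost(
    num_calls=500,
    avg_input_tokens=1500,
    avg_output_tokens=300,
    input_price_per_million=2.50,
    output_price_per_million=10.00,
)
print(f"${cost:.2f}")  # $3.38 with these placeholder numbers
```

To get a realistic call count, run the optimizer on a handful of examples first and count the entries it produces in the LM history.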
Fix 7: Cache Control
DSPy caches LM responses. Useful in dev (no repeated cost for the same prompt), confusing when iterating:
# Disable cache for a fresh run:
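dspy.configure_cache(enable_disk_cache=False, enable_memory_cache=False)  # on releases that provide configure_cache; check your version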
# Or disable on the LM:
lm = dspy.LM("openai/gpt-4o-mini", cache=False)
dspy.configure(lm=lm)
The cache key includes the resolved prompt and model parameters. If you edit your code but the generated prompt is identical (same signature, same inputs), DSPy reuses the cached response.
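The behavior is easier to reason about with a toy model of a request cache. The sketch below is an illustration of the idea only, not DSPy's actual implementation: cache_key is a hypothetical function that hashes the model name, the final prompt text, and the generation parameters.

```python
import hashlib
import json

def cache_key(model: str, prompt: str, **params) -> str:
    """Hash everything that defines the request; identical requests share a key."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

base = cache_key("openai/gpt-4o-mini", "Question: What is DSPy?", temperature=0.0)
same = cache_key("openai/gpt-4o-mini", "Question: What is DSPy?", temperature=0.0)
new_prompt = cache_key("openai/gpt-4o-mini", "Q: What is DSPy?", temperature=0.0)
new_temp = cache_key("openai/gpt-4o-mini", "Question: What is DSPy?", temperature=0.7)

print(base == same)        # True: nothing in the request changed, so the cache hits
print(base == new_prompt)  # False: a different prompt produces a new key
print(base == new_temp)    # False: a different parameter produces a new key
```

Under this model, a code edit only causes a fresh LM call if it changes the text or parameters that are actually sent.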
To inspect what was actually sent to the LM:
dspy.inspect_history(n=3)  # Last 3 calls
This prints the actual prompts and responses, which is invaluable when “DSPy returns weird stuff” turns out to be your signature producing a weirder-than-expected prompt.
Fix 8: Async Calls and Concurrency
DSPy 2.5+ supports async:
import asyncio
async def main():
    qa = dspy.Predict("question -> answer")
    result = await qa.acall(question="What's 2+2?")
    print(result.answer)
asyncio.run(main())
For concurrent calls (e.g. batch inference), use dspy.Parallel:
pred = dspy.ChainOfThought("question -> answer")
questions = [
{"question": "Q1"},
{"question": "Q2"},
{"question": "Q3"},
]
# dspy.Parallel is called with a list of (module, inputs) pairs:
results = dspy.Parallel(num_threads=8)([(pred, q) for q in questions])
For older DSPy versions without acall, wrap the sync function in asyncio.to_thread:
result = await asyncio.to_thread(qa, question="...")
Still Not Working?
A few less-obvious failures:
- ValidationError only on some inputs. The LM’s output style is unstable. Add explicit desc= to the OutputField clarifying the format, or use dspy.TypedPredictor (forces stricter typing).
- Trainset examples don’t seem to influence the optimized program. Make sure each example calls .with_inputs("field1", "field2") to mark which fields are inputs vs labels. Without it, the optimizer treats all fields as inputs.
- Saved/loaded programs don’t behave the same. program.save("p.json") stores the optimized prompts. program.load("p.json") restores them, but the LM, RM, and Module structure must match the original at load time.
- Token limit exceeded in retrieval. Long contexts fill the LM’s window. Either reduce k in Retrieve, or filter/summarize passages before passing to the next module.
- OpenAI 429 during MIPROv2 run. Reduce num_threads, or add min_examples to slow down. Long optimization runs hammer the API.
- dspy.inspect_history shows weird system prompts. That’s DSPy’s prompt engineering at work. To customize, write a custom Module that constructs the prompt yourself (escape hatch from the framework).
- Different results between local and production. Cache. Or temperature > 0. Or the LM version drifted (Anthropic’s latest alias points at different snapshots). Pin everything explicitly.
- AssertionError from dspy.Suggest/dspy.Assert. These are constraints DSPy uses for self-refinement. The constraint failed, so read the assertion’s message and either weaken it or improve the upstream prompt.
For related Python LLM tooling and validation issues, see Instructor not working, LiteLLM not working, LangChain Python not working, and Pydantic validation error.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.