Fix: CrewAI Not Working — Agent Delegation, Task Context, and LLM Configuration Errors

FixDevs

Quick Answer

How to fix common CrewAI errors: the "llm: field required" ValidationError, agent delegation loops, task context not passed between agents, truncated tool output, confusion between the hierarchical and sequential process modes, and memory that does not persist across runs.

The Error

You try to run your first Crew and it refuses to start:

pydantic.ValidationError: 1 validation error for Agent
  llm: field required

Or agents keep delegating to each other in circles without making progress:

[Manager] → delegates to [Researcher]
[Researcher] → delegates back to [Manager]
[Manager] → delegates to [Researcher]
...

Or a task result gets passed to the next task, but the downstream agent doesn’t actually use it:

task1 = Task(description="Research X", agent=researcher)
task2 = Task(description="Write about the research", agent=writer)
# Writer writes from scratch, ignoring researcher's output

Or tool outputs get truncated mid-response:

Tool output: "The search results are... [TRUNCATED at 4000 chars]"

Or the process stops without clear reason:

Crew execution finished with no final result

CrewAI is designed around “role-based agents” collaborating on tasks. The framework looks simple (define Agents + Tasks + Crew, call kickoff()), but the interaction between delegation, task contexts, process modes, and LLM configuration creates specific failure modes. This guide covers each.

Why This Happens

CrewAI agents delegate tasks to each other when they decide they can’t handle something alone. Without explicit delegation constraints, agents can bounce work back and forth indefinitely. Task context (the output of one task made available to a later one) is most reliable when wired explicitly through the context parameter; when that link is missing, or the task description never refers to the earlier output, downstream agents tend to start from scratch.

The Process mode (sequential vs hierarchical) dramatically changes execution semantics: sequential runs tasks in order with straight context-passing; hierarchical uses a manager LLM to coordinate. Mixing up which you configured is a common source of “why did my crew do that?” confusion.

Diagnostic Timeline

When a CrewAI crew “stops working,” the visible symptom is usually nonsense output or infinite loops. The real cause is almost always invisible to the crew itself. Here is how to narrow it down.

Minute 0 — First guess: tweak the agents. You rewrite role, goal, and backstory, hoping a clearer persona fixes the bad output. The crew still produces nonsense. Persona tweaks help marginally, but they cannot fix structural problems with delegation, context, or tool calling.

Minute 5 — Inspect tool call format in verbose=True logs. Most agent failures trace back to broken tool calls. The LLM outputs a tool name that does not exist, malformed JSON arguments, or arguments that violate the tool’s Pydantic schema. CrewAI retries silently up to max_iter, then gives up and returns garbage. Read the verbose log — if you see repeated “Action not found” or “Invalid arguments” lines, the model’s tool-calling format does not match what CrewAI expects.
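To make the log check above less manual, here is a minimal sketch that counts the two failure markers in captured verbose output. The marker strings come from the symptoms described above; the sample log lines are invented for illustration and will not match CrewAI's exact log format.

```python
from collections import Counter

# Marker strings taken from the verbose-log symptoms described above.
FAILURE_MARKERS = ("Action not found", "Invalid arguments")


def count_tool_failures(log_text: str) -> Counter:
    """Count how many log lines contain each tool-call failure marker."""
    counts = Counter()
    for line in log_text.splitlines():
        for marker in FAILURE_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Hypothetical log excerpt, for illustration only.
sample_log = """\
Thought: I should search the web
Action not found: web_search
Thought: retrying
Action not found: web_search
Invalid arguments for tool 'Search'
"""

failures = count_tool_failures(sample_log)
print(failures)
```

A marker that repeats several times in one run suggests the model is stuck retrying the same malformed call rather than making progress.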

Minute 10 — Confirm LLM JSON mode. GPT-4o-mini and similar models default to text mode. For structured outputs (output_pydantic=..., tool calls), you need JSON mode or function calling enabled. If your LLM provider’s wrapper does not force response_format={"type": "json_object"} for structured outputs, the model returns prose that the parser rejects. Use CrewAI’s LLM() wrapper rather than raw provider SDKs — it handles this automatically.

Minute 20 — Check delegation flags. Run through every Agent(...) definition and confirm allow_delegation. If multiple agents have it set to True, delegation loops are almost guaranteed. Default to False everywhere except a single explicit manager.
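The flag audit can be scripted. The sketch below only reads the role and allow_delegation attributes, so it uses SimpleNamespace stand-ins rather than real Agent objects; the helper name agents_allowed_to_delegate is hypothetical.

```python
from types import SimpleNamespace


def agents_allowed_to_delegate(agents):
    """Return the roles of agents whose allow_delegation flag is truthy."""
    return [a.role for a in agents if getattr(a, "allow_delegation", False)]


# Stand-ins for Agent objects; only the two attributes read above matter.
crew_agents = [
    SimpleNamespace(role="Manager", allow_delegation=True),
    SimpleNamespace(role="Researcher", allow_delegation=True),
    SimpleNamespace(role="Writer", allow_delegation=False),
]

delegators = agents_allowed_to_delegate(crew_agents)
if len(delegators) > 1:
    # More than one delegating agent is the loop-prone setup described above.
    print(f"Possible delegation loop: {delegators}")
```

Running a check like this before kickoff flags the risky configuration without spending any tokens.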

Minute 30 — Verify task context wiring. Check that the write task declares context=[research_task] and that its description tells the writer to use the research. When the dependency is left implicit, the writer tends to re-derive prior work from a blank slate, which shows up as “the writer ignored the research.”

Minute 45 — Memory backend persistence. If memory=True but long-term memory is not surviving across runs, check that CREWAI_STORAGE_DIR points at a persistent location. In containerized deployments, the default ~/.crewai/ is inside the ephemeral container filesystem and resets every restart. Mount a volume or point the storage at S3-backed storage for true persistence.
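A startup check along these lines can catch a missing or unwritable storage directory early. This is a sketch using only the standard library; check_storage_dir is a hypothetical helper, and it cannot tell whether the directory sits on a persistent volume, only that it exists and accepts writes.

```python
import os
import tempfile
from pathlib import Path


def check_storage_dir(env_var: str = "CREWAI_STORAGE_DIR") -> tuple[bool, str]:
    """Report whether the configured storage directory exists and is writable."""
    value = os.environ.get(env_var)
    if not value:
        return False, f"{env_var} is not set; the default location will be used"
    path = Path(value)
    if not path.is_dir():
        return False, f"{path} does not exist or is not a directory"
    try:
        # Creating (and auto-deleting) a temp file proves write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return False, f"{path} is not writable: {exc}"
    return True, f"{path} exists and is writable"


ok, message = check_storage_dir()
print(message)
```

In a container, pair this with a mounted volume at the same path so the directory survives restarts.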

Fix 1: LLM Configuration — The llm Field

pydantic.ValidationError: 1 validation error for Agent
  llm: field required

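Since v0.30+, CrewAI expects every agent's LLM to be configured, either with an explicit llm= argument or through the environment variables shown at the end of this section. Older tutorials that rely on implicit OpenAI defaults can fail with this error.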

Modern setup:

from crewai import Agent, Task, Crew, Process
from crewai import LLM

# Option 1: Use CrewAI's LLM wrapper
llm = LLM(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key="sk-...",   # Or read from OPENAI_API_KEY env var
)

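# Option 2 (alternative to Option 1): a LangChain chat model.
# Requires the langchain-openai package; this assignment replaces the llm above.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)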

researcher = Agent(
    role="Senior Researcher",
    goal="Gather accurate information about {topic}",
    backstory="You have 15 years of experience in {topic} research.",
    llm=llm,   # Required
    verbose=True,
)

Multiple models for different agents:

from crewai import LLM

fast_llm = LLM(model="gpt-4o-mini", temperature=0.3)
strong_llm = LLM(model="gpt-4o", temperature=0.7)

researcher = Agent(role="Researcher", goal="...", backstory="...", llm=fast_llm)
writer = Agent(role="Writer", goal="...", backstory="...", llm=strong_llm)

Local LLMs via Ollama:

from crewai import LLM

local_llm = LLM(
    model="ollama/llama3",   # Note the "ollama/" prefix
    base_url="http://localhost:11434",
)

agent = Agent(role="...", goal="...", backstory="...", llm=local_llm)

For Ollama setup and model management, see Ollama not working.

Environment variable setup (cleanest for production):

export OPENAI_API_KEY=sk-...
export OPENAI_MODEL_NAME=gpt-4o-mini   # CrewAI reads this as default

Then, in Python:

from crewai import Agent

# No llm= needed if the env vars are set; the agent uses OPENAI_MODEL_NAME
researcher = Agent(role="Researcher", goal="...", backstory="...")
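When relying on environment variables, a small preflight check avoids discovering a missing key halfway through a run. This is a sketch; missing_llm_env is a hypothetical helper, and the two variable names are the ones used in this section.

```python
import os

# The two variables this section relies on.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_MODEL_NAME")


def missing_llm_env(environ=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_llm_env()
if missing:
    print(f"Set these before creating agents: {', '.join(missing)}")
```

Passing a plain dict as environ makes the check easy to unit test without touching the real environment.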

Fix 2: Delegation Loops

[Manager] delegates to [Researcher]
[Researcher] delegates back to [Manager]
[Manager] delegates to [Researcher]
...

Agents can delegate work to other crew members. Without guardrails, they loop.

Disable delegation for most agents:

researcher = Agent(
    role="Researcher",
    goal="...",
    backstory="...",
    llm=llm,
    allow_delegation=False,   # Agent can't delegate to others
)

Only agents that truly need to delegate (managers, coordinators) should have allow_delegation=True. The default in recent versions is False — but tutorials often set it to True for all agents, causing loops.

Constrain with max_iter:

agent = Agent(
    role="...",
    goal="...",
    backstory="...",
    llm=llm,
    max_iter=15,   # Max agent iterations (thoughts + tool calls) before stopping
    max_execution_time=300,   # Max seconds for this agent's work
)

Hierarchical process with an explicit manager to prevent ad-hoc delegation:

from crewai import Crew, LLM, Process

crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, write_task, edit_task],
    process=Process.hierarchical,   # Manager coordinates
    manager_llm=LLM(model="gpt-4o"),   # Separate LLM for the manager
    verbose=True,
)

In hierarchical mode, agents don’t delegate to each other directly — the manager decides task assignments.

Common Mistake: Setting allow_delegation=True on every agent and hoping they cooperate. Delegation is powerful but requires discipline — most multi-agent crews work better with allow_delegation=False and explicit task ordering (sequential process).

Fix 3: Task Context Not Passing Between Tasks

task1 = Task(description="Research the topic", agent=researcher)
task2 = Task(description="Write an article", agent=writer)

# Writer doesn't see researcher's output

In sequential process, task results cascade automatically — but the downstream agent’s LLM still needs explicit reference to previous output in the task description.

Explicit context parameter:

research_task = Task(
    description="Research {topic} and gather key facts.",
    expected_output="A bulleted list of 10 key facts.",
    agent=researcher,
)

write_task = Task(
    description=(
        "Write a 500-word article about {topic}. "
        "Use the research findings to support your claims."
    ),
    expected_output="A 500-word article in markdown format.",
    agent=writer,
    context=[research_task],   # Explicitly passes research_task output to writer
)

The context list tells CrewAI that write_task depends on research_task’s output. The writer agent receives the researcher’s output as part of its prompt automatically.

Multiple context sources:

review_task = Task(
    description="Review the article for accuracy and style.",
    expected_output="The revised article plus a short list of corrections made.",
    agent=editor,
    context=[research_task, write_task],   # Both research and draft available
)

expected_output is critical — without it, the LLM doesn’t know what format to produce, and the next agent gets garbled data:

# WRONG — vague output format
Task(description="Research the market", agent=researcher)
# Output might be a paragraph, a list, or bullet points — inconsistent

# CORRECT — specific format expected
Task(
    description="Research the EV market in 2025.",
    expected_output=(
        "A JSON object with the following keys:\n"
        "- market_size: numeric value in billions USD\n"
        "- top_players: list of company names\n"
        "- growth_rate: percentage\n"
        "- key_trends: list of 3-5 trend descriptions"
    ),
    agent=researcher,
)

Pro Tip: Write expected_output as if you’re specifying an API contract. The more structured and constrained, the more reliably downstream agents can use the result. Vague outputs (“a summary”, “some insights”) produce vague, inconsistent results that compound through the crew.
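Treating expected_output as a contract also means you can check it. The sketch below validates a task's raw output against the four keys from the EV market example above, using only the standard library. The helper name contract_violations and the accepted types are assumptions made for illustration.

```python
import json

# Keys from the expected_output example above; the accepted types are assumptions.
EXPECTED_KEYS = {
    "market_size": (int, float),
    "top_players": list,
    "growth_rate": (int, float, str),
    "key_trends": list,
}


def contract_violations(raw_output: str) -> list[str]:
    """Return a list of problems found in a task's raw JSON output."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return problems


# Made-up sample output, for illustration only.
sample = json.dumps({
    "market_size": 500.0,
    "top_players": ["Company A", "Company B"],
    "growth_rate": "12%",
    "key_trends": ["trend one", "trend two", "trend three"],
})
print(contract_violations(sample))
```

A check like this fits naturally in a task callback, so a malformed result is caught before the next agent builds on it.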

Fix 4: Tools and Tool Output Limits

from crewai_tools import SerperDevTool, FileReadTool

search = SerperDevTool()   # Google search via Serper API
file_reader = FileReadTool()

researcher = Agent(
    role="Researcher",
    goal="...",
    backstory="...",
    llm=llm,
    tools=[search, file_reader],
)

Build custom tools with @tool decorator:

import statistics

from crewai.tools import tool

@tool("Calculate statistics")
def calculate_stats(numbers: list[float]) -> dict:
    """Compute mean, median, and standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(numbers),
        "median": statistics.median(numbers),
        # statistics.stdev raises on fewer than two data points
        "stdev": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
    }

agent = Agent(role="Data Analyst", goal="...", backstory="...", tools=[calculate_stats])

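Tool output truncation happens because LLMs have context limits. Large tool outputs (scraped pages, big API responses) get cut off. Truncate or summarize inside the tool so that you decide what is kept:

import requests

from crewai.tools import tool

@tool("Fetch article content")
def fetch_article(url: str) -> str:
    """Fetch and return the first 3000 chars of an article."""
    response = requests.get(url, timeout=10)   # Don't hang on slow hosts
    response.raise_for_status()                # Surface HTTP errors to the agent
    content = response.text
    # Summarize or truncate explicitly; don't dump 100KB into context
    return content[:3000] + "..." if len(content) > 3000 else content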
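A hard slice can cut a word in half and gives the model no hint that text is missing. As one possible refinement, the sketch below cuts at the last space before the limit and states how much was dropped. truncate_for_context is a hypothetical helper and is not part of CrewAI.

```python
def truncate_for_context(text: str, max_chars: int = 3000) -> str:
    """Trim text to roughly max_chars, cutting at a word boundary when possible."""
    if len(text) <= max_chars:
        return text
    # Prefer the last space before the limit so no word is split.
    cut = text.rfind(" ", 0, max_chars)
    if cut <= 0:
        cut = max_chars
    omitted = len(text) - cut
    return f"{text[:cut]} ... [truncated, {omitted} characters omitted]"


long_text = "word " * 1000   # 5000 characters
print(truncate_for_context(long_text, max_chars=100))
```

Telling the model that content was omitted lets it decide to call the tool again with a narrower request instead of treating the fragment as complete.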

BaseTool class for more complex tools (with inputs validation):

from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class QueryInput(BaseModel):
    query: str = Field(..., description="Search query string")
    limit: int = Field(10, description="Max results to return")

class DatabaseSearchTool(BaseTool):
    name: str = "Database Search"
    description: str = "Search the internal database for documents."
    args_schema: type[BaseModel] = QueryInput

    def _run(self, query: str, limit: int = 10) -> str:
        # `db` stands for your own database client; it is not defined in this snippet
        results = db.search(query, limit=limit)
        return "\n".join(f"- {r.title}: {r.summary}" for r in results)

search_tool = DatabaseSearchTool()

Fix 5: Process Modes — Sequential vs Hierarchical

from crewai import Crew, LLM, Process

# Sequential: tasks run in order, output cascades
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, write_task, edit_task],
    process=Process.sequential,   # Default
    verbose=True,
)

# Hierarchical: a manager LLM coordinates
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, write_task, edit_task],
    process=Process.hierarchical,
    manager_llm=LLM(model="gpt-4o"),   # Required for hierarchical
    verbose=True,
)

Sequential mode — best for most use cases:

  • Tasks execute in the order they’re listed
  • Each task’s output flows to the next; set context=[previous_task] to make the dependency explicit
  • No coordinator overhead
  • Predictable execution

Hierarchical mode — for complex, dynamic workflows:

  • A manager LLM decides task order and delegation
  • Agents don’t need to know about each other
  • Higher LLM costs (manager adds LLM calls)
  • Less deterministic

Choosing between them:

Use sequential when          | Use hierarchical when
-----------------------------|-------------------------------------------
Task order is known upfront  | Task order depends on intermediate results
2–5 tasks                    | 5+ tasks with complex dependencies
You want predictable costs   | You need dynamic replanning
You want reproducibility     | Manager can improve efficiency

Fix 6: Memory and Persistence

CrewAI has three memory types — short-term (within a run), long-term (across runs), and entity (facts about specific entities).

from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    memory=True,   # Enable all three memory types
    verbose=True,
)

# Or also choose the embedding model that memory uses
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    memory=True,
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
    verbose=True,
)

Memory stores to ~/.crewai/ by default. To set a custom location:

import os

# Set this before the Crew is created so memory is initialized at the new location
os.environ["CREWAI_STORAGE_DIR"] = "/path/to/custom/storage"

Memory types:

  • Short-term: Agent conversation history within a single crew run. Automatic.
  • Long-term: Persists across runs. Lets agents learn from previous crews.
  • Entity: Tracks facts about specific entities (people, places, products). Useful for customer-facing agents.

Clear memory between tests:

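import os
import shutil

# Use the custom location if one is set, otherwise the default described above.
# Assumes the same "memory" subdirectory layout in both cases.
storage_dir = os.environ.get("CREWAI_STORAGE_DIR", os.path.expanduser("~/.crewai"))
shutil.rmtree(os.path.join(storage_dir, "memory"), ignore_errors=True)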

For production with shared memory (multi-user, multi-server) — use a centralized vector store:

crew = Crew(
    agents=[...],
    tasks=[...],
    memory=True,
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    # Use a centralized vector DB rather than local file storage
)

Fix 7: Asynchronous and Parallel Task Execution

from crewai import Task

research_us = Task(
    description="Research US market",
    expected_output="A bulleted summary of the US market.",
    agent=researcher,
    async_execution=True,   # This task runs in parallel with other async tasks
)

research_eu = Task(
    description="Research EU market",
    expected_output="A bulleted summary of the EU market.",
    agent=researcher,
    async_execution=True,
)

combine_research = Task(
    description="Combine US and EU research into a global report",
    expected_output="A single report covering both markets.",
    agent=analyst,
    context=[research_us, research_eu],   # Waits for both to complete
)

crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_us, research_eu, combine_research],
)

async_execution=True lets tasks run concurrently if their dependencies allow. The combining task automatically waits for both parallel tasks.
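The wait-for-both behaviour can be pictured with plain asyncio. This is a simulation of the pattern under simplifying assumptions, not CrewAI's scheduler: the two research coroutines stand in for the async tasks, and combine() plays the role of the task that lists both in its context.

```python
import asyncio


async def research(region: str, delay: float) -> str:
    """Stand-in for an async research task; sleeps instead of calling an LLM."""
    await asyncio.sleep(delay)
    return f"{region} findings"


async def combine() -> str:
    """Stand-in for the combining task: it cannot start until both inputs exist."""
    # gather runs both coroutines concurrently and returns results in call order.
    us, eu = await asyncio.gather(research("US", 0.02), research("EU", 0.01))
    return f"Global report: {us} + {eu}"


report = asyncio.run(combine())
print(report)
```

The EU job finishes first here, yet the results still arrive in the order the jobs were listed, which is why the combining step can rely on a stable input order.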

kickoff_async() for concurrent crew runs:

import asyncio

async def run_crews():
    results = await asyncio.gather(
        crew1.kickoff_async(inputs={"topic": "AI"}),
        crew2.kickoff_async(inputs={"topic": "Biotech"}),
        crew3.kickoff_async(inputs={"topic": "Fintech"}),
    )
    return results

results = asyncio.run(run_crews())
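Launching many crews at once can run into provider rate limits, and a single failure in a plain gather cancels nothing but raises immediately. One possible pattern, sketched here with a fake coroutine instead of a real crew, caps concurrency with a semaphore and collects exceptions alongside results. run_limited and fake_kickoff are hypothetical names.

```python
import asyncio


async def run_limited(jobs, limit: int = 2):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    # return_exceptions=True keeps one failure from hiding the other results.
    return await asyncio.gather(*(guarded(j) for j in jobs), return_exceptions=True)


async def fake_kickoff(topic: str) -> str:
    """Stand-in for crew.kickoff_async(inputs={"topic": topic})."""
    await asyncio.sleep(0.01)
    if topic == "Fintech":
        raise RuntimeError("rate limited")
    return f"report on {topic}"


jobs = [lambda t=t: fake_kickoff(t) for t in ("AI", "Biotech", "Fintech")]
results = asyncio.run(run_limited(jobs))
for item in results:
    print("failed:" if isinstance(item, Exception) else "ok:", item)
```

With real crews, each job would be a lambda that calls kickoff_async on its own crew instance.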

For asyncio event loop patterns, see Python asyncio not running.

Fix 8: Inputs, Outputs, and Callbacks

Parameterize crews with inputs:

research_task = Task(
    description="Research {topic} for a {audience} audience.",
    agent=researcher,
    expected_output="Research notes on {topic}.",
)

# Variables in description are substituted at kickoff
result = crew.kickoff(inputs={"topic": "quantum computing", "audience": "executives"})

Structured output with Pydantic:

from pydantic import BaseModel
from crewai import Task

class ArticleOutline(BaseModel):
    title: str
    sections: list[str]
    key_points: list[str]
    estimated_word_count: int

outline_task = Task(
    description="Create an outline for an article about {topic}.",
    expected_output="A structured outline.",
    agent=writer,
    output_pydantic=ArticleOutline,   # Validates output matches this schema
)

result = crew.kickoff(inputs={"topic": "AI ethics"})
outline = result.pydantic   # Typed Pydantic object
print(outline.title)
print(outline.sections)

Task callbacks for logging/monitoring:

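def task_callback(output):
    # output is the task's output object: description identifies the task, raw holds the text
    print(f"Task completed ({output.description[:40]}...): {output.raw[:100]}...")
    # Log to monitoring system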

task = Task(
    description="...",
    agent=agent,
    callback=task_callback,   # Called when task completes
)

Common Mistake: Reading result.raw and expecting structured data. result.raw is the raw string. Use result.pydantic when the task sets output_pydantic, result.json_dict for parsed JSON, and fall back to result.raw only when neither is populated.
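A small accessor keeps that choice in one place. The sketch below is duck-typed against the three attribute names mentioned above and uses SimpleNamespace stand-ins instead of a real crew result; best_output is a hypothetical helper.

```python
from types import SimpleNamespace


def best_output(result):
    """Return the most structured representation the result offers."""
    pydantic_obj = getattr(result, "pydantic", None)
    if pydantic_obj is not None:
        return pydantic_obj
    json_dict = getattr(result, "json_dict", None)
    if json_dict is not None:
        return json_dict
    return result.raw


# Stand-ins for results, for illustration only.
structured = SimpleNamespace(pydantic=None, json_dict={"title": "AI ethics"}, raw='{"title": "AI ethics"}')
plain = SimpleNamespace(pydantic=None, json_dict=None, raw="Just prose.")

print(best_output(structured))
print(best_output(plain))
```

Callers can then branch on the returned type once instead of probing the three attributes at every call site.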

Still Not Working?

CrewAI vs LangGraph vs AutoGen

  • CrewAI — Role-based collaboration, opinionated structure, fast to set up. Best for linear or lightly-branched agent workflows.
  • LangGraph — Explicit state machines, checkpointing, more control. Best for complex flows with loops and human-in-the-loop. See LangGraph not working.
  • AutoGen — Microsoft’s framework, strong for code generation and technical tasks.

Debugging Crew Execution

crew = Crew(
    agents=[...],
    tasks=[...],
    verbose=True,   # Print every agent step
)

# Or set specific verbosity
agent = Agent(
    role="...",
    goal="...",
    backstory="...",
    llm=llm,
    verbose=True,   # Per-agent verbose
)

Enable full trace logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Cost Tracking

CrewAI reports token usage after kickoff:

result = crew.kickoff(inputs={"topic": "AI"})
print(f"Total tokens used: {result.token_usage}")
# {"total_tokens": 12345, "prompt_tokens": 8000, "completion_tokens": 4345}
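Token counts become easier to reason about once converted to money. The sketch below applies per-million-token prices to the example counts above. The prices are placeholders chosen for easy arithmetic, not real rates, so substitute your provider's current pricing.

```python
def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_million: float,
    completion_price_per_million: float,
) -> float:
    """Estimate the cost of one run from token counts and per-million prices."""
    total = (
        prompt_tokens * prompt_price_per_million
        + completion_tokens * completion_price_per_million
    )
    return total / 1_000_000


# Token counts from the example above; prices are hypothetical placeholders.
cost = estimate_cost_usd(
    prompt_tokens=8000,
    completion_tokens=4345,
    prompt_price_per_million=1.0,
    completion_price_per_million=2.0,
)
print(f"Estimated cost: ${cost:.5f}")
```

Logging this figure per run makes a sudden jump after a refactor visible immediately.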

For OpenAI API cost and rate limit handling, see OpenAI API not working.

Integration with RAG

Combine CrewAI with a RAG pipeline by creating a custom tool that queries your vector store. Wrap the vector store’s query function with @tool, return retrieved chunks as a single string, and give the tool to the researcher agent.

from crewai.tools import tool

@tool("Search knowledge base")
def search_kb(query: str) -> str:
    """Search the internal knowledge base for relevant information."""
    response = llamaindex_query_engine.query(query)
    return str(response)

researcher = Agent(
    role="Researcher",
    goal="Find accurate information from internal docs",
    backstory="You are an expert at searching internal knowledge bases.",
    llm=llm,
    tools=[search_kb],
)

Outputs Truncated Mid-Sentence

If task outputs cut off mid-sentence, the agent hit its max_iter or the LLM hit max_tokens before finishing. Raise max_iter on the agent and max_tokens on the LLM. For long-form outputs, split into multiple smaller tasks rather than one giant task — agents reason better in shorter turns and you avoid the truncation entirely.

Token Costs Spiking After a Refactor

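If your crew suddenly costs 5x more after adding an agent or tool, check whether you also enabled memory=True or widened the context shared between tasks. Setting verbose=True only changes what is printed, not how many tokens are used. When each agent re-receives the accumulated history on every turn, a fourth agent adds more than its proportional share of cost, because every prompt grows with the turns before it. Audit result.token_usage per run and trim each task's context to the upstream tasks it actually needs. Hierarchical mode can help if the manager hands each agent only the relevant context, but the manager adds LLM calls of its own.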
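A rough model shows why cost grows faster than the number of agents. It assumes every turn re-sends all earlier turns and that each turn contributes the same number of tokens. Both are simplifications, and total_prompt_tokens is a hypothetical helper.

```python
def total_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    """Prompt tokens consumed if every turn re-sends all earlier turns."""
    # Turn k carries k chunks (its own plus k - 1 earlier ones), so the total
    # is tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


three_agents = total_prompt_tokens(turns=3, tokens_per_turn=1000)
four_agents = total_prompt_tokens(turns=4, tokens_per_turn=1000)
increase = (four_agents - three_agents) / three_agents
print(three_agents, four_agents, f"{increase:.0%}")
```

Under these assumptions, going from three agents to four raises prompt tokens from 6000 to 10000, an increase of about 67% for 33% more agents.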

Manager LLM Producing Inconsistent Routing

In hierarchical mode, the manager decides which agent handles each task. A weak manager model, or one run at a temperature near zero, can route every task to the same agent and ignore the crew’s structure. Use a stronger model (GPT-4o, Claude Opus 4) for manager_llm with temperature around 0.3: high enough to consider alternatives, low enough to be consistent across runs.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
