Fix: Hypothesis Not Working — Strategy Errors, Flaky Tests, and Shrinking Issues
Quick Answer
How to fix Hypothesis errors — Unsatisfied assumption, Flaky test detected, HealthCheck data_too_large, strategy composition failing, example database stale, settings profile not found, and stateful testing errors.
The Error
You run a property-based test and Hypothesis gives up:
hypothesis.errors.Unsatisfied: Unable to satisfy assumptions of hypothesis test_my_function

Or a test that passed yesterday is now flagged as flaky:
hypothesis.errors.Flaky: Hypothesis test_foo produces unreliable results:
Falsified on the first call but did not on a subsequent one

Or Hypothesis complains about slow data generation:
hypothesis.errors.FailedHealthCheck:
data generation is extremely slow: Only produced 4 valid examples in 1.00 seconds

Or @given decorators conflict with pytest fixtures:
InvalidArgument: Got unsatisfiable strategy. Hypothesis cannot generate examples.

Or your stateful test fails deep into a long sequence and the shrinking hangs:
Shrinking...
# 30 minutes later, still shrinking

Hypothesis is Python’s dominant property-based testing library — instead of hand-writing inputs, you describe the space of valid inputs and Hypothesis generates hundreds of random examples, automatically shrinking failing cases to minimal reproductions. This is more powerful than example-based testing but introduces error modes that don’t exist in regular pytest. This guide covers each.
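If you have not used Hypothesis before, a minimal property test looks like this (a sketch; the idempotence property is just an illustrative choice):

```python
from hypothesis import given, strategies as st

# Property: sorting is idempotent. Sorting an already-sorted list
# changes nothing. Hypothesis generates ~100 random lists per run.
@given(st.lists(st.integers()))
def test_sort_idempotent(lst):
    once = sorted(lst)
    assert sorted(once) == once
```

Run it with pytest as usual; the @given decorator hooks into collection, and pytest sees one test that internally exercises many generated inputs.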
Why This Happens
Hypothesis uses strategies — descriptions of data spaces — to generate test inputs. Strategies compose (integers, lists, dictionaries, complex types) but can become unsatisfiable if your filters reject most generated values. When a test fails, Hypothesis shrinks the failing input to a minimal reproduction — but shrinking a complex strategy can take a long time, and a test that’s non-deterministic triggers the Flaky error.
The example database (.hypothesis/) stores past failures so subsequent runs re-try them. Stale entries or different code paths can cause unexpected behavior.
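When stale cached examples are the suspect, you can opt a single test out of the database instead of deleting the whole directory; `settings(database=None)` is the documented off switch (sketch):

```python
from hypothesis import given, settings, strategies as st

# database=None stops this test from reading or writing
# .hypothesis/examples, so every run starts a fresh random search.
@given(st.integers())
@settings(database=None)
def test_without_example_db(n):
    assert isinstance(n, int)
```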
Fix 1: Unsatisfied Strategy — Too Much Filtering
from hypothesis import given, strategies as st
@given(st.integers().filter(lambda x: x > 100 and x < 110))
def test_narrow(n):
...
# hypothesis.errors.Unsatisfied: Unable to generate any examples.

filter() rejects generated values that don’t match the predicate. When the filter rejects most of the random space, Hypothesis runs out of attempts and fails.
Use a bounded strategy instead:
# WRONG — filters out 99.9999% of integers
@given(st.integers().filter(lambda x: 100 < x < 110))
def test_narrow(n): ...
# CORRECT — generate the exact range
@given(st.integers(min_value=101, max_value=109))
def test_narrow(n): ...

Composition for complex constraints:
# Generate even integers directly, not via filter
@given(st.integers(min_value=0, max_value=1000).map(lambda x: x * 2))
def test_even(n):
assert n % 2 == 0

assume() inside the test — filters after generation:
from hypothesis import given, assume, strategies as st
@given(st.lists(st.integers()))
def test_sort_non_empty(lst):
assume(len(lst) > 0) # Skip empty lists
assume(len(set(lst)) > 1) # Skip lists with all same values
sorted_lst = sorted(lst)
assert sorted_lst[0] <= sorted_lst[-1]

assume() is clearer than .filter() for test-specific conditions. If most examples trigger assume(False), you get the same Unsatisfied error — but with easier debugging (add a print before the assume).
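Rather than a raw print, Hypothesis's note() helper attaches debug output to the falsifying example only, so passing examples stay quiet (a sketch of the same test with a note added):

```python
from hypothesis import given, assume, note, strategies as st

@given(st.lists(st.integers()))
def test_sort_non_empty(lst):
    note(f"generated: {lst!r}")  # shown only if this example fails
    assume(len(lst) > 0)         # skip empty lists
    sorted_lst = sorted(lst)
    assert sorted_lst[0] <= sorted_lst[-1]
```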
Common Mistake: Using .filter(lambda x: valid_condition(x)) when valid_condition is highly restrictive. Always generate valid data directly when possible. Filters are cheap if they reject <10% of values; expensive and flaky if they reject >50%.
Fix 2: Flaky — Non-Deterministic Tests
hypothesis.errors.Flaky: Hypothesis test_foo produces unreliable results:
Falsified on the first call but did not on a subsequent one

Hypothesis re-runs a failing input to confirm the failure — if the re-run passes, the test is flaky.
Common causes:
- Test uses global state (time, random seed, environment variables)
- External system dependency (DB, network, filesystem with uncleaned files)
- Mutable default argument that accumulates state
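The mutable-default cause deserves its own illustration, because the leaked state lives on the function object itself and survives across generated examples (sketch; process_batch is a hypothetical function):

```python
from hypothesis import given, strategies as st

# WRONG: the default list is created once and shared by every call,
# so the result depends on how many examples ran before this one.
def process_batch(item, seen=[]):
    seen.append(item)
    return len(seen)

# CORRECT: use None as a sentinel so each call gets a fresh list.
def process_batch_fixed(item, seen=None):
    if seen is None:
        seen = []
    seen.append(item)
    return len(seen)

@given(st.integers())
def test_batch(item):
    assert process_batch_fixed(item) == 1  # holds for every example
```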
Example — flaky test with time:
import time
@given(st.integers())
def test_time_based(n):
result = some_function(n)
assert result.timestamp < time.time() # Flaky — time.time() changes between calls

Fix: freeze time or mock dependencies:
from freezegun import freeze_time
@freeze_time("2025-01-01")
@given(st.integers())
def test_time_based(n):
result = some_function(n)
assert result.timestamp < time.time()

Example — flaky test with uncleaned files:
@given(st.text())
def test_file_write(content):
with open("/tmp/test.txt", "w") as f:
f.write(content)
# ... test reads /tmp/test.txt
# Previous run's content may still be aroundFix: use a fresh tempdir per run:
import tempfile, os
@given(st.text())
def test_file_write(content):
with tempfile.NamedTemporaryFile(mode="w", delete=False) as f:
f.write(content)
path = f.name
try:
# test logic
pass
finally:
os.unlink(path)

If you cannot fix the flake immediately, suppress the relevant health check explicitly so the decision is visible in the code:
from hypothesis import given, strategies as st, settings, HealthCheck
@given(st.integers())
@settings(suppress_health_check=[HealthCheck.function_scoped_fixture])
def test_still_flaky(n):
...

Use this sparingly — suppressing a health check without understanding the underlying non-determinism just hides a real bug.
Fix 3: FailedHealthCheck — Slow or Biased Generation
hypothesis.errors.FailedHealthCheck:
data generation is extremely slow: Only produced 4 valid examples in 1.00 seconds.

Data generation should average under 1ms per example. If it is much slower, Hypothesis raises this health check.
Common causes:
- Expensive strategy composition — many nested .map()/.filter() calls
- Expensive side effects inside a strategy — reading files, network calls
# WRONG — filesystem read in every example
@given(st.text().map(lambda s: open(f"/tmp/{s}.txt").read()))
def test_bad(content): ...
# WRONG — HTTP call in strategy
@given(st.integers().map(lambda i: requests.get(f"https://api.example.com/{i}").json()))
def test_bad(data): ...

Move expensive setup outside the strategy:
# CORRECT — pre-compute data once
cached_data = {i: requests.get(f"https://api.example.com/{i}").json() for i in range(100)}
@given(st.sampled_from(list(cached_data.keys())))
def test_good(key):
data = cached_data[key]
...

Suppress specific health checks when you understand the cost:
from hypothesis import given, settings, HealthCheck, strategies as st
@given(st.integers())
@settings(suppress_health_check=[HealthCheck.too_slow])
def test_known_slow(n):
...

Common health check types:
| HealthCheck | Meaning |
|---|---|
| too_slow | Generation is slow |
| data_too_large | Generated data exceeds size limit |
| filter_too_much | Too many .filter() rejections |
| function_scoped_fixture | pytest fixture may not reset between examples |
| return_value | Test function returns non-None |
| differing_executors | Test ran under differing executors |
Pro Tip: Rather than suppressing too_slow, fix it. A slow strategy wastes CPU on every test run and often hides real issues (e.g., generating data that’s too complex). Trim the strategy to only what the test actually needs.
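A typical trim: if the test inspects a single field, generate just that field instead of the whole record (sketch; is_adult and the user shape are illustrative):

```python
from hypothesis import given, strategies as st

def is_adult(age):
    return age >= 18

# BEFORE: builds a full user record although only `age` is checked.
@given(st.fixed_dictionaries({
    "name": st.text(min_size=1),
    "age": st.integers(min_value=0, max_value=150),
    "email": st.emails(),
}))
def test_adult_check_slow(user):
    assert (user["age"] >= 18) == is_adult(user["age"])

# AFTER: generate only the integer the assertion actually uses.
@given(st.integers(min_value=0, max_value=150))
def test_adult_check(age):
    assert (age >= 18) == is_adult(age)
```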
Fix 4: Strategy Composition Patterns
Basic composition:
from hypothesis import given, strategies as st
# Tuple of (int, str)
@given(st.tuples(st.integers(), st.text()))
def test_tuple(t):
i, s = t
# Dict with fixed keys
@given(st.fixed_dictionaries({
"name": st.text(min_size=1),
"age": st.integers(min_value=0, max_value=150),
"email": st.emails(),
}))
def test_user(user): ...
# Dict with dynamic keys
@given(st.dictionaries(
keys=st.text(min_size=1, max_size=10),
values=st.integers(),
min_size=1, max_size=100,
))
def test_dict(d): ...

@composite for custom strategies:
from hypothesis import strategies as st
@st.composite
def valid_user(draw):
name = draw(st.text(min_size=1, max_size=50))
age = draw(st.integers(min_value=0, max_value=120))
email = f"{draw(st.text(alphabet='abcdefghij', min_size=3, max_size=10))}@example.com"
return {"name": name, "age": age, "email": email}
@given(valid_user())
def test_user_creation(user):
...

Recursive strategies for tree-like data:
from hypothesis import given, strategies as st
json_strategy = st.recursive(
st.one_of(
st.none(),
st.booleans(),
st.integers(),
st.floats(allow_nan=False, allow_infinity=False),
st.text(),
),
lambda children: st.one_of(
st.lists(children),
st.dictionaries(st.text(), children),
),
max_leaves=10,
)
@given(json_strategy)
def test_json_roundtrip(data):
import json
assert json.loads(json.dumps(data)) == data

Dataclass and Pydantic model generation:
from dataclasses import dataclass
from hypothesis import given, strategies as st
@dataclass
class User:
name: str
age: int
@given(st.builds(User, name=st.text(min_size=1), age=st.integers(min_value=0, max_value=120)))
def test_user(user):
...

For Pydantic v1 models, Pydantic ships a Hypothesis plugin that registers strategies automatically, so st.from_type(YourModel) works out of the box; for Pydantic v2, use st.builds(YourModel, ...) as shown above.

Fix 5: @settings for Test Configuration
from hypothesis import given, settings, strategies as st
@given(st.integers())
@settings(
max_examples=500, # Generate 500 examples (default 100)
deadline=1000, # Each example must complete in 1000ms
derandomize=False, # Use random seeds (True = deterministic)
print_blob=True, # Print failure reproduction blob
)
def test_with_settings(n):
...

Named profiles for different environments:
from hypothesis import settings, Verbosity
settings.register_profile("ci", max_examples=1000, deadline=5000)
settings.register_profile("dev", max_examples=10, verbosity=Verbosity.verbose)
settings.register_profile("quick", max_examples=5)
# Use via env var or pytest option
# HYPOTHESIS_PROFILE=ci pytest
# pytest --hypothesis-profile=ci

Then in conftest.py:
from hypothesis import settings
settings.load_profile("dev") # Default for this project

Common Mistake: Setting max_examples=10 to “speed up” tests in CI. You lose Hypothesis’s main benefit (finding edge cases). Instead, register a fast dev profile and a thorough CI profile — CI has time to run many examples, developers need fast feedback.
Fix 6: Shrinking and Reproducing Failures
When a test fails, Hypothesis tries to shrink — find a smaller failing input. This often produces surprisingly minimal reproductions.
@given(st.lists(st.integers(), min_size=1))
def test_sort(lst):
assert sorted(lst) == lst # Wrong test — fails
# Hypothesis output:
# Falsifying example: test_sort(lst=[1, 0])

The minimal failing input is [1, 0] — two elements is the smallest possible counterexample.
Reproducing a specific failure:
Hypothesis prints a “blob” (reproduction token) on failure:
You can reproduce this example by temporarily adding:
@reproduce_failure('6.100.0', b'...base64...')

from hypothesis import reproduce_failure, given, strategies as st
@reproduce_failure('6.100.0', b'AXic...')
@given(st.lists(st.integers()))
def test_sort(lst):
assert sorted(lst) == lst
# Always runs the same failing example

Shrinking is slow for complex strategies. If shrinking takes more than about 10 minutes, either simplify your strategy or disable shrinking temporarily:
from hypothesis import given, settings, Phase, strategies as st
@given(st.integers())
@settings(phases=[Phase.explicit, Phase.reuse, Phase.generate]) # Skip shrinking
def test_expensive(x): ...

Example database — past failures are cached:
# Stored in .hypothesis/examples/
ls .hypothesis/examples/

Delete it to reset:
rm -rf .hypothesis/

Fix 7: Stateful Testing
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
from hypothesis import strategies as st
class BankAccount:
def __init__(self):
self.balance = 0
def deposit(self, amount):
self.balance += amount
def withdraw(self, amount):
if amount > self.balance:
raise ValueError("insufficient funds")
self.balance -= amount
class BankAccountTest(RuleBasedStateMachine):
def __init__(self):
super().__init__()
self.account = BankAccount()
@rule(amount=st.integers(min_value=1, max_value=1000))
def deposit(self, amount):
self.account.deposit(amount)
@rule(amount=st.integers(min_value=1, max_value=1000))
def withdraw(self, amount):
try:
self.account.withdraw(amount)
except ValueError:
pass
@invariant()
def balance_never_negative(self):
assert self.account.balance >= 0
# Run via pytest
TestBankAccount = BankAccountTest.TestCase

@rule defines operations; @invariant runs after every rule to check consistency. Hypothesis generates random sequences of operations and checks that the invariants hold.
Stateful testing shrinking can be slow — limit run size:
from hypothesis.stateful import RuleBasedStateMachine
from hypothesis import settings
class MyState(RuleBasedStateMachine):
...
TestMyState = MyState.TestCase
TestMyState.settings = settings(max_examples=50, stateful_step_count=30)

Fix 8: Integration with pytest
Hypothesis integrates seamlessly with pytest:
# test_math.py
import pytest
from hypothesis import given, strategies as st
@given(st.integers(), st.integers())
def test_add_commutative(a, b):
assert a + b == b + a
# add and multiply are assumed to be defined in the module under test
@pytest.mark.parametrize("fn", [add, multiply])
@given(st.integers(), st.integers())
def test_operations(fn, a, b):
assert fn(a, b) == fn(b, a)

Fixtures with Hypothesis — avoid function-scoped fixtures if they hold state:
# WRONG — fixture creates shared state across Hypothesis examples
@pytest.fixture
def db():
conn = create_db()
yield conn
conn.close()
@given(st.text())
def test_db_insert(db, value): # WARNING: function_scoped_fixture
db.insert(value)

Fix — use module scope or re-create inside the test:
@pytest.fixture(scope="module")
def db():
conn = create_db()
yield conn
conn.close()
@given(st.text())
def test_db_insert(db, value):
# Reset state inside the test
db.clear()
db.insert(value)

For pytest fixture lifecycle patterns that interact with Hypothesis, see pytest fixture not found. For mypy type-checking of test files using Hypothesis strategies, see Python mypy type error.
Still Not Working?
Hypothesis vs Regular pytest Parametrize
- Regular @pytest.mark.parametrize — explicit, small input sets. Best when you know exactly which inputs matter.
- Hypothesis @given — generative, finds edge cases automatically. Best for general-purpose invariants and transformations.
Use both: parametrize for specific known-tricky inputs, given for broader property coverage.
Targeted Search with @example
Add specific inputs that must always be tested:
from hypothesis import given, example, strategies as st
@given(st.integers())
@example(0)
@example(-1)
@example(2**63 - 1) # Max int64
def test_func(n):
...

@example always runs these specific values on every test execution, alongside random generation. Use for known-tricky edge cases.
Coverage and Optimization
Hypothesis shrinks for minimality by default. For faster test runs, limit shrinking:
from hypothesis import given, settings, Phase, strategies as st
@given(st.integers())
@settings(phases=[Phase.generate, Phase.reuse]) # Skip shrinking entirely
def test_fast(x): ...

Use this in CI when you only need to know whether the test fails, not the minimal input.
Type-Based Generation with from_type
from hypothesis import given, strategies as st
from typing import List, Optional
@given(st.from_type(List[int]))
def test_list(lst): ...
@given(st.from_type(Optional[str]))
def test_optional(s): ...

Let Hypothesis infer strategies from type annotations. Works for most built-in types and many third-party types.
Custom Type Strategies
Register a strategy for your own types:
from hypothesis import strategies as st
class Money:
def __init__(self, amount, currency):
self.amount = amount
self.currency = currency
st.register_type_strategy(
Money,
st.builds(Money,
amount=st.integers(min_value=0, max_value=1_000_000),
currency=st.sampled_from(["USD", "EUR", "JPY"]),
),
)
@given(st.from_type(Money))
def test_money(m):
assert m.amount >= 0

For testing patterns with pre-commit hooks that integrate Hypothesis into the commit workflow, see pre-commit not working. For Ruff-based linting that complements Hypothesis’s property testing, see Ruff not working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.