Testing
Core server tests live in tappass/tests/. SDK tests live in tappass-sdk/tests/. Same philosophy across both.
Philosophy in one sentence
Integration tests with real dependencies beat mocked unit tests for anything on the request path.
We were burned once by mocks — the tests passed, prod broke. Rule since then: if your change touches Postgres, an external API, or the pipeline, there must be an integration test that hits a real Postgres or a recorded/fake HTTP layer. See this feedback memory for the incident.
Layers
| Layer | Tool | Runs against | When to write |
|---|---|---|---|
| Unit | pytest | Pure Python, no I/O | Domain logic, helpers, pure functions |
| Integration | pytest + real Postgres + httpx.MockTransport for upstreams | Real DB, fake HTTP | API routes, pipeline, vault, audit |
| Contract | pytest + recorded upstream responses | Real provider APIs (periodic, not on every PR) | Provider client changes |
| E2E | Playwright | Full server + frontend in staging | Release candidate smoke |
CI runs the unit and integration layers on every PR. Contract and E2E tests run nightly against staging.
Running tests
```bash
# All (from tappass/ repo root)
pytest

# Fast subset (unit only, no DB)
pytest -m "not integration"

# Single file
pytest tests/unit/pipeline/steps/test_detect_pii.py

# With pdb on failure
pytest -x --pdb

# With coverage
pytest --cov=src/tappass --cov-report=term-missing

# Show the slowest 20 tests
pytest --durations=20
```

Integration tests need a real Postgres
The test suite spins one up per session via testcontainers:
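```python
from collections.abc import Iterator

import pytest
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def pg_container() -> Iterator[PostgresContainer]:
    # One container for the whole session; it stops when the session ends.
    with PostgresContainer("postgres:15") as pg:
        yield pg
```

On first run this pulls the Postgres image (~200 MB). After that, tests reuse the container for the session.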
Per-test isolation happens via transaction rollback, not DB drop:
```python
@pytest.fixture
async def db(pg_session) -> AsyncIterator[AsyncSession]:
    async with pg_session.begin() as tx:
        yield tx
        # Discard everything the test wrote.
        await tx.rollback()
```

So a test cannot “leak” state into the next test. Don’t fight this: use factories, not shared state.
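To make the rollback idea concrete, here is a minimal sketch that swaps in the standard-library sqlite3 module for Postgres and SQLAlchemy so that it runs anywhere. The `db` context manager, the `tenants` table and `count_tenants` are illustrative names, not part of the real suite.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator

# One shared connection stands in for the session-scoped database.
# isolation_level=None turns off sqlite3's implicit transactions, so the
# explicit BEGIN/ROLLBACK below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tenants (name TEXT)")


@contextmanager
def db() -> Iterator[sqlite3.Connection]:
    """Per-test scope: open a transaction and always roll it back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count_tenants() -> int:
    return conn.execute("SELECT COUNT(*) FROM tenants").fetchone()[0]


# "Test" 1 inserts a row and sees it inside its own transaction.
with db() as session:
    session.execute("INSERT INTO tenants VALUES ('acme')")
    rows_seen_by_first_test = count_tenants()

# "Test" 2 starts from an empty table because the insert was rolled back.
with db() as session:
    rows_seen_by_second_test = count_tenants()
```

The first block sees its own row; the second sees none, which is the property the real `db` fixture relies on.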
Factories (no fixtures-by-name-everywhere)
We use factory-boy with async support. One factory per model:
```python
class TenantFactory(AsyncSQLAlchemyFactory):
    class Meta:
        model = Tenant

    name = Faker("company")
    slug = LazyAttribute(lambda o: slugify(o.name))


class AgentFactory(AsyncSQLAlchemyFactory):
    class Meta:
        model = Agent

    tenant = SubFactory(TenantFactory)
    label = Faker("slug")
```

In a test:

```python
async def test_something(db):
    agent = await AgentFactory.create(session=db)
    # agent is a real row with real relationships
```

Mocking external HTTP
Never import unittest.mock. Use httpx.MockTransport:
```python
from httpx import AsyncClient, MockTransport, Response


def _handler(request):
    if request.url.path == "/v1/chat/completions":
        return Response(200, json={"choices": [{"message": {"content": "hello"}}]})
    return Response(404)


async def test_openai_client_retries():
    transport = MockTransport(_handler)
    async with AsyncClient(transport=transport) as http:
        client = OpenAIClient(http=http, base_url="https://api.openai.com")
        reply = await client.chat(...)
        assert reply == "hello"
```

This lets you assert on requests and simulate 500s, timeouts, and partial streaming without patching.
Markers
```python
@pytest.mark.integration      # requires real Postgres
@pytest.mark.slow             # > 1s; excluded by default in pre-commit
@pytest.mark.contract         # runs against live upstream; nightly only
@pytest.mark.flaky(reruns=3)  # for genuinely flaky-by-design (rare)
```

A default pytest run includes integration tests but not contract tests. Pre-commit runs only the "not slow" subset.
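Custom markers generally need to be registered, otherwise pytest warns about unknown marks (or errors under `--strict-markers`). One possible way to do that is a `pytest_configure` hook in conftest.py. This is a sketch only: the `MARKERS` list is an assumption based on the descriptions above, and the project may register its markers in its pytest configuration file instead.

```python
# conftest.py (sketch): register the project's custom markers.
MARKERS = [
    "integration: requires real Postgres",
    "slow: takes more than 1s; excluded by default in pre-commit",
    "contract: runs against live upstream; nightly only",
]


def pytest_configure(config) -> None:
    """Add each marker to pytest's `markers` ini setting."""
    for line in MARKERS:
        config.addinivalue_line("markers", line)
```

The `flaky` marker is left out here on the assumption that it comes from a rerun plugin that registers it itself.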
What to test
For pipeline steps (see Pipeline step anatomy):
- Unit: given a fake backend, the step returns the right Detection[]
- Unit: on_detection=redact actually modifies ctx.user_message
- Integration: POST to /v1/chat/completions with a triggering payload → event lands in audit_events with the expected detections
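The two unit-level checks above could look roughly like the following. Everything here is a hypothetical stand-in written for illustration: `Detection`, `Context`, `FakeBackend` and `detect_pii` are not the real pipeline types, and the real step's signature may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """Hypothetical detection record: what was found and where."""
    kind: str
    start: int
    end: int


@dataclass
class Context:
    """Hypothetical pipeline context holding the message under inspection."""
    user_message: str
    detections: list[Detection] = field(default_factory=list)


class FakeBackend:
    """Fake detector that returns fixed spans, so tests are deterministic."""

    def __init__(self, detections: list[Detection]) -> None:
        self._detections = detections

    def detect(self, text: str) -> list[Detection]:
        return list(self._detections)


def detect_pii(ctx: Context, backend: FakeBackend, on_detection: str = "log") -> list[Detection]:
    """Hypothetical step: record detections and optionally redact them."""
    found = backend.detect(ctx.user_message)
    ctx.detections.extend(found)
    if on_detection == "redact":
        # Replace spans right to left so earlier offsets stay valid.
        for d in sorted(found, key=lambda d: d.start, reverse=True):
            ctx.user_message = (
                ctx.user_message[: d.start] + f"[{d.kind}]" + ctx.user_message[d.end :]
            )
    return found


def test_step_returns_detections() -> None:
    ctx = Context(user_message="mail me at a@b.co")
    backend = FakeBackend([Detection("EMAIL", 11, 17)])
    assert detect_pii(ctx, backend) == [Detection("EMAIL", 11, 17)]
    assert ctx.user_message == "mail me at a@b.co"  # no redaction requested


def test_redact_modifies_user_message() -> None:
    ctx = Context(user_message="mail me at a@b.co")
    backend = FakeBackend([Detection("EMAIL", 11, 17)])
    detect_pii(ctx, backend, on_detection="redact")
    assert ctx.user_message == "mail me at [EMAIL]"
```

The fake backend keeps both tests pure Python with no I/O, which matches the Unit row in the layers table.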
For API routes:
- Happy path: 200 with expected shape
- Auth: 401 with no key, 403 with wrong tenant’s key
- Validation: 422 on malformed payload
- Idempotency: two identical requests produce one audit event (if the route is idempotent)
For vault providers:
- Round-trip: set then get returns what we stored
- Encryption: raw Postgres row never contains the plaintext
- Versioning: set twice, old version still retrievable by version id
- Deletion: deleted secret is unreadable, audit event recorded
Snapshots
We use syrupy for snapshot tests on API response shapes. When you intentionally change a response:

```bash
pytest --snapshot-update
```

Review the diff carefully before committing. A snapshot diff in a PR without accompanying test assertions is a red flag: it means the contract changed silently.
Flake policy
We tolerate zero flakes in main. If a test is flaky:
- Skip it with @pytest.mark.skip(reason="flaky, see #issue") and open an issue
- Do not sprinkle retries; fix the root cause
- Target: flakes fixed or removed within two weeks
Test data: never real
- No real PII in fixtures; use faker
- No real API keys anywhere in tests/; tp_test_xxx is fine, live keys never
- No production database dumps for seeding; use factories
gitleaks scans tests/ too; a committed live key will block CI.
Coverage
Target: > 85% on new code in a PR (per-file, not project-wide). Existing low-coverage files can slide, but any file you touch must meet the target.
Coverage is reported as a CI comment by the codecov action. A drop of more than 2% blocks merge.
SDK-specific notes
tappass-sdk/ tests are trickier because the SDK is what customers use, so we can’t break the public surface. Two extra layers:
- API compatibility tests: every method of the Agent class has a test that asserts its signature. Adding a required argument fails the suite.
- Version pinning in examples: tappass-examples/ pins the SDK version. A PR that changes SDK behaviour must update the example, or CI in the examples repo fails.
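A signature-pinning test of the kind described in the first bullet might be sketched with the standard-library inspect module. The `Agent` class below is a stand-in with a made-up `chat` method, and `EXPECTED_SIGNATURES` and `signature_mismatches` are hypothetical names. The real SDK surface and test layout will differ.

```python
import inspect


class Agent:
    """Stand-in for the SDK's Agent class; the real methods will differ."""

    def chat(self, message: str, *, timeout: float = 30.0) -> str:
        return message


# Frozen public surface: method name -> expected signature string.
EXPECTED_SIGNATURES = {
    "chat": "(self, message: str, *, timeout: float = 30.0) -> str",
}


def signature_mismatches(cls: type, expected: dict[str, str]) -> dict[str, str]:
    """Return {method: actual signature} for every method that drifted."""
    mismatches = {}
    for name, want in expected.items():
        got = str(inspect.signature(getattr(cls, name)))
        if got != want:
            mismatches[name] = got
    return mismatches


def test_agent_signatures_are_stable() -> None:
    assert signature_mismatches(Agent, EXPECTED_SIGNATURES) == {}
```

Adding a required argument to `chat` changes the signature string, so the comparison fails until someone deliberately updates the expected value.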