
Testing

Core server tests live in tappass/tests/. SDK tests live in tappass-sdk/tests/. Same philosophy across both.

Integration tests with real dependencies beat mocked unit tests for anything on the request path.

We were burned once by mocks — the tests passed, prod broke. Rule since then: if your change touches Postgres, an external API, or the pipeline, there must be an integration test that hits a real Postgres or a recorded/fake HTTP layer. See this feedback memory for the incident.

| Layer | Tool | Runs against | When to write |
| --- | --- | --- | --- |
| Unit | pytest | Pure Python, no I/O | Domain logic, helpers, pure functions |
| Integration | pytest + real Postgres + httpx.MockTransport for upstreams | Real DB, fake HTTP | API routes, pipeline, vault, audit |
| Contract | pytest + recorded upstream responses | Real provider APIs (periodic, not on every PR) | Provider client changes |
| E2E | Playwright | Full server + frontend in staging | Release candidate smoke |

CI runs unit and integration tests on every PR. Contract and E2E run nightly against staging.

```sh
# All tests (from tappass/ repo root)
pytest

# Fast subset (unit only, no DB)
pytest -m "not integration"

# Single file
pytest tests/unit/pipeline/steps/test_detect_pii.py

# Drop into pdb on the first failure
pytest -x --pdb

# With coverage
pytest --cov=src/tappass --cov-report=term-missing

# Show the slowest 20 tests
pytest --durations=20
```

Integration tests need a real Postgres. The test suite spins one up per session via testcontainers:

tests/conftest.py

```python
from collections.abc import Iterator

import pytest
from testcontainers.postgres import PostgresContainer


@pytest.fixture(scope="session")
def pg_container() -> Iterator[PostgresContainer]:
    with PostgresContainer("postgres:15") as pg:
        yield pg
```

On first run this pulls the Postgres image (~200 MB). After that, tests reuse the container for the session.

Per-test isolation happens via transaction rollback, not DB drop:

```python
from collections.abc import AsyncIterator

import pytest
from sqlalchemy.ext.asyncio import AsyncSession


@pytest.fixture
async def db(pg_session: AsyncSession) -> AsyncIterator[AsyncSession]:
    tx = await pg_session.begin()
    try:
        yield pg_session  # hand the test the session, not the transaction
    finally:
        await tx.rollback()  # undo everything the test wrote
```

So a test cannot “leak” state into the next test. Don’t fight this — use factories, not shared state.

Factories (no fixtures-by-name-everywhere)


We use factory-boy with async support. One factory per model:

tests/factories.py

```python
from factory import Faker, LazyAttribute, SubFactory

# AsyncSQLAlchemyFactory comes from the project's async factory-boy integration.


class TenantFactory(AsyncSQLAlchemyFactory):
    class Meta:
        model = Tenant

    name = Faker("company")
    slug = LazyAttribute(lambda o: slugify(o.name))


class AgentFactory(AsyncSQLAlchemyFactory):
    class Meta:
        model = Agent

    tenant = SubFactory(TenantFactory)
    label = Faker("slug")
```

In a test:

```python
async def test_something(db):
    agent = await AgentFactory.create(session=db)
    # agent is a real row with real relationships
```
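The LazyAttribute above derives slug from name via a slugify helper. A minimal stand-in for that helper (the real one may differ; this is only to show the derivation):

```python
import re

def slugify(name: str) -> str:
    # Minimal stand-in for the project's slugify helper (assumption):
    # lowercase, collapse non-alphanumeric runs to hyphens, trim the edges.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

slug = slugify("Acme Corp, Inc.")  # → "acme-corp-inc"
```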

Never import unittest.mock. Use httpx.MockTransport:

```python
from httpx import AsyncClient, MockTransport, Response


def _handler(request):
    if request.url.path == "/v1/chat/completions":
        return Response(200, json={"choices": [{"message": {"content": "hello"}}]})
    return Response(404)


async def test_openai_client_retries():
    transport = MockTransport(_handler)
    async with AsyncClient(transport=transport) as http:
        client = OpenAIClient(http=http, base_url="https://api.openai.com")
        reply = await client.chat(...)
        assert reply == "hello"
```

This lets you assert on requests and simulate 500s, timeouts, and partial streaming, all without patching.

```python
@pytest.mark.integration      # requires real Postgres
@pytest.mark.slow             # takes > 1s; excluded by default in pre-commit
@pytest.mark.contract         # runs against a live upstream; nightly only
@pytest.mark.flaky(reruns=3)  # for tests that are genuinely flaky by design (rare)
```

A default pytest run includes integration tests but not contract tests. Pre-commit runs pytest -m "not slow".
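If the markers are registered in conftest.py (one plausible setup; the project may register them in pyproject.toml instead), the registration looks roughly like this:

```python
MARKERS = [
    "integration: requires real Postgres",
    "slow: takes > 1s; excluded by pre-commit",
    "contract: hits live upstreams; nightly only",
]

def pytest_configure(config):
    # pytest calls this hook at startup; registering markers here
    # silences PytestUnknownMarkWarning for custom marks.
    for line in MARKERS:
        config.addinivalue_line("markers", line)

# Tiny stand-in config so the hook can be exercised without pytest itself:
class _FakeConfig:
    def __init__(self):
        self.registered = []
    def addinivalue_line(self, name, line):
        self.registered.append((name, line))

cfg = _FakeConfig()
pytest_configure(cfg)
```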

For pipeline steps (see Pipeline step anatomy):

  1. Unit: given a fake backend, the step returns the right Detection[]
  2. Unit: on_detection=redact actually modifies ctx.user_message
  3. Integration: POST to /v1/chat/completions with a triggering payload → event lands in audit_events with the expected detections
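Points 1 and 2 of that checklist can be sketched as follows; Detection, the ctx shape, and the step body are hypothetical stand-ins for the real interfaces, kept only to show the test shape:

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    kind: str
    span: tuple[int, int]


@dataclass
class Ctx:
    user_message: str
    detections: list[Detection] = field(default_factory=list)


def detect_pii_step(ctx: Ctx, backend) -> Ctx:
    # Unit-test shape: backend is a fake; the step records detections
    # and, in "redact" mode, rewrites the offending span in place.
    for kind, start, end in backend(ctx.user_message):
        ctx.detections.append(Detection(kind, (start, end)))
        ctx.user_message = ctx.user_message[:start] + "[REDACTED]" + ctx.user_message[end:]
    return ctx


# Fake backend: reports one email detection at a hardcoded span for this message.
fake_backend = lambda text: [("email", 6, 21)] if "@" in text else []
ctx = detect_pii_step(Ctx("mail: bob@example.com"), fake_backend)
```

The point is that both assertions, "right Detection list" and "ctx.user_message actually modified", run against a pure function with no I/O.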

For API routes:

  1. Happy path: 200 with expected shape
  2. Auth: 401 with no key, 403 with wrong tenant’s key
  3. Validation: 422 on malformed payload
  4. Idempotency: two identical requests produce one audit event (if the route is idempotent)
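Point 4, the idempotency property, reduces to "identical payloads map to one audit event". A dependency-free sketch of that invariant (all names are illustrative, not the real audit code):

```python
import hashlib
import json

audit_events: dict[str, dict] = {}

def record_idempotent(payload: dict) -> str:
    # Key the event by a canonical hash of the payload, so two
    # byte-identical requests collapse into one audit event.
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    audit_events.setdefault(key, {"payload": payload})
    return key

k1 = record_idempotent({"agent": "a1", "action": "rotate"})
k2 = record_idempotent({"action": "rotate", "agent": "a1"})  # same payload, reordered
```

sort_keys=True makes the serialization canonical, so key order in the incoming dict does not defeat the dedup.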

For vault providers:

  1. Round-trip: set then get returns what we stored
  2. Encryption: raw Postgres row never contains the plaintext
  3. Versioning: set twice, old version still retrievable by version id
  4. Deletion: deleted secret is unreadable, audit event recorded
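The first three properties can be exercised against an in-memory sketch of the provider contract (this is an illustration of what the tests assert, not the real provider, which encrypts and writes audit events):

```python
from __future__ import annotations


class InMemoryVault:
    """Illustrative in-memory sketch of the provider contract under test
    (round-trip, versioning, deletion)."""

    def __init__(self):
        self._store: dict[str, list[bytes]] = {}

    def set(self, name: str, value: str) -> int:
        versions = self._store.setdefault(name, [])
        versions.append(value.encode())  # a real provider encrypts here
        return len(versions)             # 1-based version id

    def get(self, name: str, version: int | None = None) -> str:
        versions = self._store[name]
        return (versions[version - 1] if version else versions[-1]).decode()

    def delete(self, name: str) -> None:
        self._store.pop(name)  # a real provider also records an audit event


vault = InMemoryVault()
v1 = vault.set("api_key", "old")
vault.set("api_key", "new")
current = vault.get("api_key")          # latest version
old = vault.get("api_key", version=v1)  # old version still retrievable
```

The encryption property (point 2) is the one that genuinely needs the integration layer: it asserts on the raw Postgres row, which no in-memory fake can stand in for.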

We use syrupy for snapshot tests on API response shapes. When you intentionally change a response:

```sh
pytest --snapshot-update
```

Review the diff carefully before committing. A snapshot diff in a PR without accompanying test assertions is a red flag — it means the contract changed silently.

We tolerate zero flakes in main. If a test is flaky:

  1. Skip it with @pytest.mark.skip(reason="flaky, see #issue") — open an issue
  2. Do not sprinkle retries; fix the root cause
  3. Target: flakes fixed or removed within two weeks

Test data hygiene:

  • No real PII in fixtures — use faker
  • No real API keys anywhere in tests/ — tp_test_xxx placeholders are fine; live keys never
  • No production database dumps for seeding — use factories

gitleaks scans tests/ too; a committed live key will block CI.

Target: > 85% on new code in a PR (per-file, not project-wide). Existing low-coverage files can slide, but any file you touch must meet the target.

Coverage is reported as a CI comment by the codecov action. A drop of more than 2% blocks merge.

tappass-sdk/ tests are trickier because the SDK is what customers use — we can’t break the public surface. Two extra layers:

  1. API compatibility tests — every method of the Agent class has a test that asserts its signature. Adding a required argument fails the suite.
  2. Version pinning in examples — tappass-examples/ pins the SDK version. A PR that changes SDK behaviour must update the example, or CI in the examples repo fails.
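A signature-compatibility test of that sort can be sketched with the stdlib inspect module; the Agent method shown is a hypothetical stand-in for the SDK's public surface:

```python
import inspect


class Agent:
    # Hypothetical stand-in for the SDK's public Agent class.
    def chat(self, message: str, *, stream: bool = False) -> str: ...


def required_params(fn) -> list[str]:
    # Parameters with no default value; adding one of these to a
    # public method is exactly the break the suite should catch.
    return [
        name
        for name, p in inspect.signature(fn).parameters.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


chat_required = required_params(Agent.chat)  # a new required arg changes this list
```

Asserting `required_params(Agent.chat) == ["message"]` in the suite means any PR that adds a required argument fails immediately, while adding an optional keyword argument stays green.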