Database & migrations

Postgres 15+. Alembic for migrations. Cloud SQL in staging + prod.

See Data model for the schema map.

Alembic workflow

Create a migration

# From the tappass/ repo root
alembic revision --autogenerate -m "add agent_labels index"

Review the generated file. Autogenerate is a starting point, not a source of truth — it misses things like concurrent index creation.

Apply locally

alembic upgrade head

Inspect pending

alembic current
alembic history
alembic heads       # should always be a single head

If heads shows more than one, someone merged two parallel migrations — merge them manually with alembic revision --merge.

Rollback (local only)

alembic downgrade -1

Zero-downtime rules

Every schema change must be deployable while the previous release is still running.

The three-phase pattern

For any non-additive change:

Phase	Code change	Schema change
1. Add	Write new + old; read both	Add new column/table, non-destructive
2. Backfill	Same	Populate new column from old
3. Remove	Write new only; read new	Drop old column

Each phase is a separate release. Shipping all three in one deploy gets you a half-second outage when the new code is running but the old schema row shape is still around, or vice versa.

Examples

Renaming email to contact_email:

Add: new column contact_email, nullable. Code writes to both; reads prefer new, falls back to old.
Backfill: UPDATE users SET contact_email = email WHERE contact_email IS NULL; (batched).
Remove: stop writing to email. Drop column.

Changing a JSON column's shape:

Add: new column with new shape, or new top-level key in the JSON. Writers produce both; readers tolerate both.
Backfill: rewrite existing rows.
Remove: stop tolerating the old shape.

Dropping a deprecated table:

Stop writing. Deploy. Let traffic drain for a week.
Stop reading. Deploy. Wait a day.
Drop. Last.

Index creation

Use CREATE INDEX CONCURRENTLY for any index on a large table. Alembic's op.create_index takes a lock — we want to avoid that in prod.

def upgrade() -> None:
    op.execute(
        "CREATE INDEX CONCURRENTLY idx_audit_events_agent_ts "
        "ON audit_events (agent_id, ts DESC)"
    )

Note: CONCURRENTLY can't run inside a transaction. Alembic wraps migrations in a transaction by default; opt out:

from alembic import op
from sqlalchemy import engine_from_config

revision = "abc123"
# ...

def upgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute("CREATE INDEX CONCURRENTLY ...")

NOT NULL on existing column

Don't do it in one migration on a big table. It locks.

Add column nullable, backfill, then
ALTER TABLE ... ALTER COLUMN ... SET NOT NULL — only after every row has a value

Plan it in two releases.

Naming

Thing	Convention	Example
Table	`snake_case`, plural	`audit_events`
Column	`snake_case`	`tenant_id`, `created_at`
Primary key	`id`, UUIDv7	—
Foreign key	`<referenced_table_singular>_id`	`tenant_id`, `agent_id`
Timestamps	`created_at`, `updated_at`	both `timestamp with time zone`
Index	`idx_<table>_<cols>`	`idx_audit_events_agent_ts`
Unique index	`uq_<table>_<cols>`	`uq_api_keys_key_hash`

All timestamps in UTC. Never timestamp without time zone.

Connection pooling

Dev: 5 connections per process
Staging / prod: 20 per Cloud Run instance
Max Cloud SQL connections: 100 (staging), 400 (prod)
Cloud Run concurrency cap × instances = must stay under the DB max

If you see FATAL: too many connections, check the pool size in config/database.py and the instance count in Cloud Run. We don't use pgbouncer; Cloud SQL Proxy is enough at our scale.

Read replica

Prod has one streaming replica for dashboards and analytics queries. Cloud SQL Proxy exposes it with a -replica suffix.

Rules:

Never write to the replica. It can't accept writes anyway, but don't build code that tries.
Prefer the replica for dashboard queries where sub-second staleness is fine.
Don't use the replica for audit integrity checks — they need strictly-serializable reads.

Testing migrations

Alembic migrations run against a real Postgres in CI via testcontainers. The test suite:

Creates a fresh DB
Runs alembic upgrade head
Seeds factories
Runs the integration tests

This catches migration regressions before they hit staging.

Generating ERDs

# Requires graphviz
uv run python -m tappass.tools.erd --output docs/erd.svg

Drop the SVG into a PR description when the change affects the schema meaningfully. Don't commit it — it drifts.