
Database & migrations

Postgres 15+. Alembic for migrations. Cloud SQL in staging + prod.

See Data model for the schema map.

```sh
# From the tappass/ repo root
alembic revision --autogenerate -m "add agent_labels index"
```

Review the generated file. Autogenerate is a starting point, not a source of truth — it misses things like concurrent index creation.

```sh
alembic upgrade head
```

To inspect state:

```sh
alembic current
alembic history
alembic heads  # should always be a single head
```

If `alembic heads` shows more than one head, two migrations were created on parallel branches and merged. Create a merge revision with `alembic merge heads`.

```sh
alembic downgrade -1
```

Every schema change must be deployable while the previous release is still running.

For any non-additive change:

| Phase | Code change | Schema change |
| --- | --- | --- |
| 1. Add | Write new + old; read both | Add new column/table, non-destructive |
| 2. Backfill | Same | Populate new column from old |
| 3. Remove | Write new only; read new | Drop old column |

Each phase is a separate release. Shipping all three in one deploy guarantees a window in which the new code runs against the old schema, or the old code against the new one, and requests error out.

Renaming email to contact_email:

  1. Add: new column contact_email, nullable. Code writes to both; reads prefer the new column and fall back to the old one.
  2. Backfill: UPDATE users SET contact_email = email WHERE contact_email IS NULL; (batched).
  3. Remove: stop writing to email. Drop column.

Changing a JSON column’s shape:

  1. Add: new column with new shape, or new top-level key in the JSON. Writers produce both; readers tolerate both.
  2. Backfill: rewrite existing rows.
  3. Remove: stop tolerating the old shape.
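During phase 1 of a JSON shape change, readers must tolerate both shapes. A minimal sketch, assuming a hypothetical move from a flat `{"email": ...}` payload to a nested `{"contact": {"email": ...}}` one:

```python
import json


def read_contact(row_json: str) -> str:
    """Return the contact address from either JSON shape.

    Old shape (hypothetical): {"email": "a@b.c"}
    New shape (hypothetical): {"contact": {"email": "a@b.c"}}
    """
    doc = json.loads(row_json)
    # Prefer the new nested shape; fall back to the old flat key.
    contact = doc.get("contact")
    if isinstance(contact, dict) and "email" in contact:
        return contact["email"]
    return doc["email"]
```

Once the backfill has rewritten every row, the fallback branch becomes dead code and can be deleted in phase 3.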

Dropping a deprecated table:

  1. Stop writing. Deploy. Let traffic drain for a week.
  2. Stop reading. Deploy. Wait a day.
  3. Drop. Last.

Use CREATE INDEX CONCURRENTLY for any index on a large table. Alembic’s op.create_index takes a lock — we want to avoid that in prod.

```python
def upgrade() -> None:
    op.execute(
        "CREATE INDEX CONCURRENTLY idx_audit_events_agent_ts "
        "ON audit_events (agent_id, ts DESC)"
    )
```

Note: CONCURRENTLY can’t run inside a transaction. Alembic wraps migrations in a transaction by default; opt out:

```python
from alembic import op

revision = "abc123"
# ...

def upgrade() -> None:
    with op.get_context().autocommit_block():
        op.execute("CREATE INDEX CONCURRENTLY ...")
```

Adding a NOT NULL column: don’t do it in one migration on a big table. SET NOT NULL validates every row under an exclusive lock.

  1. Add the column nullable, backfill, then
  2. ALTER TABLE ... ALTER COLUMN ... SET NOT NULL, only after every row has a value

Plan it as two releases.

| Thing | Convention | Example |
| --- | --- | --- |
| Table | snake_case, plural | audit_events |
| Column | snake_case | tenant_id, created_at |
| Primary key | id, UUIDv7 | |
| Foreign key | <referenced_table_singular>_id | tenant_id, agent_id |
| Timestamps | created_at, updated_at | both timestamp with time zone |
| Index | idx_<table>_<cols> | idx_audit_events_agent_ts |
| Unique index | uq_<table>_<cols> | uq_api_keys_key_hash |

All timestamps in UTC. Never timestamp without time zone.

  • Dev: 5 connections per process
  • Staging / prod: 20 per Cloud Run instance
  • Max Cloud SQL connections: 100 (staging), 400 (prod)
  • Pool size per instance × max instances must stay under the DB connection max

If you see FATAL: too many connections, check the pool size in config/database.py and the instance count in Cloud Run. We don’t use pgbouncer; Cloud SQL Proxy is enough at our scale.
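The arithmetic is worth making explicit. A sketch of the budget check using the numbers above; the `reserved` headroom for superuser and maintenance connections is an assumption, not a documented Cloud SQL figure:

```python
# Per-instance pool size from this page (staging/prod).
POOL_SIZE_PER_INSTANCE = 20


def connection_budget_ok(max_instances: int, db_max_connections: int,
                         reserved: int = 5) -> bool:
    """True if worst-case pool usage fits under the DB connection limit.

    Worst case: every Cloud Run instance fills its pool at once.
    `reserved` leaves headroom for maintenance connections (an assumption).
    """
    worst_case = POOL_SIZE_PER_INSTANCE * max_instances
    return worst_case <= db_max_connections - reserved
```

For example, 4 instances × 20 connections = 80 fits under the staging max of 100, but 20 instances would not.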

Prod has one streaming replica for dashboards and analytics queries. Cloud SQL Proxy exposes it with a -replica suffix.

Rules:

  • Never write to the replica. It can’t accept writes anyway, but don’t build code that tries.
  • Prefer the replica for dashboard queries where sub-second staleness is fine.
  • Don’t use the replica for audit integrity checks — they need strictly-serializable reads.
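A tiny helper makes the routing decision explicit at each call site. The `-replica` suffix comes from this page; the function name and base instance name are placeholders:

```python
def instance_for(base: str, *, readonly: bool) -> str:
    """Pick the Cloud SQL instance name: the streaming replica
    (exposed by Cloud SQL Proxy under a '-replica' suffix) for
    read-only dashboard queries, the primary for everything else."""
    return f"{base}-replica" if readonly else base
```

For example, `instance_for("tappass-prod", readonly=True)` returns `"tappass-prod-replica"`. Forcing callers to pass `readonly=` keeps "never write to the replica" visible in code review rather than being a convention.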

Alembic migrations run against a real Postgres in CI via testcontainers. The test suite:

  1. Creates a fresh DB
  2. Runs alembic upgrade head
  3. Seeds factories
  4. Runs the integration tests

This catches migration regressions before they hit staging.

```sh
# Requires graphviz
uv run python -m tappass.tools.erd --output docs/erd.svg
```

Drop the SVG into a PR description when the change affects the schema meaningfully. Don’t commit it — it drifts.