Database & migrations
Postgres 15+. Alembic for migrations. Cloud SQL in staging + prod.
See Data model for the schema map.
Alembic workflow
Section titled “Alembic workflow”Create a migration
Section titled “Create a migration”# From the tappass/ repo rootalembic revision --autogenerate -m "add agent_labels index"Review the generated file. Autogenerate is a starting point, not a source of truth — it misses things like concurrent index creation.
Apply locally
Section titled “Apply locally”alembic upgrade headInspect pending
Section titled “Inspect pending”alembic currentalembic historyalembic heads # should always be a single headIf heads shows more than one, someone merged two parallel migrations — merge them manually with alembic revision --merge.
Rollback (local only)
Section titled “Rollback (local only)”alembic downgrade -1Zero-downtime rules
Section titled “Zero-downtime rules”Every schema change must be deployable while the previous release is still running.
The three-phase pattern
Section titled “The three-phase pattern”For any non-additive change:
| Phase | Code change | Schema change |
|---|---|---|
| 1. Add | Write new + old; read both | Add new column/table, non-destructive |
| 2. Backfill | Same | Populate new column from old |
| 3. Remove | Write new only; read new | Drop old column |
Each phase is a separate release. Shipping all three in one deploy gets you a half-second outage when the new code is running but the old schema row shape is still around, or vice versa.
Examples
Section titled “Examples”Renaming email to contact_email:
- Add: new column
contact_email, nullable. Code writes to both; reads prefer new, falls back to old. - Backfill:
UPDATE users SET contact_email = email WHERE contact_email IS NULL;(batched). - Remove: stop writing to
email. Drop column.
Changing a JSON column’s shape:
- Add: new column with new shape, or new top-level key in the JSON. Writers produce both; readers tolerate both.
- Backfill: rewrite existing rows.
- Remove: stop tolerating the old shape.
Dropping a deprecated table:
- Stop writing. Deploy. Let traffic drain for a week.
- Stop reading. Deploy. Wait a day.
- Drop. Last.
Index creation
Section titled “Index creation”Use CREATE INDEX CONCURRENTLY for any index on a large table. Alembic’s op.create_index takes a lock — we want to avoid that in prod.
def upgrade() -> None: op.execute( "CREATE INDEX CONCURRENTLY idx_audit_events_agent_ts " "ON audit_events (agent_id, ts DESC)" )Note: CONCURRENTLY can’t run inside a transaction. Alembic wraps migrations in a transaction by default; opt out:
from alembic import opfrom sqlalchemy import engine_from_config
revision = "abc123"# ...
def upgrade() -> None: with op.get_context().autocommit_block(): op.execute("CREATE INDEX CONCURRENTLY ...")NOT NULL on existing column
Section titled “NOT NULL on existing column”Don’t do it in one migration on a big table. It locks.
- Add column nullable, backfill, then
ALTER TABLE ... ALTER COLUMN ... SET NOT NULL— only after every row has a value
Plan it in two releases.
Naming
Section titled “Naming”| Thing | Convention | Example |
|---|---|---|
| Table | snake_case, plural | audit_events |
| Column | snake_case | tenant_id, created_at |
| Primary key | id, UUIDv7 | — |
| Foreign key | <referenced_table_singular>_id | tenant_id, agent_id |
| Timestamps | created_at, updated_at | both timestamp with time zone |
| Index | idx_<table>_<cols> | idx_audit_events_agent_ts |
| Unique index | uq_<table>_<cols> | uq_api_keys_key_hash |
All timestamps in UTC. Never timestamp without time zone.
Connection pooling
Section titled “Connection pooling”- Dev: 5 connections per process
- Staging / prod: 20 per Cloud Run instance
- Max Cloud SQL connections: 100 (staging), 400 (prod)
- Cloud Run concurrency cap × instances = must stay under the DB max
If you see FATAL: too many connections, check the pool size in config/database.py and the instance count in Cloud Run. We don’t use pgbouncer; Cloud SQL Proxy is enough at our scale.
Read replica
Section titled “Read replica”Prod has one streaming replica for dashboards and analytics queries. Cloud SQL Proxy exposes it with a -replica suffix.
Rules:
- Never write to the replica. It can’t accept writes anyway, but don’t build code that tries.
- Prefer the replica for dashboard queries where sub-second staleness is fine.
- Don’t use the replica for audit integrity checks — they need strictly-serializable reads.
Testing migrations
Section titled “Testing migrations”Alembic migrations run against a real Postgres in CI via testcontainers. The test suite:
- Creates a fresh DB
- Runs
alembic upgrade head - Seeds factories
- Runs the integration tests
This catches migration regressions before they hit staging.
Generating ERDs
Section titled “Generating ERDs”# Requires graphvizuv run python -m tappass.tools.erd --output docs/erd.svgDrop the SVG into a PR description when the change affects the schema meaningfully. Don’t commit it — it drifts.