Live policy push channel

What it does: Notifies running agents within seconds of a policy change. Pushes are signed, monotonically versioned, and replay-resistant.

1. Vision context

This is the "live" in "live policy → every running agent." Without this channel, a policy change only takes effect on the agent's next call: correct, but invisible, with no demo moment. With it, the operator changes a Rego rule on stage and watches the agent's tool list shrink in real time.

The channel is also the substrate for behavior-drift-monitor: the same delivery infrastructure carries metrics back to TapPass.

See architecture §11 for the full sync flow and §10 for the privsep contract that makes the channel safe (unidirectional, signed, no upward channel).

2. Functional specification

The push payload structure is documented in architecture §11.2; validation rules are in §11.3.

On policy change (any cascade level):

Builder re-derives stale keyrings.
For each re-derived keyring, sign the payload with TapPass's Ed25519 key.
Push the signed payload over the per-sandbox WS to tappass-host.
On host ack, mark applied_at. If no ack arrives within the timeout, retry.
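The sign, push, ack, and retry steps above can be sketched roughly as follows. This is a minimal illustration, not the actual tappass/sync/ code: PushRecord, push_with_retry, the send callback, and the payload field names are hypothetical, and the signer is an HMAC-SHA256 stand-in because Ed25519 is not in the Python standard library. The real payload layout is the one in architecture §11.2.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional


def make_signer(key: bytes) -> Callable[[bytes], bytes]:
    """Stand-in signer (HMAC-SHA256). The spec calls for Ed25519, which needs
    a third-party library; swap this for a real Ed25519 signer in practice."""
    return lambda msg: hmac.new(key, msg, hashlib.sha256).digest()


@dataclass
class PushRecord:
    sandbox_id: str
    policy_version: int
    applied_at: Optional[float] = None  # set only after the host acks
    attempts: int = 0


def push_with_retry(
    record: PushRecord,
    keyring: dict,
    sign: Callable[[bytes], bytes],
    send: Callable[[bytes, bytes], bool],  # hypothetical: True on ack, False on timeout
    max_attempts: int = 3,
    clock: Callable[[], float] = time.time,
) -> bool:
    # Canonical JSON so the signed bytes are identical on every retry.
    body = json.dumps(
        {
            "sandbox_id": record.sandbox_id,
            "policy_version": record.policy_version,
            "keyring": keyring,
        },
        sort_keys=True,
    ).encode()
    signature = sign(body)
    for _ in range(max_attempts):
        record.attempts += 1
        if send(body, signature):
            record.applied_at = clock()  # mark applied_at on host ack
            return True
    return False  # retries exhausted; the caller decides what happens next
```

Signing once and resending the same bytes keeps retries idempotent from the host's point of view: a duplicate delivery carries the same policy version and signature as the first attempt.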

On sync miss (host offline / network):

The sandbox's tokens fail closed at TTL expiry, so the agent can no longer act.
On reconnect, host requests latest by (sandbox_id, last_policy_version); channel replays from that version forward.
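A rough sketch of the sync-miss behavior, under stated assumptions: ReplayLog, replay_after, and token_usable are hypothetical names, and "from that version forward" is read here as "every version strictly newer than the one the host last applied." If architecture §11 defines the boundary as inclusive, the comparison changes accordingly.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReplayLog:
    # sandbox_id -> [(policy_version, signed_payload)], ascending by version
    entries: Dict[str, List[Tuple[int, bytes]]] = field(default_factory=dict)

    def append(self, sandbox_id: str, policy_version: int, payload: bytes) -> None:
        log = self.entries.setdefault(sandbox_id, [])
        if log and policy_version <= log[-1][0]:
            raise ValueError("policy versions must strictly increase")
        log.append((policy_version, payload))

    def replay_after(self, sandbox_id: str, last_policy_version: int) -> List[Tuple[int, bytes]]:
        # Assumption: replay only versions newer than the host's last applied one.
        return [e for e in self.entries.get(sandbox_id, []) if e[0] > last_policy_version]


def token_usable(expires_at: float, now: float) -> bool:
    # Fail closed: a token at or past its expiry is never usable.
    return now < expires_at
```

For example, a host that reconnects reporting last_policy_version 2 against a log holding versions 1 through 4 would be sent versions 3 and 4, in order.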

3. Technical design

Lives at tappass/sync/. Reuses TapPass's existing WS infrastructure; adds Ed25519 signing for push payloads.

4. Definition of done

All acceptance_criteria pass.
Latency benchmarks met.
Replay test: host disconnects for 1 hour, reconnects, and the channel replays the missed versions correctly.
Anti-replay test: a malicious actor holding an old payload cannot make the host apply an older keyring.
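The anti-replay criterion can be illustrated with a host-side check that verifies the signature first and then refuses any version that does not advance. This is a sketch under assumptions: HostPolicyState, make_verifier, and the payload field names are hypothetical, the verifier is an HMAC-SHA256 stand-in for Ed25519 verification, and the authoritative validation rules are those in architecture §11.3.

```python
import hashlib
import hmac
import json
from typing import Callable, Optional


def make_verifier(key: bytes) -> Callable[[bytes, bytes], bool]:
    """Stand-in for Ed25519 verification, using HMAC-SHA256 from the stdlib."""
    def verify(body: bytes, signature: bytes) -> bool:
        expected = hmac.new(key, body, hashlib.sha256).digest()
        return hmac.compare_digest(expected, signature)
    return verify


class HostPolicyState:
    def __init__(self, verify: Callable[[bytes, bytes], bool]) -> None:
        self._verify = verify
        self.policy_version = 0
        self.keyring: Optional[dict] = None

    def apply(self, body: bytes, signature: bytes) -> bool:
        # 1. Reject anything not signed by the expected key.
        if not self._verify(body, signature):
            return False
        msg = json.loads(body)
        # 2. Monotonic check: a genuine but old payload must not roll state back.
        if msg["policy_version"] <= self.policy_version:
            return False
        self.policy_version = msg["policy_version"]
        self.keyring = msg["keyring"]
        return True
```

The signature alone is not enough here, because a captured old payload is still validly signed; the version comparison is what blocks the rollback.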

5. Coordination notes

With policy-to-sandbox-config-builder: builder triggers pushes; channel owns delivery. Builder never pushes directly.

With host-runtime-cli: host is the only valid receiver. Host's mTLS cert is what gates connection.

With behavior-drift-monitor: monitor reads audit; channel ensures audit gets to TapPass.

Open questions:

(Q) WebSocket vs. SSE vs. long-poll? Lean: WS, since it is bidirectional (host can ack, channel can push) and avoids the operational complexity of split protocols.
(Q) Replay window: keep the last N versions per sandbox indefinitely, or apply a TTL? Lean: TTL of 24 hours; longer disconnects require re-enrollment.
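If the 24-hour TTL lean is adopted, the reconnect decision could look something like the sketch below. The names (REPLAY_WINDOW, ReconnectAction, reconnect_action) are hypothetical, and treating a disconnect of exactly 24 hours as still replayable is an assumption; the question remains open.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

REPLAY_WINDOW = timedelta(hours=24)  # assumed value, per the "Lean" above


class ReconnectAction(Enum):
    REPLAY = "replay"
    RE_ENROLL = "re-enroll"


def reconnect_action(
    last_seen: datetime,
    now: datetime,
    window: timedelta = REPLAY_WINDOW,
) -> ReconnectAction:
    # Within the window the channel can still replay missed versions;
    # beyond it the retained history may be gone, so the host must re-enroll.
    if now - last_seen <= window:
        return ReconnectAction.REPLAY
    return ReconnectAction.RE_ENROLL
```

Under this sketch, the 1-hour disconnect in the replay test resolves to a replay, while a 25-hour disconnect forces re-enrollment.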

6. Out of scope

Policy authoring (intent-to-policy + cascade).
Keyring derivation (policy-to-sandbox-config-builder).
Layer application (q09 components).