Runtime Tool Discovery & Catalog Promotion — Concept

Status: Concept / design draft. Not yet a feature spec.
Date: 2026-05-06
Origin: Brainstorm while seeding a staging demo. Tools used in /v1/chat/completions got blocked by verify_tool_governance even after the app called /tools/govern, because there was no governed backend to dispatch to. We pulled on the thread and realised TapPass is conflating two different questions into one pipeline step. This concept untangles them.

1. The thesis

TapPass today asks one question of every tool call: "was this specific tool_call_id governed via /tools/govern or /tools/execute?" That's a per-call ceremony — useful for arg-level scrutiny but blunt as a way to decide whether a tool is allowed to exist in this org at all.

The proposal is to split that single check into two questions, and add a runtime loop that lets the catalog grow from observation:

The Catalog answers "does this tool name exist in our org?". The Per-Call Governance check answers "is this specific invocation OK to run?". They run in series. Both must pass. The Catalog is the organizational allow-list; per-call governance is the request-scoped scrutiny.

Default-deny on first sight. Schema captured. Operator promotes from a review queue. Subsequent calls pass the catalog check without re-review (per-call governance still applies, see §6).

This is the Auth0 split — application registration vs. per-request scope check — applied to tool calls. It also closes a loop the codebase already half-implements: CatalogSource.DISCOVERED exists, runtime tool capture exists, but nothing connects the two to the catalog write path (because the write path itself isn't shipped).

2. Today's behaviour and where it breaks

| Layer | What it does today | Gap |
|---|---|---|
| `/v1/chat/completions` | Captures every `tools=[...]` schema into `registry/tools.py` (in-memory, per org) | Capture is ephemeral; never promoted to the catalog table |
| `/tools/govern` | Runs the pipeline on a single tool call, marks `tool_call_id` as governed | Doesn't ask whether the tool should exist; only whether this call is acceptable |
| `verify_tool_governance` | Blocks the next chat completion if any prior tool result has an ungoverned `tool_call_id` | Conflates "unknown tool" with "ungoverned call". An app that legitimately governs every call still gets blocked if the tool isn't backend-registered |
| `/api/catalog/tools` (GET) | Lists the tool catalog with provenance | No write path. `CatalogSource.DISCOVERED` is defined but no row ever lands with that source |
| `/tools/integrity/approve` | CISO-style approval when a known tool's schema changes | Triggered on drift, not on first sight. The "first sight" path doesn't exist as a workflow |

The breakage shows up as soon as you try an honest demo: ad-hoc tools get blocked either at call_tool (no executor) or at verify_tool_governance (no per-call ceremony match). Neither block carries the right product message; neither says "this tool is not in your catalog yet, here's how to add it".

3. The split

3.1 Catalog membership — organizational allow-list

A tool is in the catalog (source ∈ {builtin, tenant, discovered, mcp_advertised, sdk_declared, imported}, status=approved) iff an operator (or an auto-approval rule, see §5) has decided it can exist in this org.

Scope: org-level (with optional per-project narrowing — see §7)
Identity: by tool_name (org-scoped). Two tools with the same name in different orgs are independent rows.
Schema: fingerprinted. Drift is the existing rug-pull detection's responsibility, so a new schema for an already-approved tool re-enters review.
Failure mode: unknown tool → request rejected with a catalog-aware error message: "Tool foo is not in your catalog. Pending review at /app/catalog/discovered."
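A minimal sketch of the §3.1 identity rules, assuming a Python codebase and an in-memory dict standing in for the catalog table. `CatalogEntry`, `schema_fingerprint` and `catalog_rejection` are hypothetical names, and the `CatalogSource` member names are guessed from the source values listed above (only `CatalogSource.DISCOVERED` is known to exist). The fingerprint here is SHA-256 over canonical JSON, one reasonable choice, not necessarily what the rug-pull detector uses.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class CatalogSource(str, Enum):
    # Values mirror the sources listed in §3.1; member names other than DISCOVERED are assumed.
    BUILTIN = "builtin"
    TENANT = "tenant"
    DISCOVERED = "discovered"
    MCP_ADVERTISED = "mcp_advertised"
    SDK_DECLARED = "sdk_declared"
    IMPORTED = "imported"


def schema_fingerprint(schema: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON (sorted keys, no whitespace), so key order is irrelevant."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class CatalogEntry:
    """Hypothetical row shape; not the real CatalogTool model."""
    org_id: str
    tool_name: str
    source: CatalogSource
    status: str  # "pending" | "approved" | "denied"
    fingerprint: str


# Identity is (org_id, tool_name): the same name in two orgs means two independent rows.
Catalog = dict[tuple[str, str], CatalogEntry]


def catalog_rejection(catalog: Catalog, org_id: str, tool_name: str) -> Optional[str]:
    """Return a catalog-aware rejection message, or None when the tool is approved."""
    entry = catalog.get((org_id, tool_name))
    if entry is not None and entry.status == "approved":
        return None
    if entry is not None and entry.status == "denied":
        # Hypothetical wording for the faster-path "denied" block described in §4.
        return f"Tool {tool_name} has been denied for this org."
    return (f"Tool {tool_name} is not in your catalog. "
            "Pending review at /app/catalog/discovered.")
```

For example, a tool approved under `org-a` still yields the "not in your catalog" message when looked up under `org-b`, because the key includes the org.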

3.2 Per-call governance — request-scoped scrutiny

Even when a tool is in the catalog, every call to it goes through /tools/govern (Track B) or /tools/execute (Track A) for arg-level checks: PII in arguments, constraint violations, capability-token scope, rate limits, etc.

Scope: per tool_call_id
Failure mode: specific call rejected with a step-level reason ("argument account matches PII pattern", "tool not in capability_token scope"), regardless of whether the tool type is allowed in the org.
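A sketch of two of the per-call checks named above, assuming arguments arrive as a decoded dict and the capability token's scope is available as a set of tool names. `govern_call`, `CallRejection`, the step names and the email regex are all hypothetical stand-ins; real PII detection, constraint checks and rate limits are out of scope here.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative detector only: a crude email matcher standing in for real PII checks.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class CallRejection:
    step: str    # hypothetical step names, e.g. "capability_scope", "pii_scan"
    reason: str


def govern_call(tool_name: str, arguments: dict[str, Any],
                token_scope: set[str]) -> Optional[CallRejection]:
    """Per-call checks for a single tool_call_id. Catalog membership is checked elsewhere."""
    if tool_name not in token_scope:
        return CallRejection("capability_scope", "tool not in capability_token scope")
    for key, value in arguments.items():
        if isinstance(value, str) and EMAIL_PATTERN.search(value):
            return CallRejection("pii_scan", f"argument {key} matches PII pattern")
    return None  # this invocation may proceed
```

Note that nothing here consults the catalog: a fully approved tool can still be rejected call by call.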

3.3 How they compose in `verify_tool_governance`

on each tool result entering /v1/chat/completions:
    if tool_name not in catalog(org_id, status=approved):
        BLOCK with {step: catalog_check, reason: "tool not in org catalog", tool_name}
        side_effect: ensure a `discovered` row exists for review
    elif tool_call_id not in governed_set:
        BLOCK with {step: verify_tool_governance, reason: "no /tools/govern ceremony for this call"}
    else:
        CONTINUE

Two clear, separable rejection messages instead of today's conflated "1 ungoverned" string.
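The pseudocode above can be rendered as a runnable Python sketch. `ToolResult`, `GovernanceState` and `Block` are hypothetical types, and plain sets stand in for the approved-catalog lookup, the governed `tool_call_id` set and the `discovered` upsert. Like a single BLOCK, it stops at the first failing tool result.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolResult:
    tool_name: str
    tool_call_id: str


@dataclass
class Block:
    step: str
    reason: str
    tool_name: str


@dataclass
class GovernanceState:
    approved_tools: set[str]     # catalog(org_id, status=approved), already resolved
    governed_call_ids: set[str]  # tool_call_ids seen by /tools/govern or /tools/execute
    discovered: set[str] = field(default_factory=set)  # stand-in for `discovered` rows


def verify_tool_governance(results: list[ToolResult],
                           state: GovernanceState) -> Optional[Block]:
    for result in results:
        # Layer 1: organizational allow-list.
        if result.tool_name not in state.approved_tools:
            state.discovered.add(result.tool_name)  # ensure a row exists for review
            return Block("catalog_check", "tool not in org catalog", result.tool_name)
        # Layer 2: request-scoped ceremony for this specific call.
        if result.tool_call_id not in state.governed_call_ids:
            return Block("verify_tool_governance",
                         "no /tools/govern ceremony for this call", result.tool_name)
    return None  # CONTINUE
```

Checking catalog membership first means an unknown tool always gets the catalog message, even when its call was also ungoverned, which is what makes the two rejection messages separable.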

4. The discovery loop

Default behaviour for any tool name not yet in the catalog:

Observe. First time a tool name is seen in a /v1/chat/completions request (already captured by registry/tools.py).
Promote to pending. Background task writes a row: (org_id, tool_name, schema, source=discovered, status=pending, first_seen_at, last_seen_at, observation_count, observed_by_agents=[...]).
Block at runtime. The catalog check in verify_tool_governance returns a block reason that includes the discovered row's id, so the dashboard can deep-link to it.
Surface in review queue. New page: /app/catalog/discovered. Operator sees agent + tool name + inferred schema + sample arguments + how many times it's been attempted.
Decide.
- Approve → row flips to status=approved. Next call passes the catalog check.
- Refine → operator edits the schema (narrow types, add constraints), then approves. The original schema fingerprint is captured for drift detection.
- Deny → row flips to status=denied. Future calls get a faster-path block with a clearer message.
- Defer → row stays pending; no UI nudge until next observation.

The loop is intentionally human-in-the-loop by default. Auto-approval is opt-in (§5).
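The observe and promote-to-pending steps above amount to an upsert keyed by `(org_id, tool_name)`. A minimal in-memory sketch, assuming the row fields listed in the loop; `DiscoveredRow`, `DiscoveryStore` and `record_observation` are hypothetical, and in the proposed design this would run in a background task against the catalog table rather than synchronously.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class DiscoveredRow:
    org_id: str
    tool_name: str
    schema: dict[str, Any]
    first_seen_at: datetime
    last_seen_at: datetime
    source: str = "discovered"
    status: str = "pending"  # flipped later by the review queue
    observation_count: int = 1
    observed_by_agents: list[str] = field(default_factory=list)


class DiscoveryStore:
    """Hypothetical in-memory stand-in for the catalog table."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, str], DiscoveredRow] = {}

    def record_observation(self, org_id: str, tool_name: str, schema: dict[str, Any],
                           agent_id: str, now: Optional[datetime] = None) -> DiscoveredRow:
        if now is None:
            now = datetime.now(timezone.utc)
        row = self.rows.get((org_id, tool_name))
        if row is None:
            # First sight: create a pending row for /app/catalog/discovered.
            row = DiscoveredRow(org_id, tool_name, schema, first_seen_at=now,
                                last_seen_at=now, observed_by_agents=[agent_id])
            self.rows[(org_id, tool_name)] = row
            return row
        # Repeat sighting: update counters only; status and schema are left alone.
        row.observation_count += 1
        row.last_seen_at = now
        if agent_id not in row.observed_by_agents:
            row.observed_by_agents.append(agent_id)
        return row
```

Repeat sightings never touch `status`, so an approve or deny decision survives further observations. The stored schema is also left alone, on the assumption that schema changes belong to the rug-pull integrity flow (§7).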

5. Auto-approval — opt-in heuristics

The default for discovered tools is block until reviewed. But a senior buyer will hate that for high-trust sources. Three opt-in lanes:

| Lane | Trigger | Default? |
|---|---|---|
| `mcp_advertised` from a registered MCP server | Tool was advertised via the org's registered MCP server's tool schema | Auto-approve (the MCP server is the trust anchor) |
| `sdk_declared` matching agent's `declared_capabilities` | Agent registered with `declared_capabilities=["lookup_refund"]`; observed tool name matches | Off; flag for one-click batch approve |
| `discovered` from agent in shadow / canary mode | Agent has `mode=shadow` set | Off; auto-promote to pending but don't block (already what shadow mode means for everything else) |

These are knobs, not defaults. The default story remains: see → block → review → approve.
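One way the three lanes could be evaluated when a tool is seen for the first time. A sketch only: `LaneSettings`, `Observation` and `first_sight_outcome` are hypothetical, the outcome strings are invented labels, and every lane defaults to off, following "these are knobs, not defaults" and the work list's "off by default".

```python
from dataclasses import dataclass, field


@dataclass
class LaneSettings:
    # Every lane is an explicit opt-in; all off unless the org turns them on.
    mcp_advertised: bool = False
    sdk_declared: bool = False
    shadow_mode: bool = False


@dataclass
class Observation:
    tool_name: str
    advertised_by_registered_mcp: bool
    declared_capabilities: list[str] = field(default_factory=list)
    agent_mode: str = "normal"


def first_sight_outcome(obs: Observation, lanes: LaneSettings) -> str:
    """Classify a tool seen for the first time. Outcome labels are illustrative."""
    if lanes.mcp_advertised and obs.advertised_by_registered_mcp:
        return "auto_approved"     # the registered MCP server is the trust anchor
    if lanes.sdk_declared and obs.tool_name in obs.declared_capabilities:
        return "batch_candidate"   # still pending and blocked, but flagged for batch approve
    if lanes.shadow_mode and obs.agent_mode == "shadow":
        return "pending_no_block"  # queued for review; shadow mode never blocks
    return "pending_block"         # default: see -> block -> review -> approve
```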

6. What "approved" lets you skip

Approval grants exactly one thing: the catalog check passes. Per-call governance still applies. Specifically:

✅ The tool's existence is no longer a block reason
❌ /tools/govern is still expected before feeding a tool result back to the LLM
❌ Argument-level PII / secret / shell-bleed detections still run on every call
❌ Capability-token scope still gates which tools each chat completion can produce

This is important: approving a tool ≠ trusting its arguments. The two scrutiny layers stay independent.

7. Open questions

Approval scope: org-wide or per-project? The current Project & Teams concept project-scopes most resources. Should a tool approved in Project A also be approved in Project B of the same org? Lean: org-wide approval, per-project enable/disable. Approval is a single org-level decision; which projects may use the tool is a separate authorization knob handled by the existing tool permissions.
What happens to in-flight calls during review? Today: blocked. Alternative: queue and replay on approval (UX-friendly, but couples the agent's runtime to the operator's review SLA — bad). Lean: keep blocking, but the dashboard shows agents waiting on approval so the operator knows there's pressure.
Granularity: per (tool_name, arg_schema_hash) or just tool_name? If the agent extends the tool's schema later (new optional field), is that a new tool or a drift event on the same tool? Lean: same tool, schema change goes through existing rug-pull integrity flow. Don't fragment the catalog.
TTL on pending entries? Pending rows that no one approves and no agent retries should probably expire. Lean: 30-day TTL on status=pending with no observation in that window. Doesn't apply to denied or approved.
Catalog-check cost on the hot path. Reading the catalog on every chat completion is not free. Lean: in-memory cache per (org_id, version), invalidated on catalog write. Same pattern as the OPA bridge already uses.
How does this interact with Track A (/tools/execute)? Track A executes the tool through TapPass. If the tool is in the catalog but TapPass has no executor binding, what happens? Lean: catalog membership ≠ executor binding. A tool can be approved (LLM allowed to emit it, app allowed to call govern) without TapPass having a backend for it. Track A only works for tools with a registered executor, which is a separate row in a separate table.
Audit story. Every approve / deny / refine should land in the audit trail with the reviewer's identity, the schema as approved, and a link from the original blocking event. Reuses the existing audit infra.
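Two of the leans above, the per-`(org_id, version)` cache and the 30-day pending TTL, sketched together. Assumptions: `load_approved` is a hypothetical callable that reads one org's approved tool names from the catalog table, and the write path calls `bump` after every write; `CatalogCache` and `pending_expired` are illustrative names, not existing code.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


class CatalogCache:
    """Per-org cache of approved tool names, keyed by catalog version (hypothetical)."""

    def __init__(self, load_approved: Callable[[str], frozenset[str]]) -> None:
        self._load = load_approved               # reads the catalog table for one org
        self._version: dict[str, int] = {}       # bumped on every catalog write
        self._cache: dict[tuple[str, int], frozenset[str]] = {}

    def bump(self, org_id: str) -> None:
        old = self._version.get(org_id, 0)
        self._cache.pop((org_id, old), None)     # drop the stale snapshot
        self._version[org_id] = old + 1

    def approved(self, org_id: str) -> frozenset[str]:
        key = (org_id, self._version.get(org_id, 0))
        if key not in self._cache:
            self._cache[key] = self._load(org_id)  # hot path only hits the table on a miss
        return self._cache[key]


PENDING_TTL = timedelta(days=30)


def pending_expired(status: str, last_seen_at: datetime, now: datetime) -> bool:
    """Only pending rows expire, and only after 30 days without a new observation."""
    return status == "pending" and now - last_seen_at > PENDING_TTL
```

Keying the TTL on `last_seen_at` rather than `first_seen_at` means any retry by an agent keeps the row alive, matching "no observation in that window".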

8. What's missing in code

Crisp work list, mapping the design back to the codebase:

| # | Piece | New / Edit | Where | Notes |
|---|---|---|---|---|
| 1 | Promote runtime captures to `discovered` rows | New | `policy/catalog/tools_promoter.py` | Background task; reads `registry/tools._tools` and upserts into the catalog table |
| 2 | Catalog write endpoints | New | `api/routes/catalog/tools_admin.py` | `POST /api/catalog/tools`, `PATCH /api/catalog/tools/{id}` for status transitions |
| 3 | Status field on `CatalogTool` | Edit | `models/catalog_tool.py` | Add `status: Literal["pending", "approved", "denied"] = "pending"` |
| 4 | `verify_tool_governance` reads catalog | Edit | `pipeline/steps/verify_tool_governance.py` | Two-layer check; catalog membership first, per-call governance second |
| 5 | Discovery review UI | New | frontend page `/app/catalog/discovered` | List pending rows, schema diff, approve / deny / refine actions |
| 6 | Audit events | Edit | `audit` event types | `tool_discovered`, `tool_approved`, `tool_denied`, `tool_refined` |
| 7 | Auto-approve lanes (§5) | New | `policy/catalog/tools_auto_approve.py` | Opt-in heuristics; off by default |
| 8 | Catalog cache invalidation | Edit | `policy/catalog/tools_repo.py` | Bump version on every write so consumers know to refetch |

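None of this is huge in isolation. The shape is mostly assembling pieces that already exist (capture, integrity, GET catalog, audit trail), plus one new admin route module (POST and PATCH), a background promoter, a review page, an opt-in auto-approve module, and one new branch in an existing pipeline step.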
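A sketch of the status-transition half of the admin endpoint, combining work items 2, 6 and 8: validate the transition, record the matching audit event, and bump the catalog version. The allowed transitions (only out of `pending`) are an assumption, since the concept does not say whether an approved tool can later be revoked; `Row`, `CatalogWriter`, `TRANSITIONS` and the audit record shape are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumed transition table for PATCH /api/catalog/tools/{id}: only pending rows move.
# "Defer" is not a transition; the row just stays pending.
TRANSITIONS = {
    ("pending", "approved"): "tool_approved",
    ("pending", "denied"): "tool_denied",
}


@dataclass
class Row:
    tool_id: str
    status: str
    schema: dict[str, Any]
    observed_fingerprint: str  # fingerprint of the schema as first observed


@dataclass
class CatalogWriter:
    version: int = 0
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def transition(self, row: Row, new_status: str, reviewer: str,
                   refined_schema: Optional[dict[str, Any]] = None) -> Row:
        event = TRANSITIONS.get((row.status, new_status))
        if event is None:
            raise ValueError(f"illegal transition {row.status} -> {new_status}")
        if refined_schema is not None:
            if new_status != "approved":
                raise ValueError("a refined schema can only be approved")
            row.schema = refined_schema    # observed_fingerprint stays for drift checks
            event = "tool_refined"
        row.status = new_status
        self.audit_log.append({"event": event, "tool_id": row.tool_id,
                               "reviewer": reviewer, "schema": row.schema})
        self.version += 1                  # cache consumers refetch on version change
        return row
```

Bumping the version inside the same method that writes the audit record keeps invalidation and audit from drifting apart; a real implementation would do both in one database transaction.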

9. What this unlocks for the demo

Beyond the obvious ("the staging demo would actually work end-to-end"), three sales motions land cleanly on this:

"TapPass discovers your shadow IT." First demo opens with a list of tools the agents have tried to call but haven't been approved yet — visible shadow tool use, the buyer's headline fear.
"You don't write a tool catalog upfront." Big objection to most governance products: "I don't have an inventory." TapPass fills the inventory by watching.
"Default-deny is non-negotiable but not painful." The block reason is actionable (it points to the review queue) rather than a bare 403, and an actionable reason is what every CISO asks for once they've lived with rule-based deny lists.

10. Next step

If this concept holds, the spec is the natural next deliverable: full feature doc with API contracts, DB migration, OPA changes, UI wireframes, rollout plan. Per the "concept first" convention, that's a separate document and only written when you ask.
