Runtime Tool Discovery & Catalog Promotion — Concept
Status: Concept / design draft. Not yet a feature spec.
Date: 2026-05-06
Origin: Brainstorm while seeding a staging demo. Tools used in /v1/chat/completions got blocked by verify_tool_governance even after the app called /tools/govern, because there was no governed backend to dispatch to. We pulled on the thread and realised TapPass is conflating two different questions into one pipeline step. This concept untangles them.
1. The thesis
TapPass today asks one question of every tool call: "was this specific tool_call_id governed via /tools/govern or /tools/execute?" That's a per-call ceremony — useful for arg-level scrutiny but blunt as a way to decide whether a tool is allowed to exist in this org at all.
The proposal is to split that single check into two questions, and add a runtime loop that lets the catalog grow from observation:
The Catalog answers "does this tool name exist in our org?". The Per-Call Governance check answers "is this specific invocation OK to run?". They run in series. Both must pass. The Catalog is the organizational allow-list; per-call governance is the request-scoped scrutiny.
Default-deny on first sight. Schema captured. Operator promotes from a review queue. Subsequent calls pass without re-prompting.
This is the Auth0 split — application registration vs. per-request scope check — applied to tool calls. It also closes a loop the codebase already half-implements: CatalogSource.DISCOVERED exists, runtime tool capture exists, but nothing connects the two to the catalog write path (because the write path itself isn't shipped).
2. Today's behaviour and where it breaks
| Layer | What it does today | Gap |
|---|---|---|
| /v1/chat/completions | Captures every tools=[...] schema into registry/tools.py (in-memory, per org) | Capture is ephemeral; never promoted to the catalog table |
| /tools/govern | Runs the pipeline on a single tool call, marks tool_call_id as governed | Doesn't ask whether the tool should exist — just whether this call is acceptable |
| verify_tool_governance | Blocks the next chat completion if any prior tool result has an ungoverned tool_call_id | Conflates "unknown tool" with "ungoverned call". An app that legitimately governs every call still gets blocked if the tool isn't backend-registered |
| /api/catalog/tools (GET) | Lists the tool catalog with provenance | No write path. CatalogSource.DISCOVERED is defined but no row ever lands with that source |
| /tools/integrity/approve | CISO-style approval when a known tool's schema changes | Triggered on drift, not on first sight. The "first sight" path doesn't exist as a workflow |
The brokenness shows up the moment you try to do an honest demo: ad-hoc tools get blocked at call_tool (no executor) or at verify_tool_governance (no per-call ceremony match). Neither block carries the right product message — neither says "this tool is not in your catalog yet, here's how to add it".
3. The split
3.1 Catalog membership — organizational allow-list
A tool is in the catalog (source ∈ {builtin, tenant, discovered, mcp_advertised, sdk_declared, imported}, status=approved) iff an operator (or auto-approval rule, see §5) has decided it can exist in this org.
- Scope: org-level (with optional per-project narrowing — see §7)
- Identity: by tool_name (org-scoped). Two tools with the same name in different orgs are independent rows.
- Schema is fingerprinted — drift is the existing rug-pull detection's responsibility; new schemas for an existing approved tool re-enter review.
- Failure mode: unknown tool → request rejected with a catalog-aware error message: "Tool foo is not in your catalog. Pending review at /app/catalog/discovered."
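A minimal sketch of the §3.1 identity and fingerprint rules, assuming an in-memory dict stands in for the catalog table. CatalogRow is a simplified, hypothetical stand-in for the CatalogTool model; Catalog, schema_fingerprint, schema_drifted and unknown_tool_message are hypothetical helpers, and SHA-256 over canonical JSON is an assumed fingerprinting scheme, not necessarily the one the existing rug-pull detection uses.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Literal

Status = Literal["pending", "approved", "denied"]


def schema_fingerprint(schema: dict[str, Any]) -> str:
    """Stable SHA-256 over canonical JSON, so key order doesn't change the hash."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class CatalogRow:
    """Simplified stand-in for a CatalogTool row (hypothetical shape)."""
    org_id: str
    tool_name: str
    schema: dict[str, Any]
    source: str = "discovered"
    status: Status = "pending"
    fingerprint: str = field(init=False)

    def __post_init__(self) -> None:
        self.fingerprint = schema_fingerprint(self.schema)


class Catalog:
    """In-memory catalog; identity is (org_id, tool_name), so orgs never collide."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str], CatalogRow] = {}

    def put(self, row: CatalogRow) -> None:
        self._rows[(row.org_id, row.tool_name)] = row

    def get(self, org_id: str, tool_name: str) -> CatalogRow | None:
        return self._rows.get((org_id, tool_name))

    def is_approved(self, org_id: str, tool_name: str) -> bool:
        row = self.get(org_id, tool_name)
        return row is not None and row.status == "approved"


def schema_drifted(row: CatalogRow, observed_schema: dict[str, Any]) -> bool:
    """True when an observed schema differs from the stored one (re-enters review)."""
    return schema_fingerprint(observed_schema) != row.fingerprint


def unknown_tool_message(tool_name: str) -> str:
    return (f"Tool {tool_name} is not in your catalog. "
            "Pending review at /app/catalog/discovered.")
```

Keying on (org_id, tool_name) is what makes same-named tools in different orgs independent rows.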
3.2 Per-call governance — request-scoped scrutiny
Even when a tool is in the catalog, every call to it goes through /tools/govern (Track B) or /tools/execute (Track A) for arg-level checks: PII in arguments, constraint violations, capability-token scope, rate limits, etc.
- Scope: per tool_call_id
- Failure mode: specific call rejected with a step-level reason ("argument account matches PII pattern", "tool not in capability_token scope"), regardless of whether the tool type is allowed in the org.
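A sketch of the per-call layer, assuming a toy IBAN-shaped regex in place of TapPass's real PII detectors and a plain set of tool names as the capability-token scope. govern_call, GovernedCalls, CallRejection and the step names capability_scope / pii_scan are hypothetical. Nothing here consults the catalog; that independence is the point of the split.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any

# Toy pattern standing in for real PII detection (assumption, not TapPass's detector).
IBAN_LIKE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")


@dataclass(frozen=True)
class CallRejection:
    step: str
    reason: str


def govern_call(tool_name: str, arguments: dict[str, Any],
                token_scope: set[str]) -> CallRejection | None:
    """Per-call scrutiny only: returns a step-level rejection, or None if OK."""
    if tool_name not in token_scope:
        return CallRejection("capability_scope", "tool not in capability_token scope")
    for key, value in arguments.items():
        if isinstance(value, str) and IBAN_LIKE.search(value):
            return CallRejection("pii_scan", f"argument {key} matches PII pattern")
    return None


class GovernedCalls:
    """Remembers which tool_call_ids passed a /tools/govern ceremony."""

    def __init__(self) -> None:
        self._ids: set[str] = set()

    def govern(self, tool_call_id: str, tool_name: str,
               arguments: dict[str, Any], token_scope: set[str]) -> CallRejection | None:
        rejection = govern_call(tool_name, arguments, token_scope)
        if rejection is None:
            self._ids.add(tool_call_id)  # only clean calls count as governed
        return rejection

    def is_governed(self, tool_call_id: str) -> bool:
        return tool_call_id in self._ids
```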
3.3 How they compose in verify_tool_governance
```
on each tool result entering /v1/chat/completions:
    if tool_name not in catalog(org_id, status=approved):
        BLOCK with {step: catalog_check, reason: "tool not in org catalog", tool_name}
        side_effect: ensure a `discovered` row exists for review
    elif tool_call_id not in governed_set:
        BLOCK with {step: verify_tool_governance, reason: "no /tools/govern ceremony for this call"}
    else:
        CONTINUE
```

Two clear, separable rejection messages instead of today's conflated "1 ungoverned" string.
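The compose rule can be written as runnable Python under simplifying assumptions: the approved catalog is a set of (org_id, tool_name) pairs, governed_ids is the set of tool_call_ids that passed /tools/govern, and ensure_discovered is a caller-supplied callback for the side effect. verify_tool_result and Block are illustrative names, not the actual pipeline step's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Block:
    step: str
    reason: str
    tool_name: str | None = None


def verify_tool_result(
    org_id: str,
    tool_name: str,
    tool_call_id: str,
    approved: set[tuple[str, str]],
    governed_ids: set[str],
    ensure_discovered: Callable[[str, str], None],
) -> Block | None:
    """Two-layer check for one tool result entering a chat completion.

    Returns None for CONTINUE, or the Block describing which layer failed.
    """
    # Layer 1: organizational allow-list. Unknown tools never reach layer 2.
    if (org_id, tool_name) not in approved:
        ensure_discovered(org_id, tool_name)  # side effect: queue for review
        return Block("catalog_check", "tool not in org catalog", tool_name)
    # Layer 2: request-scoped scrutiny of this specific call.
    if tool_call_id not in governed_ids:
        return Block("verify_tool_governance",
                     "no /tools/govern ceremony for this call")
    return None
```

Checking catalog membership first means an unknown tool always gets the catalog-aware message, even when the app did run a ceremony for that call.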
4. The discovery loop
Default behaviour for any tool name not yet in the catalog:
- Observe. First time a tool name is seen in a /v1/chat/completions request (already captured by registry/tools.py).
- Promote to pending. A background task writes a row: (org_id, tool_name, schema, source=discovered, status=pending, first_seen_at, last_seen_at, observation_count, observed_by_agents=[...]).
- Block at runtime. verify_tool_governance's catalog check returns a block reason that includes the discovered row's id, so the dashboard can deep-link.
- Surface in review queue. New page: /app/catalog/discovered. Operator sees agent + tool name + inferred schema + sample arguments + how many times it's been attempted.
- Decide.
  - Approve → row flips to status=approved. Next call passes the catalog check.
  - Refine → operator edits the schema (narrow types, add constraints), then approves. The original schema fingerprint is captured for drift detection.
  - Deny → row flips to status=denied. Future calls get a faster-path block with a clearer message.
  - Defer → row stays pending; no UI nudge until next observation.
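The Observe and Promote to pending steps amount to an idempotent upsert. This sketch assumes a dict keyed by (org_id, tool_name) in place of the catalog table; promote_observation and DiscoveredRow are hypothetical, with fields following the row shape listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class DiscoveredRow:
    org_id: str
    tool_name: str
    schema: dict[str, Any]
    first_seen_at: datetime
    last_seen_at: datetime
    source: str = "discovered"
    status: str = "pending"
    observation_count: int = 0
    observed_by_agents: list[str] = field(default_factory=list)


def promote_observation(
    table: dict[tuple[str, str], DiscoveredRow],
    org_id: str,
    agent_id: str,
    tool_name: str,
    schema: dict[str, Any],
    now: datetime | None = None,
) -> DiscoveredRow:
    """Upsert one runtime capture as a pending 'discovered' row.

    Existing rows only get their counters bumped; status is never touched here.
    """
    now = now or datetime.now(timezone.utc)
    key = (org_id, tool_name)
    row = table.get(key)
    if row is None:
        row = DiscoveredRow(org_id, tool_name, schema,
                            first_seen_at=now, last_seen_at=now)
        table[key] = row
    row.observation_count += 1
    row.last_seen_at = now
    if agent_id not in row.observed_by_agents:
        row.observed_by_agents.append(agent_id)
    return row
```

Because re-observation never changes status, an approval or denial survives later traffic; schema changes are left to the existing drift detection rather than overwritten by the promoter.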
The loop is intentionally human-in-the-loop by default. Auto-approval is opt-in (§5).
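The Decide step can be modeled as a small transition table. This sketch assumes rows are plain dicts, that only pending rows are reviewable, and that each decision maps to one of the audit event types proposed in §8 (Defer emits nothing). apply_decision, Decision and the observed_schema key are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Decision(Enum):
    APPROVE = "approve"
    REFINE = "refine"
    DENY = "deny"
    DEFER = "defer"


# Status each decision leaves a pending row in, plus the audit event it emits.
_TRANSITIONS: dict[Decision, tuple[str, str | None]] = {
    Decision.APPROVE: ("approved", "tool_approved"),
    Decision.REFINE: ("approved", "tool_refined"),  # edit schema, then approve
    Decision.DENY: ("denied", "tool_denied"),
    Decision.DEFER: ("pending", None),  # stays in the queue, no event
}


def apply_decision(
    row: dict[str, Any],
    decision: Decision,
    refined_schema: dict[str, Any] | None = None,
) -> tuple[dict[str, Any], str | None]:
    """Apply one reviewer decision to a pending row; returns (row, audit_event)."""
    if row["status"] != "pending":
        raise ValueError(f"row is {row['status']}; only pending rows are reviewed here")
    if decision is Decision.REFINE:
        if refined_schema is None:
            raise ValueError("refine needs the operator's edited schema")
        row["observed_schema"] = row["schema"]  # keep what the agent actually sent
        row["schema"] = refined_schema
    status, event = _TRANSITIONS[decision]
    row["status"] = status
    return row, event
```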
5. Auto-approval — opt-in heuristics
The default for discovered tools is block until reviewed. But a senior buyer will hate that for high-trust sources. Three opt-in lanes:
| Lane | Trigger | Default? |
|---|---|---|
| mcp_advertised from a registered MCP server | Tool was advertised via the org's registered MCP server's tool schema | Auto-approve (the MCP server is the trust anchor) |
| sdk_declared matching agent's declared_capabilities | Agent registered with declared_capabilities=["lookup_refund"]; observed tool name matches | Off; flag for one-click batch approve |
| discovered from agent in shadow / canary mode | Agent has mode=shadow set | Off; auto-promote to pending but don't block (already what shadow mode means for everything else) |
These are knobs, not defaults. The default story remains: see → block → review → approve.
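The lanes can be sketched as an opt-in classifier over a single observation. AutoApproveConfig, Observation, Outcome and classify are hypothetical; every flag defaults to off (matching "off by default" in §8), and the precedence order (MCP, then SDK declaration, then shadow mode) is an assumption the table does not settle.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVE = "approve"                    # skip the review queue
    PENDING_FLAGGED = "pending_flagged"    # still pending and blocked, offered for batch approve
    PENDING_NO_BLOCK = "pending_no_block"  # pending, but shadow mode lets the call through
    PENDING_BLOCK = "pending_block"        # default: see -> block -> review -> approve


@dataclass(frozen=True)
class AutoApproveConfig:
    # Every lane is off unless an org explicitly turns it on.
    mcp_advertised: bool = False
    sdk_declared: bool = False
    shadow_mode: bool = False


@dataclass(frozen=True)
class Observation:
    tool_name: str
    advertised_by_registered_mcp: bool = False
    declared_capabilities: tuple[str, ...] = ()
    agent_mode: str = "enforce"  # hypothetical non-shadow mode name


def classify(obs: Observation, cfg: AutoApproveConfig) -> Outcome:
    if cfg.mcp_advertised and obs.advertised_by_registered_mcp:
        return Outcome.APPROVE
    if cfg.sdk_declared and obs.tool_name in obs.declared_capabilities:
        return Outcome.PENDING_FLAGGED
    if cfg.shadow_mode and obs.agent_mode == "shadow":
        return Outcome.PENDING_NO_BLOCK
    return Outcome.PENDING_BLOCK
```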
6. What "approved" lets you skip
Approval grants exactly one thing: the catalog check passes. Per-call governance still applies. Specifically:
- ✅ The tool's existence is no longer a block reason
- ❌ /tools/govern is still expected before feeding a tool result back to the LLM
- ❌ Argument-level PII / secret / shell-bleed detections still run on every call
- ❌ Capability-token scope still gates which tools each chat completion can produce
This is important: approving a tool ≠ trusting its arguments. The two scrutiny layers stay independent.
7. Open questions
- Approval scope: org-wide or per-project? The current Project & Teams concept project-scopes most resources. Should a tool approved in Project A also be approved in Project B of the same org? Lean: org-wide approval, per-project enable/disable. Approval is a one-time decision about whether the tool may exist in the org; who can use it is a separate authorization knob handled by existing tool-permissions.
- What happens to in-flight calls during review? Today: blocked. Alternative: queue and replay on approval (UX-friendly, but couples the agent's runtime to the operator's review SLA, which is bad). Lean: keep blocking, but the dashboard shows agents waiting on approval so the operator knows there's pressure.
- Granularity: per (tool_name, arg_schema_hash) or just tool_name? If the agent extends the tool's schema later (new optional field), is that a new tool or a drift event on the same tool? Lean: same tool; the schema change goes through the existing rug-pull integrity flow. Don't fragment the catalog.
- TTL on pending entries? Pending rows that no one approves and no agent retries probably want to expire. Lean: 30-day TTL on status=pending with no observation in that window. Doesn't apply to denied or approved.
- Catalog-check cost on the hot path. Reading the catalog on every chat completion is not free. Lean: in-memory cache per (org_id, version), invalidated on catalog write, the same pattern the OPA bridge already uses.
- How does this interact with Track A (/tools/execute)? Track A executes the tool through TapPass. If the tool is in the catalog but TapPass has no executor binding, what happens? Lean: catalog membership ≠ executor binding. A tool can be approved (LLM allowed to emit it, app allowed to call govern) without TapPass having a backend for it. Track A only works for tools with a registered executor, which is a separate row in a separate table.
- Audit story. Every approve / deny / refine should land in the audit trail with the reviewer's identity, the schema as approved, and a link from the original blocking event. Reuses the existing audit infra.
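The TTL lean for pending entries can be sketched as a sweep over rows. expire_pending and PENDING_TTL are hypothetical names; the 30-day window and the pending-only rule follow the lean above, while the dict row shape with a last_seen_at timestamp (from the discovered-row fields in §4) is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any

PENDING_TTL = timedelta(days=30)


def expire_pending(rows: list[dict[str, Any]],
                   now: datetime | None = None,
                   ttl: timedelta = PENDING_TTL) -> list[dict[str, Any]]:
    """Return the rows that survive the sweep.

    Only status=pending rows can expire, and only when no observation
    landed within the TTL window. Approved and denied rows are always kept.
    """
    now = now or datetime.now(timezone.utc)
    return [
        r for r in rows
        if r["status"] != "pending" or now - r["last_seen_at"] <= ttl
    ]
```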
8. What's missing in code
Crisp work list, mapping the design back to the codebase:
| # | Piece | New / Edit | Where | Notes |
|---|---|---|---|---|
| 1 | Promote runtime captures to discovered rows | New | policy/catalog/tools_promoter.py | Background task; reads registry/tools._tools and upserts into the catalog table |
| 2 | Catalog write endpoints | New | api/routes/catalog/tools_admin.py | POST /api/catalog/tools, PATCH /api/catalog/tools/{id} for status transitions |
| 3 | Status field on CatalogTool | Edit | models/catalog_tool.py | Add status: Literal["pending", "approved", "denied"] = "pending" |
| 4 | verify_tool_governance reads catalog | Edit | pipeline/steps/verify_tool_governance.py | Two-layer check; catalog membership first, per-call governance second |
| 5 | Discovery review UI | New | frontend page /app/catalog/discovered | List pending rows, schema diff, approve / deny / refine actions |
| 6 | Audit events | Edit | audit event types | tool_discovered, tool_approved, tool_denied, tool_refined |
| 7 | Auto-approve lanes (§5) | New | policy/catalog/tools_auto_approve.py | Opt-in heuristics; off by default |
| 8 | Catalog cache invalidation | Edit | policy/catalog/tools_repo.py | Bump version on every write so consumers know to refetch |
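Row 8 and the hot-path cache lean from §7 fit together: the repo bumps a per-org version on every write, and the cache refetches only when the version it holds is stale. CatalogRepo and CatalogCache are hypothetical stand-ins for policy/catalog/tools_repo.py and its consumers, and the sketch assumes reading the current version is cheap (here an in-process counter).

```python
from __future__ import annotations

import threading


class CatalogRepo:
    """Hypothetical repo: every write bumps a per-org version counter."""

    def __init__(self) -> None:
        self._approved: dict[str, set[str]] = {}
        self._version: dict[str, int] = {}
        self._lock = threading.Lock()

    def version(self, org_id: str) -> int:
        return self._version.get(org_id, 0)

    def set_status(self, org_id: str, tool_name: str, status: str) -> None:
        with self._lock:
            names = self._approved.setdefault(org_id, set())
            if status == "approved":
                names.add(tool_name)
            else:
                names.discard(tool_name)
            self._version[org_id] = self.version(org_id) + 1  # invalidates caches

    def load_approved(self, org_id: str) -> frozenset[str]:
        return frozenset(self._approved.get(org_id, set()))


class CatalogCache:
    """Hot-path cache keyed by (org_id, version); a version bump forces a refetch."""

    def __init__(self, repo: CatalogRepo) -> None:
        self._repo = repo
        self._entries: dict[str, tuple[int, frozenset[str]]] = {}
        self.loads = 0  # number of repo reads, useful for observing cache hits

    def is_approved(self, org_id: str, tool_name: str) -> bool:
        current = self._repo.version(org_id)  # read version before data
        cached = self._entries.get(org_id)
        if cached is None or cached[0] != current:
            self.loads += 1
            cached = (current, self._repo.load_approved(org_id))
            self._entries[org_id] = cached
        return tool_name in cached[1]
```

Reading the version before the data means a write that races the load costs at most one extra refetch on the next call, rather than pinning stale data under a current version.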
None of this is huge in isolation. The shape is mostly assembling pieces that already exist (capture, integrity, GET catalog, audit trail), plus one new set of admin write endpoints and one new branch in an existing pipeline step.
9. What this unlocks for the demo
Beyond the obvious ("the staging demo would actually work end-to-end"), three sales motions land cleanly on this:
- "TapPass discovers your shadow IT." First demo opens with a list of tools the agents have tried to call but haven't been approved yet — visible shadow tool use, the buyer's headline fear.
- "You don't write a tool catalog upfront." Big objection to most governance products: "I don't have an inventory." TapPass fills the inventory by watching.
- "Default-deny is non-negotiable but not painful." The block reason is actionable (review queue) instead of just a 403, which is what every CISO asks for once they've lived with rule-based deny lists.
10. Next step
If this concept holds, the spec is the natural next deliverable: full feature doc with API contracts, DB migration, OPA changes, UI wireframes, rollout plan. Per the "concept first" convention, that's a separate document and only written when you ask.