
Runtime Tool Discovery & Catalog Promotion — Concept


Status: Concept / design draft. Not yet a feature spec.
Date: 2026-05-06
Origin: Brainstorm while seeding a staging demo. Tools used in /v1/chat/completions got blocked by verify_tool_governance even after the app called /tools/govern, because there was no governed backend to dispatch to. We pulled on the thread and realised TapPass is conflating two different questions into one pipeline step. This concept untangles them.


TapPass today asks one question of every tool call: "was this specific tool_call_id governed via /tools/govern or /tools/execute?" That's a per-call ceremony — useful for arg-level scrutiny but blunt as a way to decide whether a tool is allowed to exist in this org at all.

The proposal is to split that single check into two questions, and add a runtime loop that lets the catalog grow from observation:

The Catalog answers "does this tool name exist in our org?". The Per-Call Governance check answers "is this specific invocation OK to run?". They run in series. Both must pass. The Catalog is the organizational allow-list; per-call governance is the request-scoped scrutiny.

Default-deny on first sight. Schema captured. Operator promotes from a review queue. Subsequent calls pass without re-prompting.

This is the Auth0 split — application registration vs. per-request scope check — applied to tool calls. It also closes a loop the codebase already half-implements: CatalogSource.DISCOVERED exists, runtime tool capture exists, but nothing connects the two to the catalog write path (because the write path itself isn't shipped).


| Layer | What it does today | Gap |
| --- | --- | --- |
| /v1/chat/completions | Captures every tools=[...] schema into registry/tools.py (in-memory, per org) | Capture is ephemeral; never promoted to the catalog table |
| /tools/govern | Runs the pipeline on a single tool call, marks tool_call_id as governed | Doesn't ask whether the tool should exist, only whether this call is acceptable |
| verify_tool_governance | Blocks the next chat completion if any prior tool result has an ungoverned tool_call_id | Conflates "unknown tool" with "ungoverned call". An app that legitimately governs every call still gets blocked if the tool isn't backend-registered |
| /api/catalog/tools (GET) | Lists the tool catalog with provenance | No write path. CatalogSource.DISCOVERED is defined but no row ever lands with that source |
| /tools/integrity/approve | CISO-style approval when a known tool's schema changes | Triggered on drift, not on first sight. The "first sight" path doesn't exist as a workflow |

The brokenness shows up the moment you try to do an honest demo: ad-hoc tools get blocked at call_tool (no executor) or at verify_tool_governance (no per-call ceremony match). Neither block carries the right product message — neither says "this tool is not in your catalog yet, here's how to add it".


3.1 Catalog membership — organizational allow-list


A tool is in the catalog (source ∈ {builtin, tenant, discovered, mcp_advertised, sdk_declared, imported}, status=approved) iff an operator (or auto-approval rule, see §5) has decided it can exist in this org.

  • Scope: org-level (with optional per-project narrowing — see §7)
  • Identity: by tool_name (org-scoped). Two tools with the same name in different orgs are independent rows.
  • Schema is fingerprinted — drift is the existing rug-pull detection's responsibility; new schemas for an existing approved tool re-enter review.
  • Failure mode: unknown tool → request rejected with a catalog-aware error message: "Tool foo is not in your catalog. Pending review at /app/catalog/discovered."
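
The fingerprinting mentioned above could look roughly like the following. This is a minimal sketch, not the existing rug-pull detection code: the function names `schema_fingerprint` and `has_drifted` are hypothetical, and hashing canonical JSON with SHA-256 is an assumed choice.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Return a stable SHA-256 hex digest of a tool schema (assumed scheme)."""
    # sort_keys plus compact separators give one canonical byte string,
    # so key order and whitespace do not change the fingerprint.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(approved_fingerprint: str, observed_schema: dict) -> bool:
    """True when the observed schema no longer matches the approved one."""
    return schema_fingerprint(observed_schema) != approved_fingerprint
```

Under this sketch, a drifted schema for an approved tool would be the signal that sends the tool back into review; how that hooks into the existing integrity flow is left open here.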

3.2 Per-call governance — request-scoped scrutiny


Even when a tool is in the catalog, every call to it goes through /tools/govern (Track B) or /tools/execute (Track A) for arg-level checks: PII in arguments, constraint violations, capability-token scope, rate limits, etc.

  • Scope: per tool_call_id
  • Failure mode: specific call rejected with a step-level reason ("argument account matches PII pattern", "tool not in capability_token scope"), regardless of whether the tool type is allowed in the org.

3.3 How they compose in verify_tool_governance

on each tool result entering /v1/chat/completions:
    if tool_name not in catalog(org_id, status=approved):
        BLOCK with {step: catalog_check, reason: "tool not in org catalog", tool_name}
        side_effect: ensure a `discovered` row exists for review
    elif tool_call_id not in governed_set:
        BLOCK with {step: verify_tool_governance, reason: "no /tools/govern ceremony for this call"}
    else:
        CONTINUE

Two clear, separable rejection messages instead of today's conflated "1 ungoverned" string.
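
For illustration, the two-layer check can be written as a small Python function. This is a sketch under assumptions: `Block`, `check_tool_result`, and the plain sets standing in for the catalog, the governed set, and the review queue are all hypothetical, not the real pipeline step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """A rejection with a step-level reason (hypothetical shape)."""
    step: str
    reason: str
    tool_name: str
    tool_call_id: str


def check_tool_result(
    tool_name: str,
    tool_call_id: str,
    approved_tools: set[str],
    governed_call_ids: set[str],
    review_queue: set[str],
) -> Optional[Block]:
    """Return a Block describing the rejection, or None to continue."""
    # Layer 1: catalog membership (organizational allow-list).
    if tool_name not in approved_tools:
        # Side effect: make sure the tool is queued for operator review.
        review_queue.add(tool_name)
        return Block("catalog_check", "tool not in org catalog",
                     tool_name, tool_call_id)
    # Layer 2: per-call governance (request-scoped scrutiny).
    if tool_call_id not in governed_call_ids:
        return Block("verify_tool_governance",
                     "no /tools/govern ceremony for this call",
                     tool_name, tool_call_id)
    return None
```

The order matters in this sketch: an unknown tool is reported as a catalog problem even if the call was governed, which keeps the two rejection messages from overlapping.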


Default behaviour for any tool name not yet in the catalog:

  1. Observe. First time a tool name is seen in a /v1/chat/completions request (already captured by registry/tools.py).
  2. Promote to pending. Background task writes a row: (org_id, tool_name, schema, source=discovered, status=pending, first_seen_at, last_seen_at, observation_count, observed_by_agents=[...]).
  3. Block at runtime. verify_tool_governance's catalog check returns a block reason that includes the discovered row's id, so the dashboard can deep-link to it.
  4. Surface in review queue. New page: /app/catalog/discovered. Operator sees agent + tool name + inferred schema + sample arguments + how many times it's been attempted.
  5. Decide.
    • Approve → row flips to status=approved. Next call passes the catalog check.
    • Refine → operator edits the schema (narrow types, add constraints), then approves. The original schema fingerprint is captured for drift detection.
    • Deny → row flips to status=denied. Future calls get a faster-path block with a clearer message.
    • Defer → row stays pending; no UI nudge until next observation.

The loop is intentionally human-in-the-loop by default. Auto-approval is opt-in (§5).
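
The observe and decide steps of this loop can be sketched as an in-memory state machine. Everything here is an assumption for illustration: the `DiscoveredTool` dataclass mirrors the row fields listed in step 2, while `observe`, `decide`, the `original_schema` field, and the dict used as storage are hypothetical stand-ins for the catalog table and its write path.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Operator decision -> resulting status. "refine" approves after an edit;
# "defer" leaves the row pending.
DECISIONS = {"approve": "approved", "refine": "approved",
             "deny": "denied", "defer": "pending"}


@dataclass
class DiscoveredTool:
    org_id: str
    tool_name: str
    schema: dict
    first_seen_at: datetime
    last_seen_at: datetime
    source: str = "discovered"
    status: str = "pending"
    observation_count: int = 1
    observed_by_agents: set = field(default_factory=set)
    original_schema: Optional[dict] = None  # kept when the operator refines


def observe(rows: dict, org_id: str, tool_name: str,
            schema: dict, agent_id: str) -> DiscoveredTool:
    """Upsert an observation; the first sighting creates a pending row."""
    now = datetime.now(timezone.utc)
    key = (org_id, tool_name)
    row = rows.get(key)
    if row is None:
        row = rows[key] = DiscoveredTool(org_id, tool_name, schema, now, now)
    else:
        row.last_seen_at = now
        row.observation_count += 1
    row.observed_by_agents.add(agent_id)
    return row


def decide(row: DiscoveredTool, decision: str,
           refined_schema: Optional[dict] = None) -> DiscoveredTool:
    """Apply an operator decision to a pending row."""
    if row.status != "pending":
        raise ValueError(f"row is already {row.status}")
    if decision not in DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    if decision == "refine":
        if refined_schema is None:
            raise ValueError("refine requires an edited schema")
        row.original_schema = row.schema  # preserved for drift detection
        row.schema = refined_schema
    row.status = DECISIONS[decision]
    return row
```

This sketch only allows decisions on pending rows; whether a denied tool can later be re-reviewed is a product question the concept does not settle.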


The default for discovered tools is block until reviewed. But a senior buyer will hate that for high-trust sources. Three opt-in lanes:

| Lane | Trigger | Default? |
| --- | --- | --- |
| mcp_advertised from a registered MCP server | Tool was advertised via the org's registered MCP server's tool schema | Auto-approve (the MCP server is the trust anchor) |
| sdk_declared matching agent's declared_capabilities | Agent registered with declared_capabilities=["lookup_refund"]; observed tool name matches | Off; flag for one-click batch approve |
| discovered from agent in shadow / canary mode | Agent has mode=shadow set | Off; auto-promote to pending but don't block (already what shadow mode means for everything else) |

These are knobs, not defaults. The default story remains: see → block → review → approve.
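
One way to read the three lanes is as a single decision function over opt-in flags. The sketch below is an interpretation, not settled design: `LaneConfig`, `lane_outcome`, the outcome strings, and the `"enforce"` default for `agent_mode` are hypothetical, and it assumes each lane's listed behaviour applies only once that lane is switched on.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LaneConfig:
    """Per-org opt-in flags; every lane is off unless enabled (assumed)."""
    mcp_advertised: bool = False
    sdk_declared: bool = False
    shadow_mode: bool = False


def lane_outcome(
    config: LaneConfig,
    source: str,
    tool_name: str,
    *,
    from_registered_mcp: bool = False,
    declared_capabilities: Sequence[str] = (),
    agent_mode: str = "enforce",
) -> str:
    """Pick how a newly seen tool is handled under the enabled lanes."""
    if (config.mcp_advertised and source == "mcp_advertised"
            and from_registered_mcp):
        return "auto_approve"
    if (config.sdk_declared and source == "sdk_declared"
            and tool_name in declared_capabilities):
        return "flag_for_batch_approve"
    if (config.shadow_mode and source == "discovered"
            and agent_mode == "shadow"):
        return "pending_no_block"
    # Default story: see, block, review, approve.
    return "pending_block"
```

With no lanes enabled, every input falls through to `"pending_block"`, which matches the default-deny story.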


Approval grants exactly one thing: the catalog check passes. Per-call governance still applies. Specifically:

  • ✅ The tool's existence is no longer a block reason
  • ❌ /tools/govern is still expected before feeding a tool result back to the LLM
  • ❌ Argument-level PII / secret / shell-bleed detections still run on every call
  • ❌ Capability-token scope still gates which tools each chat completion can produce

This is important: approving a tool ≠ trusting its arguments. The two scrutiny layers stay independent.


  1. Approval scope: org-wide or per-project? The current Project & Teams concept project-scopes most resources. Should a tool approved in Project A also be approved in Project B of the same org? Lean: org-wide approval, per-project enable/disable. Approving once is a one-time decision; who can use it is a separate authorization knob handled by existing tool-permissions.

  2. What happens to in-flight calls during review? Today: blocked. Alternative: queue and replay on approval (UX-friendly, but couples the agent's runtime to the operator's review SLA — bad). Lean: keep blocking, but the dashboard shows agents waiting on approval so the operator knows there's pressure.

  3. Granularity: per (tool_name, arg_schema_hash) or just tool_name? If the agent extends the tool's schema later (new optional field), is that a new tool or a drift event on the same tool? Lean: same tool, schema change goes through existing rug-pull integrity flow. Don't fragment the catalog.

  4. TTL on pending entries? Pending rows that no one approves and no agent retries probably want to expire. Lean: 30-day TTL on status=pending with no observation in that window. Doesn't apply to denied or approved.

  5. Catalog-check cost on the hot path. Reading the catalog on every chat completion is not free. Lean: in-memory cache per (org_id, version), invalidated on catalog write. Same pattern as the OPA bridge already uses.

  6. How does this interact with Track A (/tools/execute)? Track A executes the tool through TapPass. If the tool is in the catalog but TapPass has no executor binding, what happens? Lean: catalog membership ≠ executor binding. A tool can be approved (LLM allowed to emit it, app allowed to call govern) without TapPass having a backend for it. Track A only works for tools with a registered executor, which is a separate row in a separate table.

  7. Audit story. Every approve / deny / refine should land in the audit trail with the reviewer's identity, the schema as approved, and a link from the original blocking event. Reuses the existing audit infra.
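
The versioned cache leaned towards in question 5 could be sketched as below. This is illustrative only: `CatalogCache` and its two injected callables are hypothetical, and it assumes reading an org's catalog version is much cheaper than loading the approved set.

```python
from typing import Callable


class CatalogCache:
    """Caches approved tool names per org, keyed by catalog version."""

    def __init__(
        self,
        get_version: Callable[[str], int],
        load_approved: Callable[[str], frozenset],
    ) -> None:
        self._get_version = get_version      # cheap version lookup (assumed)
        self._load_approved = load_approved  # expensive catalog read
        self._entries: dict = {}             # org_id -> (version, names)

    def approved(self, org_id: str) -> frozenset:
        """Return approved tool names, reloading only after a version bump."""
        version = self._get_version(org_id)
        cached = self._entries.get(org_id)
        if cached is None or cached[0] != version:
            cached = (version, self._load_approved(org_id))
            self._entries[org_id] = cached
        return cached[1]
```

In this shape, every catalog write only has to bump the org's version; the hot path pays for one version read per chat completion and a full reload only after a change.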


Crisp work list, mapping the design back to the codebase:

| # | Piece | New / Edit | Where | Notes |
| --- | --- | --- | --- | --- |
| 1 | Promote runtime captures to discovered rows | New | policy/catalog/tools_promoter.py | Background task; reads registry/tools._tools and upserts into the catalog table |
| 2 | Catalog write endpoints | New | api/routes/catalog/tools_admin.py | POST /api/catalog/tools, PATCH /api/catalog/tools/{id} for status transitions |
| 3 | Status field on CatalogTool | Edit | models/catalog_tool.py | Add status: Literal["pending", "approved", "denied"] = "pending" |
| 4 | verify_tool_governance reads catalog | Edit | pipeline/steps/verify_tool_governance.py | Two-layer check; catalog membership first, per-call governance second |
| 5 | Discovery review UI | New | frontend page /app/catalog/discovered | List pending rows, schema diff, approve / deny / refine actions |
| 6 | Audit events | Edit | audit event types | tool_discovered, tool_approved, tool_denied, tool_refined |
| 7 | Auto-approve lanes (§5) | New | policy/catalog/tools_auto_approve.py | Opt-in heuristics; off by default |
| 8 | Catalog cache invalidation | Edit | policy/catalog/tools_repo.py | Bump version on every write so consumers know to refetch |

None of this is huge in isolation. The shape is mostly assembling pieces that already exist (capture, integrity, GET catalog, audit trail) with one new admin endpoint and one new pipeline step branch.


Beyond the obvious ("the staging demo would actually work end-to-end"), three sales motions land cleanly on this:

  • "TapPass discovers your shadow IT." First demo opens with a list of tools the agents have tried to call but haven't been approved yet — visible shadow tool use, the buyer's headline fear.
  • "You don't write a tool catalog upfront." Big objection to most governance products: "I don't have an inventory." TapPass fills the inventory by watching.
  • "Default-deny is non-negotiable but not painful." The block reason is actionable (review queue) instead of just a 403, which is what every CISO asks for once they've lived with rule-based deny lists.

If this concept holds, the spec is the natural next deliverable: full feature doc with API contracts, DB migration, OPA changes, UI wireframes, rollout plan. Per the "concept first" convention, that's a separate document and only written when you ask.