
LLM provider

An LLM provider is an HTTP client to one LLM API.

It speaks the API's wire protocol (Anthropic Messages, OpenAI Chat Completions, Bedrock InvokeModel, …), handles auth, streaming, and retries, and exposes a uniform call surface to the gateway pipeline.

An LLM provider executes. It does not enforce. Enforcement happens around it — in the Pipeline before/during/after each call.

Note on naming. The earlier Provider card conflated this concept (HTTP client) with the Policy provider concept (translator from Compiled Policy → runtime config). They are now split into two cards.

| Property | Value |
| --- | --- |
| Takes | a normalized chat/completion request from the gateway |
| Outputs | a normalized response stream (chunks, tool-use blocks, errors) |
| Speaks | one LLM vendor's wire protocol |
| Where it lives | `tappass/gateway/<vendor>.py` (`anthropic.py`, `openai.py`, …) |
| Status | shipped (Anthropic, OpenAI, LiteLLM); planned (Vertex / Gemini, direct Bedrock) |
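
The uniform call surface described above can be pictured as a small interface. The following is a minimal sketch, not the TapPass implementation: the names `NormalizedRequest`, `Chunk`, `LLMProvider`, `EchoProvider`, and `collect_text`, and all of their fields, are assumptions made for illustration. The stand-in provider makes no HTTP calls.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass
class NormalizedRequest:
    # Hypothetical normalized request shape; field names are assumptions.
    model: str
    messages: list[dict]
    stream: bool = True


@dataclass
class Chunk:
    # Hypothetical normalized chunk; kind is "text", "tool_use", or "error".
    kind: str
    data: str


class LLMProvider(Protocol):
    """Call surface the gateway would see, whatever the vendor."""

    vendor: str

    def call(self, request: NormalizedRequest) -> Iterator[Chunk]:
        ...


class EchoProvider:
    """Stand-in provider that echoes message content; no network access."""

    vendor = "echo"

    def call(self, request: NormalizedRequest) -> Iterator[Chunk]:
        for message in request.messages:
            yield Chunk(kind="text", data=message["content"])


def collect_text(provider: LLMProvider, request: NormalizedRequest) -> str:
    # The caller depends only on the normalized shapes, never on the vendor.
    return "".join(c.data for c in provider.call(request) if c.kind == "text")
```

Because `collect_text` is written against the protocol only, swapping `EchoProvider` for a real vendor client would not change the calling code.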

| LLM provider | API | Notes |
| --- | --- | --- |
| anthropic | Anthropic Messages | streaming SSE; tool use; thinking blocks; cache_control |
| openai | OpenAI Chat Completions | streaming; function calling; reasoning models |
| litellm | 100+ providers via LiteLLM proxy | covers long tail; already shipped |
| vertex | Google Vertex AI / Gemini | planned |
| bedrock | AWS Bedrock InvokeModel | planned (direct, beyond LiteLLM) |
| azure-openai | Azure OpenAI | via openai client with base-URL override |

Adding a new vendor mostly comes down to three tasks: implement the wire protocol, normalize the response shape, and register the client.
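
As a rough illustration of those three tasks, the sketch below uses a decorator-based registry. The registry, the `register` and `get_provider` helpers, the vendor name `examplevendor`, and the canned response shape are all hypothetical; the source only says that a client is registered and picked by vendor name. Plain dicts stand in for the normalized shapes to keep the example short.

```python
from typing import Callable, Dict, Iterator

ProviderFn = Callable[[dict], Iterator[dict]]

# Hypothetical in-process registry keyed by vendor name.
_REGISTRY: Dict[str, ProviderFn] = {}


def register(vendor: str) -> Callable[[ProviderFn], ProviderFn]:
    """Task 3: register the client under its vendor name."""

    def decorator(fn: ProviderFn) -> ProviderFn:
        if vendor in _REGISTRY:
            raise ValueError(f"vendor {vendor!r} is already registered")
        _REGISTRY[vendor] = fn
        return fn

    return decorator


def get_provider(vendor: str) -> ProviderFn:
    try:
        return _REGISTRY[vendor]
    except KeyError:
        raise LookupError(f"no LLM provider registered for {vendor!r}") from None


@register("examplevendor")
def example_vendor(request: dict) -> Iterator[dict]:
    # Task 1: speak the wire protocol. A real client would make an HTTP
    # call here; this sketch uses a canned, made-up vendor response.
    raw = {"output": [{"type": "text", "value": "hello"}]}
    # Task 2: normalize the vendor's shape into one common chunk shape.
    for part in raw["output"]:
        yield {"kind": part["type"], "data": part["value"]}
```

With this in place, `list(get_provider("examplevendor")({}))` yields the normalized chunks, and an unknown vendor name raises `LookupError`.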

When a customer agent calls TapPass with POST /v1/messages, the gateway:

  1. Authenticates the tp_ key, identifies the agent + tenant.
  2. Resolves the customer's chosen vendor (and credentials from the Vault).
  3. Selects the matching LLM provider module (gateway/anthropic.py, …).
  4. Runs before-the-call Pipeline steps.
  5. Hands the request to the LLM provider, which calls the vendor's API and streams chunks back.
  6. Runs after-the-call Pipeline steps on each chunk.
  7. Returns the streamed response to the agent.

The LLM provider is the only component that talks to the vendor. Everything else operates on normalized request/response shapes.
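
The seven steps above can be sketched as a single generator. This is an illustrative outline under stated assumptions, not the gateway's actual code: `handle_call`, its parameters, and the `fake_*` helpers are hypothetical, and authentication, Vault lookup, and the vendor call are replaced by in-memory stubs.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Request = dict
Chunk = str
Provider = Callable[[Request, str], Iterator[Chunk]]


def handle_call(
    api_key: str,
    request: Request,
    *,
    authenticate: Callable[[str], dict],
    resolve_vendor: Callable[[dict], Tuple[str, str]],
    providers: Dict[str, Provider],
    before_steps: List[Callable[[Request], Request]],
    after_steps: List[Callable[[Chunk], Chunk]],
) -> Iterator[Chunk]:
    agent = authenticate(api_key)                 # 1. key -> agent + tenant
    vendor, credentials = resolve_vendor(agent)   # 2. vendor + credentials
    provider = providers[vendor]                  # 3. matching LLM provider
    for step in before_steps:                     # 4. before-the-call steps
        request = step(request)
    for chunk in provider(request, credentials):  # 5. vendor call, streamed
        for step in after_steps:                  # 6. after-the-call steps
            chunk = step(chunk)
        yield chunk                               # 7. stream back to agent


def fake_authenticate(api_key: str) -> dict:
    # Stub: only checks the tp_ prefix; real authentication is out of scope.
    if not api_key.startswith("tp_"):
        raise PermissionError("not a tp_ key")
    return {"agent": "agent-1", "tenant": "tenant-1"}


def fake_provider(request: Request, credentials: str) -> Iterator[Chunk]:
    # Stub vendor: streams the prompt back one word at a time.
    for word in request["prompt"].split():
        yield word


def demo() -> List[Chunk]:
    return list(
        handle_call(
            "tp_example",
            {"prompt": " hello world "},
            authenticate=fake_authenticate,
            resolve_vendor=lambda agent: ("fake", "stub-credential"),
            providers={"fake": fake_provider},
            before_steps=[lambda r: {**r, "prompt": r["prompt"].strip()}],
            after_steps=[str.upper],
        )
    )
```

Note that only `fake_provider` touches the "vendor"; the before and after steps see nothing but the normalized request and chunks, which mirrors the separation described above.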

Why this concept exists separately from Policy provider

Both concepts were originally collapsed into a single "Provider" card. They are structurally different, and they ship on different cadences:

| | Policy provider | LLM provider |
| --- | --- | --- |
| Concept | translator (Compiled Policy → target config) | HTTP client (TapPass → LLM API) |
| Pure function? | yes (deterministic, no side effects in compile) | no (makes outbound HTTP calls) |
| Where it runs | host runtime / control plane on policy update | server-side, on every governed LLM call |
| Status | concept (most), partial (openshell, gateway) | shipped (Anthropic, OpenAI, LiteLLM) |
| Adding one | ~2-week translator project | ~1-week wire-protocol integration |

The LLM gateway internally uses both:

  • A Policy provider (anthropic-gateway) translates the Compiled Policy into gateway configuration (budget caps, base-URL redirect, redaction policy).
  • An LLM provider (anthropic) executes the actual API calls under that configuration.
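
The division of labor between the two roles can be sketched as a pure function plus a client that runs under its output. Everything named here (`GatewayConfig`, `compile_gateway_config`, `SketchLLMProvider`, the policy keys, and the injected `send` transport) is hypothetical; only the three kinds of configuration (budget caps, base-URL redirect, redaction policy) come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class GatewayConfig:
    # Hypothetical gateway configuration; field names are assumptions.
    budget_cap_usd: float
    base_url: str
    redact: bool


def compile_gateway_config(compiled_policy: dict) -> GatewayConfig:
    """Policy-provider role: a pure translation with no I/O."""
    return GatewayConfig(
        budget_cap_usd=float(compiled_policy["budget_cap_usd"]),
        base_url=compiled_policy["base_url"],
        redact=bool(compiled_policy.get("redact", False)),
    )


class SketchLLMProvider:
    """LLM-provider role: executes calls under a given configuration."""

    def __init__(self, config: GatewayConfig, send: Callable[[str, dict], Any]):
        self.config = config
        # Injected transport; a real client would make outbound HTTP calls.
        self.send = send

    def call(self, request: dict) -> Any:
        # The provider only executes. Budget and redaction enforcement
        # would happen in the Pipeline around this call, not here.
        return self.send(self.config.base_url, request)
```

Calling `compile_gateway_config` twice with the same policy returns equal configs, which is the property that makes the translator testable without a network, whereas the provider's behavior depends on whatever `send` does.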

  1. [implement] Wire-protocol client written; response normalized to TapPass shape
  2. [register] Module registered under gateway/<vendor>.py; client picked by vendor name
  3. [credential] Customer adds vendor credentials to Vault (BYOK or managed)
  4. [serve] Gateway routes calls through this LLM provider for matching agents
  5. [evolve] Vendor API versions bumped; client follows; old versions kept until customers migrate

| Engine | What it does | Status |
| --- | --- | --- |
| Gateway pipeline | Runs before/during/after steps around the LLM call | shipped |
| Vault | Resolves vendor credentials per tenant | shipped |
| Provider router | Picks the right LLM provider per agent based on customer config | shipped |
| Cost telemetry | Records token + latency + cost per call into Metering | partial |

| Persona | Surface | What you do |
| --- | --- | --- |
| Operator | Admin UI → Settings → LLM Providers | Configure which vendors are available + BYOK credentials |
| Operator | `tappass llm-provider list / test <id>` | Verify connectivity / credentials |
| Customer agent | `POST /v1/messages` (or `/v1/chat/completions`) | Issue a governed LLM call; gateway picks the right LLM provider |

  • distinct from → Policy provider: translator, not client
  • wraps → Pipeline: every call runs through the engine
  • uses → Vault: for vendor credentials (BYOK or managed)
  • emits → Audit log: every call lands as an audit row
  • feeds → Metering: cost / token / latency rollups

| LLM provider | Status | Notes |
| --- | --- | --- |
| anthropic | shipped | streaming, tool-use, thinking, prompt caching |
| openai | shipped | streaming, function calling, reasoning models |
| litellm | shipped | 100+ vendors via LiteLLM proxy |
| vertex (Gemini) | planned | direct Vertex client |
| bedrock | planned | direct AWS Bedrock client (current path is via LiteLLM) |
| azure-openai | shipped | via openai client + base-URL override |