MCP in Production Means Solving Auth, Streaming, and Context Boundaries
Sun Mar 08 2026
David Bleeker, Founder

Introduction / Context
The Model Context Protocol (MCP) moved quickly from experimental curiosity to an expected part of AI tooling conversations. That shift is visible in the current question stream: developers are asking how to pass dynamic parameters into MCP servers, how to manage multiple MCP tool servers, how memory interacts with MCP-based tool calling, and how to keep streaming responses coherent across model and tool boundaries.
That pattern makes sense. MCP solves a real integration problem: it gives models and clients a standard way to discover and use tools. But standardization at the interface level does not remove the operational complexity underneath. It just makes the complexity easier to locate.
The moment an MCP-based system moves beyond a local demo, the real questions show up:
- who is allowed to call which tool?
- how is user identity propagated?
- what context is exposed to the server?
- how do you handle partial streaming while tools are still running?
- how do you prevent one noisy server from degrading the whole agent?
The Question
How do you use MCP in production without losing control of authentication, streaming behavior, and context boundaries?
The Answer
The production answer is to treat MCP as a transport and capability-discovery layer, not as your application security model.
That distinction matters. MCP can help a model discover tools, but it should not decide:
- tenant access
- environment access
- secret scope
- request shaping
- output visibility
Those decisions belong in your application runtime or gateway.
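As a sketch of keeping those decisions in the runtime rather than in the protocol, a policy check might look like the following. Every name here (`Principal`, `ToolPolicy`, `mayCallTool`) is illustrative and not part of any MCP SDK:

```typescript
// Illustrative policy model: none of these names come from the MCP spec.
type Principal = {
  userId: string
  tenantId: string
  roles: string[]
}

type ToolPolicy = {
  toolName: string
  allowedRoles: string[]
  allowedEnvironments: string[]
}

// The runtime, not the protocol, decides whether a principal may call a tool.
export function mayCallTool(
  principal: Principal,
  policy: ToolPolicy,
  environment: string,
): boolean {
  const roleOk = policy.allowedRoles.some((role) =>
    principal.roles.includes(role),
  )
  const envOk = policy.allowedEnvironments.includes(environment)
  return roleOk && envOk
}
```

The key property is that this check runs before the model ever sees the tool, so denial is invisible rather than an error the model has to recover from.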
What goes wrong first
The first problem is overexposure. Teams often register tools directly from an MCP server and let the model decide what to call. That feels elegant, but it bypasses the policy context your product actually needs. A model may see a tool called searchKnowledgeBase, but your runtime knows whether the current user is allowed to search only their workspace, all workspaces, or none.
The second problem is context leakage. If the client forwards raw chat history, user metadata, and tool outputs to every server, you end up with an uncontrolled context graph. That is bad for privacy, latency, and debugging.
The third problem is streaming ambiguity. Users want streaming, but tool-based workflows do not always produce tokens linearly. If the model starts narrating before tool results are stable, you either get retractions or a strange "thinking out loud" artifact that is hard to trust.
What works in practice
The right production pattern is to add an MCP gateway or adapter layer. The model interacts with tool capabilities through that layer, and the gateway enforces:
- per-user and per-tenant authorization
- argument validation and rewriting
- server health checks and timeouts
- response normalization
- context minimization
- observability and audit logs
This keeps MCP useful without making it the place where you bury cross-cutting concerns.
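One of those enforcement points, timeouts, is easy to sketch generically. A minimal wrapper the gateway could apply to every outbound MCP call, assuming nothing about the transport underneath:

```typescript
// Hypothetical timeout wrapper for gateway-mediated MCP calls. The call
// argument is a stand-in for whatever transport actually talks to the server.
export async function withTimeout<T>(
  call: () => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`MCP call timed out after ${timeoutMs}ms`)),
      timeoutMs,
    )
  })
  try {
    return await Promise.race([call(), timeout])
  } finally {
    // Always clear the timer so a fast call does not leak a pending timeout.
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

Wrapping every call this way is what stops one noisy server from degrading the whole agent: a slow server fails its own call instead of stalling the request.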
Tradeoffs and implementation risks
The tradeoff is extra infrastructure. A direct MCP demo is simpler to write. A policy-aware gateway is simpler to operate. In production, operational simplicity wins.
A subtle risk is letting "tool discovery" become "tool sprawl." Once teams can expose tools quickly, they tend to expose too many. Models then have a larger search space, worse tool selection quality, and higher latency. You need capability curation, not just capability discovery.
The hard-won engineering insight here is that MCP standardizes the edge of the interface, not the semantics of your business logic. You still need product-level rules for identity, quotas, and approvals.
Architecture / Implementation Guidance
The concrete recommendation is to place a typed gateway between the model runtime and all MCP servers.
That gateway should do four jobs:
- authenticate the user and derive an execution principal
- resolve which MCP servers are eligible for this request
- transform raw MCP capabilities into app-approved tool contracts
- orchestrate streaming so the UI only sees stable phases
A useful execution model has distinct phases:
- discover: ask servers what capabilities exist
- filter: remove tools the current principal cannot use
- plan: let the model propose a tool or sequence
- execute: run through the gateway with budgets and timeouts
- stream: emit status events and final answer segments separately
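Those phases can be made explicit in code. A minimal sketch of the loop, where every function in the `deps` argument is a hypothetical stand-in for real discovery, planning, and execution logic:

```typescript
// Sketch of the phased execution loop. Each dependency is injected so the
// phase ordering is visible and testable; all of these names are illustrative.
export async function runPhasedRequest(deps: {
  discover: () => Promise<string[]>           // capability names from servers
  filter: (tools: string[]) => string[]       // drop tools the principal cannot use
  plan: (tools: string[]) => Promise<string>  // model proposes one tool
  execute: (tool: string) => Promise<unknown> // run via the gateway
  emit: (event: { phase: string }) => void    // status events for the UI
}) {
  deps.emit({ phase: 'discover' })
  const all = await deps.discover()
  deps.emit({ phase: 'filter' })
  const allowed = deps.filter(all)
  deps.emit({ phase: 'plan' })
  const chosen = await deps.plan(allowed)
  deps.emit({ phase: 'execute' })
  const result = await deps.execute(chosen)
  deps.emit({ phase: 'stream' })
  return result
}
```

Because each phase emits before it runs, the UI always knows which stage is in flight even if a server call hangs.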
This produces cleaner UI behavior than unstructured token streaming. Instead of pretending the answer is fully formed from the start, the UI can stream state transitions:
- "searching workspace"
- "reading policy docs"
- "drafting answer"
That is both more honest and easier to debug.
For context, keep the server contract lean. Send only what the server needs:
- narrowed user intent
- validated parameters
- minimal identity scope
- request correlation ID
Do not send the full transcript unless the server genuinely needs it.
Code Snippets
type Capability = {
  name: string
  serverId: string
  schema: Record<string, unknown>
}

type ExecutionContext = {
  userId: string
  tenantId: string
  allowedServers: string[]
}

export function filterCapabilities(
  capabilities: Capability[],
  context: ExecutionContext,
): Capability[] {
  return capabilities.filter((capability) =>
    context.allowedServers.includes(capability.serverId),
  )
}
// Minimal shapes for the registry and gateway referenced below.
interface ApprovedTool {
  serverId: string
  remoteToolName: string
  assertArgs(args: Record<string, unknown>): void
  rewriteArgs(args: Record<string, unknown>, context: ExecutionContext): Record<string, unknown>
}

interface ApprovedToolRegistry {
  get(toolName: string): ApprovedTool | undefined
}

interface McpGateway {
  call(request: {
    serverId: string
    tool: string
    args: Record<string, unknown>
    headers: Record<string, string>
    timeoutMs: number
  }): Promise<unknown>
}

export async function executeMcpTool(input: {
  context: ExecutionContext
  toolName: string
  args: Record<string, unknown>
  registry: ApprovedToolRegistry
  gateway: McpGateway
}) {
  // Only app-approved tools are callable; unknown names fail fast.
  const approvedTool = input.registry.get(input.toolName)
  if (!approvedTool) {
    throw new Error(`unapproved tool: ${input.toolName}`)
  }
  approvedTool.assertArgs(input.args)
  return input.gateway.call({
    serverId: approvedTool.serverId,
    tool: approvedTool.remoteToolName,
    args: approvedTool.rewriteArgs(input.args, input.context),
    headers: {
      'x-tenant-id': input.context.tenantId,
      'x-user-id': input.context.userId,
    },
    timeoutMs: 8_000,
  })
}
// Emit coarse phase transitions so the UI renders stable status updates
// instead of unstable partial token streams.
export function streamPhases(send: (event: unknown) => void) {
  send({ type: 'status', phase: 'discovering_tools' })
  send({ type: 'status', phase: 'executing_tools' })
  send({ type: 'status', phase: 'drafting_response' })
}
These patterns look conservative because they are. Production MCP systems need less magic and more control.
Key Takeaways
- MCP is a useful protocol boundary, not a substitute for policy or auth.
- Put a gateway between the model runtime and MCP servers.
- Filter capabilities by user, tenant, and task before the model sees them.
- Stream execution phases, not unstable partial guesses.
- Limit context aggressively or MCP systems become slow, leaky, and difficult to reason about.