
Stdio vs SSE vs HTTP MCP: Transport Trade-Offs in Production

Compare MCP transport layers: stdio for local dev, SSE for legacy web, HTTP for scale. When to migrate, trade-offs, and production selection guide.

The PADISO Team · 2026-05-13

Table of Contents

  1. What Is MCP and Why Transport Choice Matters
  2. Stdio: Local Development and Subprocess Communication
  3. SSE: The Legacy Web Standard
  4. Streamable HTTP: The Production Standard
  5. Transport Comparison Matrix
  6. When Each Transport Wins
  7. Migration Paths and Scaling Strategies
  8. Security, Compliance, and Audit Readiness
  9. Real-World Implementation Patterns
  10. Conclusion and Next Steps

What Is MCP and Why Transport Choice Matters {#what-is-mcp}

The Model Context Protocol (MCP) is Anthropic’s open standard for connecting AI models to data, tools, and services. At its core, MCP defines how an AI client (Claude, or your custom agent) communicates with servers that expose resources, tools, and prompts. But how that communication happens—the transport layer—is not a detail. It’s a foundational choice that determines whether your AI application runs locally on a developer’s laptop, scales across cloud infrastructure, or sits behind enterprise firewalls with audit trails.

Three transport mechanisms dominate production deployments today: stdio, Server-Sent Events (SSE), and Streamable HTTP. Each was designed for a different context. Stdio emerged for local subprocess communication. SSE arrived as a browser-friendly alternative. Streamable HTTP represents the modern, scalable standard endorsed by the MCP specification.

Choosing the wrong transport early costs you weeks of refactoring later. Picking the right one from the start—or planning a migration path—means shipping faster, scaling without rewrites, and passing security audits without friction. This guide walks you through the trade-offs, shows you when each transport wins, and gives you a clear migration ladder as your product grows.

At PADISO, we’ve helped Sydney startups and enterprise teams architect AI agents and automation systems that move seamlessly from prototype to production. We’ve seen teams lock themselves into stdio and hit concurrency walls at Series A. We’ve watched others over-engineer SSE and waste months on deprecating infrastructure. This guide is built on that operational experience.


Stdio: Local Development and Subprocess Communication {#stdio-transport}

How Stdio Works

Stdio (standard input/output) is the simplest MCP transport. The AI client spawns a subprocess—your MCP server—and communicates with it via stdin (input) and stdout (output). Messages are JSON-RPC 2.0 formatted, one per line, streamed over pipes. No network, no sockets, no HTTP headers. Just raw process I/O.
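
Concretely, a tool call and its reply are each a single line of JSON-RPC (the tools/call shape follows the MCP spec; the weather tool itself is illustrative). The client writes to the server's stdin:

{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "get_weather", "arguments": {"location": "Sydney"}}}

and the server writes the matching response, correlated by id, to stdout:

{"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "22°C and sunny"}]}}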

This is how Claude Desktop runs local tools. You define an MCP server in your claude_desktop_config.json, Claude spawns it as a child process, and all tool calls flow through pipes. The elegance is real: no listening ports, no firewall rules, no authentication tokens. The subprocess runs in the same security context as the parent.

Strengths of Stdio

Simplicity and zero infrastructure. You write a server, define its command, and it works. No network binding, no TLS certificates, no load balancer. For local development, this is unbeatable. A single developer can prototype an MCP server in an afternoon and test it against Claude without touching networking code.

Tight process coupling. The client owns the server’s lifecycle. When the client exits, the server exits. When the server crashes, the client knows immediately. This implicit contract eliminates orphaned processes and dangling connections.

Immediate feedback loop. Subprocess I/O is synchronous from the client’s perspective. You call a tool, the server responds, you get the result. No polling, no eventual consistency, no race conditions around connection state.

Local security. The subprocess runs with the client’s permissions. If you trust the client process, you trust the server process. No network exposure, no cross-machine authentication, no token leakage over HTTP.

Weaknesses and Production Limits

Concurrency is fundamentally single-threaded. Stdio pipes are not multiplexed. If your MCP server receives two requests simultaneously, one blocks until the other completes. At scale, this becomes a hard bottleneck. A server handling 10 concurrent tool calls will queue 9 of them, introducing latency that users notice.

No horizontal scaling. You cannot run multiple stdio servers behind a load balancer. Each client spawns its own subprocess. If you have 100 concurrent clients, you have 100 server processes, each consuming memory and CPU. This works for local dev (one user, one process). It breaks for SaaS (thousands of users).

Subprocess overhead. Spawning a process is expensive. On Linux, it’s cheaper than on Windows, but still measurable. For a web service that needs to handle requests in milliseconds, spawning a subprocess per request is a non-starter. Even persistent subprocesses consume base memory and startup time.

No remote execution. Stdio only works on the same machine. If your MCP server lives on a GPU cluster and your client runs in a browser, stdio cannot bridge that gap. You need a network transport.

Debugging and observability are local. Stdio has no built-in logging, tracing, or metrics infrastructure. You can log to files, but aggregating logs from distributed stdio processes is manual and messy. Enterprise teams pursuing SOC 2 compliance or ISO 27001 audit readiness need audit trails and centralized logging, which stdio does not provide natively.

When Stdio Wins

  • Local development environments. A single developer running Claude Desktop with local tools.
  • Prototype and MVP phases. Founders building the first version of an AI agent without infrastructure overhead.
  • Offline-first applications. Desktop apps that do not need cloud connectivity.
  • Single-user or single-tenant scenarios. One client, one server, one machine.

If your entire user base is internal developers or you are in the “prove the idea” phase, stdio is the fastest path to a working MCP server. Ship it, test it, learn from it. Plan to migrate later.


SSE: The Legacy Web Standard {#sse-transport}

How SSE Works

Server-Sent Events (SSE) is an HTTP-based transport where the client opens a persistent HTTP connection to the server, and the server pushes messages down the stream. The client sends requests as separate HTTP POST calls; the server delivers responses over the long-lived SSE stream, not on the POST that carried the request. The result is effectively half-duplex: the client initiates, then waits on the stream.

SSE was designed for browsers, where WebSocket support was inconsistent and developers needed a simpler push mechanism. A browser opens an EventSource connection, and the server streams events as newline-delimited text. For MCP, SSE replicates this pattern: the client holds an open GET request for incoming messages and sends outgoing messages via POST.
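
On the wire, the stream is plain newline-delimited text. The optional id: field is what enables the resume behaviour described under strengths below (paths and payloads are illustrative):

GET /sse HTTP/1.1
Accept: text/event-stream

HTTP/1.1 200 OK
Content-Type: text/event-stream

id: 42
event: message
data: {"jsonrpc": "2.0", "id": 7, "result": {"content": [{"type": "text", "text": "..."}]}}

Outgoing client messages travel on separate POST requests to a companion endpoint. That split between the GET stream and the POST channel is exactly what causes the scaling headaches covered below.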

Strengths of SSE

HTTP-based, so it works through proxies and firewalls. Most corporate networks allow HTTP/HTTPS. SSE does not require special protocols or port forwarding. This made it attractive for enterprise deployments where network restrictions are tight.

Simpler than WebSocket for some scenarios. WebSocket requires upgrade negotiation and a separate protocol frame. SSE is just HTTP. For teams already familiar with REST APIs, SSE feels more natural.

Stateful connections with automatic reconnection. SSE clients can reconnect and resume, with built-in event ID tracking. If a connection drops, the client can ask for messages since the last received ID.

Better browser compatibility than WebSocket (historically). In the early 2010s, browsers supported SSE before WebSocket was stable. This advantage has eroded.

Weaknesses and Why SSE Is Deprecated for MCP

Half-duplex communication is inefficient. The client sends a request, waits for the response on the SSE stream, and only then sends the next request. For tools that call other tools, this creates round-trip latency. A chain of 5 tool calls means 5 round trips. With Streamable HTTP (which we’ll cover next), the same chain is pipelined.

Scaling requires sticky sessions or complex state management. If you have multiple SSE servers behind a load balancer, the client’s GET and POST requests must hit the same server (sticky sessions). Otherwise, the server handling the POST does not know about the SSE stream on a different server. This breaks horizontal scaling and complicates failover.

Connection overhead at scale. Each client holds an open HTTP connection. With thousands of concurrent clients, you are managing thousands of open sockets on the server. This consumes memory and file descriptors. Modern HTTP/2 and HTTP/3 multiplexing mitigate this, but SSE predates widespread HTTP/2 adoption.

No native support for request/response correlation. MCP uses JSON-RPC, which matches responses to requests by ID. With SSE, the request travels on one channel (the POST) while the response arrives on another (the stream), so nothing at the transport level ties the two together. The client must do all of the cross-channel bookkeeping by JSON-RPC ID, which is error-prone.

Anthropic and the MCP community are moving away from SSE. The official MCP specification now emphasizes Streamable HTTP as the standard for network-based deployments. SSE is documented for compatibility, but new implementations should use HTTP.

Debugging and observability are harder than HTTP. SSE streams are opaque to standard HTTP debugging tools. You cannot easily inspect individual messages with curl or Postman. This matters when you are troubleshooting production issues.

When SSE Still Makes Sense

  • Legacy systems already using SSE. If you have an existing SSE infrastructure and want to minimize changes, SSE can work.
  • Browser-based clients with strict firewall rules. Some corporate networks block WebSocket but allow SSE.
  • Very small deployments. A single SSE server with a handful of clients may not hit the scaling limits.

But even in these scenarios, the recommendation is to plan a migration to Streamable HTTP. SSE is a stepping stone, not a destination.


Streamable HTTP: The Production Standard {#http-transport}

How Streamable HTTP Works

Streamable HTTP (sometimes called “HTTP with streaming” or just “HTTP transport”) is the modern MCP standard. The server exposes a single HTTP endpoint. The client sends each JSON-RPC message as an HTTP POST; the server answers either with a plain JSON response or by streaming multiple messages back on that same response before the final result. Because every request carries its own response channel, the half-duplex bottleneck disappears.

Unlike the legacy SSE transport, which splits traffic between a long-lived GET stream and a separate POST channel, Streamable HTTP keeps each request and its responses on one HTTP exchange. Servers can still push unsolicited messages: the client may open an optional GET stream for server-initiated notifications. This gets you close to WebSocket semantics, but without WebSocket’s upgrade handshake and framing.

Implementation-wise, the client POSTs a JSON-RPC message with an Accept header listing both application/json and text/event-stream. The server answers a quick call with a single JSON body, and a long-running call with a streamed response that can interleave progress notifications before the result. An optional session ID (the Mcp-Session-Id header) lets the server correlate related requests without sticky load-balancer sessions, and multiple calls can be in flight concurrently, each on its own request.
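
A minimal sketch of one exchange, following the current MCP spec (the /mcp path, the tool call, and the progress payload are illustrative):

POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json, text/event-stream

{"jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": {"name": "search", "arguments": {"query": "mcp transports"}}}

HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"jsonrpc": "2.0", "method": "notifications/progress", "params": {"progressToken": "3", "progress": 0.5}}

data: {"jsonrpc": "2.0", "id": 3, "result": {"content": [{"type": "text", "text": "..."}]}}

A simple call would come back as a plain application/json body instead; the server chooses per request.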

Strengths of Streamable HTTP

Bidirectional, multiplexed communication. Both client and server can send messages simultaneously. Tool calls, responses, and subscriptions can interleave without artificial ordering. This is efficient and matches how modern AI agents actually work.

Scales horizontally without sticky sessions. Each HTTP request is independent. A load balancer can route requests to any server. No sticky sessions, no session replication, no distributed state management. This is the same scalability model that made REST APIs successful.

Standard HTTP infrastructure. You can use any HTTP load balancer, reverse proxy, or CDN. TLS termination, compression, caching, and routing all work with standard tools. Your DevOps team already knows how to operate HTTP services.

Truly concurrent. The server can handle hundreds of concurrent clients without spawning a process per client, with many requests in flight at once. A single Node.js or Python process can manage thousands of concurrent connections. This is why web-scale AI services use HTTP.

Request/response correlation is built-in. JSON-RPC 2.0 IDs ensure that responses match requests, even when messages arrive out of order. The transport layer handles this correctly.

Observability and debugging are first-class. You can log every request and response. You can trace latency. You can set up metrics for connection count, message throughput, and error rates. Standard HTTP monitoring tools work. This is critical for teams pursuing security audits and compliance via Vanta or similar platforms.

Supports authentication and authorization natively. HTTP headers carry Bearer tokens, API keys, or mutual TLS certificates. You can implement fine-grained access control. Stdio has none of this.

Works across networks and cloud infrastructure. Your MCP server can live in a container on AWS, Azure, or GCP. Your client can run in a browser, a mobile app, or another cloud service. Network boundaries are transparent.

Weaknesses and Trade-Offs

Network latency. Stdio and local SSE have microsecond latency. HTTP adds milliseconds. For most AI workloads, this is acceptable. For ultra-low-latency scenarios (real-time gaming, high-frequency trading), it matters.

Complexity in implementation. Stdio is a few lines of code. Streamable HTTP requires handling streamed responses, connection lifecycle, and error recovery. Most teams lean on an official MCP SDK or a transport library rather than hand-rolling this, but you still need to understand the basics.

TLS certificates and key management. If you run Streamable HTTP over HTTPS (which you should in production), you need certificates. This is a minor operational burden, but it is a burden. Stdio has no such requirement.

Firewall and network configuration. You need to open ports, configure DNS, and manage ingress. Local stdio requires none of this. But this is a solved problem in modern DevOps.

When Streamable HTTP Wins

  • Any production deployment. SaaS, cloud, multi-tenant, or distributed systems.
  • Scaling beyond a handful of concurrent users.
  • Scenarios requiring audit trails, monitoring, and compliance.
  • Teams with modern infrastructure and DevOps practices.
  • Integration with existing APIs and microservices.

If you are shipping to users—internal or external—Streamable HTTP is the right choice. Period.


Transport Comparison Matrix {#comparison-matrix}

| Criterion | Stdio | SSE | Streamable HTTP |
| --- | --- | --- | --- |
| Setup Complexity | Minimal | Low | Medium |
| Concurrency | Single-threaded (hard limit) | Limited (sticky sessions) | Unlimited |
| Horizontal Scaling | Not possible | Possible but complex | Native |
| Network Range | Local only | Local or network | Any (cloud-ready) |
| Latency | Sub-ms | 1-10 ms | 5-50 ms |
| Observability | Poor | Moderate | Excellent |
| Security (auth/authz) | Process-level only | Basic HTTP auth | Full HTTP security model |
| Compliance/Audit Ready | No | Partial | Yes |
| Debugging | Local logs | HTTP inspection | Standard HTTP tools |
| Maintenance Burden | None | Low | Medium (but well-understood) |
| Community Support | Growing | Declining | Standard (recommended) |
| TLS/Encryption | Implicit (same machine) | Via HTTPS | Via HTTPS |
| Multiplexing | No | Half-duplex | Full-duplex |
| Production-Ready | No | Transitional | Yes |

When Each Transport Wins {#when-each-wins}

Stdio Wins: Local Development and Single-User Tools

Use stdio when:

  1. You are building a Claude Desktop plugin or local tool. Claude Desktop spawns your server as a subprocess. Stdio is the natural choice. No network, no authentication, no infrastructure.

  2. You are a founder or solo developer prototyping an AI agent. You want to ship something in days, not weeks. Stdio gets you there. Build the server, test it locally, prove the concept. Migrate later if you need to.

  3. Your entire deployment is a single machine. A researcher running an agent on a GPU workstation. A small business with one AI assistant. Stdio is sufficient.

  4. You need zero operational overhead. No DevOps, no monitoring, no infrastructure. Just code and run.

Real-world example: A Sydney fintech founder building a custom AI assistant for internal use. She writes an MCP server that queries her company’s database, runs it via stdio in Claude Desktop, and trains her team on it. Total setup time: 4 hours. Zero infrastructure cost. Perfect.

SSE Wins: Transitional Deployments and Legacy Systems

Use SSE when:

  1. You have an existing SSE infrastructure and want to minimize rewrites. You already have SSE clients and servers. Adding MCP via SSE is incremental. Plan to migrate to HTTP eventually, but SSE works now.

  2. You are migrating from stdio and want an intermediate step. SSE is slightly more complex than stdio but simpler than HTTP. If your team is new to network transports, SSE can be a learning step. But do not stay here long.

  3. You have a small number of concurrent users (< 100) and network restrictions block richer protocols. Some corporate firewalls block WebSocket upgrades but allow plain HTTP. SSE is a workaround here, but Streamable HTTP rides the same plain HTTP and is preferable.

  4. You are integrating with legacy systems that already speak SSE. Some older APIs or event systems use SSE. If you are wrapping one as an MCP server, SSE might be the natural fit.

Real-world example: A mid-market enterprise has an existing event streaming system using SSE. They want to expose it as an MCP server for their AI team. SSE is the path of least resistance. They build an MCP wrapper around their SSE API. It works for 50 concurrent users. When they scale to 500, they refactor to HTTP.

HTTP Wins: Production, Scale, and Compliance

Use Streamable HTTP when:

  1. You are shipping to external users or a large internal user base. Any production SaaS, any multi-tenant system, any scenario with more than a few concurrent users. HTTP is the only choice that scales.

  2. You need audit trails, monitoring, and compliance. Teams pursuing SOC 2 or ISO 27001 compliance require centralized logging, request tracing, and access control. HTTP provides all of this natively. Stdio does not.

  3. Your infrastructure is cloud-native (Kubernetes, serverless, microservices). Modern cloud platforms are built for HTTP. Deploying stdio or SSE servers in Kubernetes is awkward. HTTP is native.

  4. You need to integrate with other services, APIs, or third-party tools. Your MCP server needs to call external APIs, authenticate with API keys, and respect rate limits. HTTP is the standard for this.

  5. You want to use managed services or SaaS offerings. If you want to run your MCP server on Lambda, Cloud Run, or a similar platform, HTTP is the only option.

  6. You are working with a venture studio or AI agency that is helping you scale. Professional teams building production AI systems use HTTP. It is the industry standard.

Real-world example: A Series A startup in Sydney is building an AI-powered workflow automation platform. They have 500 active users and are growing. They start with stdio for prototyping. When they hit 50 concurrent users, stdio becomes a bottleneck. They refactor to Streamable HTTP in 2 weeks. They can now handle 5,000 concurrent users on the same infrastructure. They add request logging for audit compliance. Six months later, they pass their SOC 2 audit because their HTTP infrastructure has complete audit trails.


Migration Paths and Scaling Strategies {#migration-paths}

The Natural Progression: Stdio → HTTP

Most teams follow a clear path:

Phase 1: Stdio (Weeks 1-4)

You build your MCP server as a subprocess. It works locally. You test it with Claude Desktop. You prove the concept: “Yes, our AI agent can call our tools.”

At this stage, do not over-engineer. Do not worry about scalability. Focus on shipping something that works.

Phase 2: Recognize the Limit (Weeks 5-8)

You start getting concurrent requests. Maybe you have 3 internal users testing the agent. Maybe you are running the agent in a loop, making multiple tool calls per iteration. Suddenly, stdio feels slow. You notice that tool calls queue up. You realize stdio is single-threaded.

This is the moment to plan a migration. Do not wait until you have 100 users. Do it now, with 10.

Phase 3: Refactor to HTTP (Weeks 9-12)

You rewrite your MCP server to listen on an HTTP port instead of reading from stdin. This is not a rewrite of your tool logic—only the transport layer changes. If you structured your code well, this takes 1-2 weeks.

You deploy it to a cloud platform (AWS, GCP, Azure, or a VPS). You point your client to the HTTP endpoint. Everything works the same, but now you can handle 100 concurrent users instead of 1.
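
What “structured well” means in practice: keep tool logic in a module that knows nothing about transports, so the migration only swaps the outer loop. A minimal sketch following this article’s simplified dispatch convention (the class and names are our own, not from an MCP SDK):

class ToolRegistry:
    """Transport-agnostic core shared by the stdio loop and the HTTP handler."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def dispatch(self, request: dict) -> dict:
        # Takes a parsed JSON-RPC request dict, returns a response dict.
        tool_name = request.get("method", "").split("/")[-1]
        fn = self._tools.get(tool_name)
        if fn is None:
            return {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32601, "message": f"Unknown tool: {tool_name}"}}
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "result": fn(**request.get("params", {}))}

With this shape, the stdio server reduces to “read line, dispatch, print” and the Flask handler to “parse body, dispatch, jsonify.” Neither touches the tool functions.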

Phase 4: Add Observability and Compliance (Weeks 13-16)

Now that you are on HTTP, you add logging, metrics, and access control. You implement request tracing with correlation IDs. You set up alerts for errors and latency. You prepare for security audits.

If you are working with a compliance-focused partner like PADISO, this is where you get help passing SOC 2 or ISO 27001 audits via Vanta. HTTP infrastructure is audit-friendly. Stdio is not.

The Skip-SSE Path

We recommend skipping SSE entirely. Go straight from stdio to HTTP.

Why? SSE introduces complexity without solving the real problem. If you are going to migrate once, migrate all the way. Migrating twice (stdio → SSE → HTTP) wastes time and introduces technical debt.

The only exception: if you have an existing SSE system and want to minimize disruption, SSE is a bridge. But plan to cross it quickly.

Handling the Migration Without Downtime

  1. Deploy the HTTP server alongside the stdio server. Both run simultaneously. Your client can switch between them via configuration (a configuration sketch follows below).

  2. Test the HTTP server with a subset of users or in a staging environment. Verify that tool calls work, responses are correct, and performance is acceptable.

  3. Gradually shift traffic to HTTP. Start with 10% of requests, then 50%, then 100%. Monitor error rates and latency. If anything goes wrong, you can roll back instantly.

  4. Once HTTP is stable, retire stdio. Remove the stdio code, simplify your deployment, and move on.

This phased approach means zero downtime and minimal risk.
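
If your client supports both transports, the cutover can be a pure configuration change. A hypothetical config with both entries side by side (the http entry’s schema is illustrative, not a fixed standard):

{
  "mcpServers": {
    "my_tools_stdio": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    },
    "my_tools_http": {
      "url": "https://mcp.example.com/mcp"
    }
  }
}

Flipping which entry the client uses is your rollback lever during the gradual shift.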

Scaling Beyond a Single HTTP Server

Once you are on HTTP, scaling is straightforward:

  1. Run multiple instances of your MCP server. Each listens on the same port (e.g., 8000) on different machines or containers.

  2. Put a load balancer in front. The load balancer (nginx, HAProxy, AWS ALB, etc.) distributes requests across instances. Each request goes to a different server, but the client sees a single endpoint (see the nginx sketch after this list).

  3. Use stateless design. Each server instance should not rely on local state. If you need to store state (e.g., conversation history), use a shared database or cache (Redis, PostgreSQL, etc.).

  4. Monitor and auto-scale. Watch CPU, memory, and request latency. When load increases, spin up more instances. When load decreases, shut them down. Cloud platforms like AWS and GCP do this automatically.
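
As a concrete example of step 2, a minimal nginx configuration for three instances (hostnames are placeholders; certificate directives are omitted):

upstream mcp_backend {
    least_conn;
    server mcp-1.internal:8000;
    server mcp-2.internal:8000;
    server mcp-3.internal:8000;
}

server {
    listen 443 ssl;
    server_name mcp.example.com;
    # ssl_certificate / ssl_certificate_key go here

    location /mcp {
        proxy_pass http://mcp_backend;
        proxy_http_version 1.1;
        proxy_buffering off;       # pass streamed chunks through immediately
        proxy_read_timeout 300s;   # tolerate long-running tool calls
    }
}

The proxy_buffering off line matters for streamed responses; with buffering on, nginx would hold chunks back until the response completes.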

At PADISO, we help Sydney teams and enterprises architect this kind of scalable infrastructure. We’ve seen teams scale from 10 to 10,000 concurrent users without rewriting their core logic. The key is the transport layer and infrastructure design.


Security, Compliance, and Audit Readiness {#security-compliance}

Stdio and Security

Stdio is secure by default because it is local. The subprocess runs with the client’s permissions. There is no network exposure, no token leakage, no man-in-the-middle attacks.

But stdio has no audit trail. If you need to prove “who called what tool and when,” stdio cannot help. The client process calls the tool, the tool responds, and there is no record. This is fine for local development. It is not fine for regulated industries.

SSE and Security

SSE can use HTTPS for encryption in transit. You can add HTTP authentication headers (Bearer tokens, API keys, etc.). But SSE has the same audit trail problem as stdio: there is no standard way to log and correlate requests across a distributed system.

SSE is better than stdio for security, but not by much.

HTTP and Compliance

Streamable HTTP is built on standard HTTP security and audit patterns:

Authentication: Use Bearer tokens, mutual TLS, or OAuth. Every request includes credentials. The server verifies them before processing.

Authorization: Every request can be checked against a policy. “User A can call tools X and Y, but not Z.” This is fine-grained access control.

Encryption in transit: Use HTTPS. TLS 1.3 is standard. Your data is encrypted between client and server.

Audit trails: Log every request and response. Include timestamps, user IDs, tool names, arguments, and results. Store logs in a centralized system (CloudWatch, Datadog, Splunk, etc.) configured for immutability, so the trail is tamper-evident.

Request tracing: Use correlation IDs to link related requests. If a tool call triggers other tool calls, trace the entire chain. This is essential for debugging and compliance.

Rate limiting and abuse prevention: Detect and block suspicious activity. If a user makes 1,000 requests in 1 second, block them. If a tool is called with invalid arguments repeatedly, alert.
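
For illustration, a minimal in-process sliding-window limiter for the Flask server shown in the patterns section below. Real deployments usually enforce limits at the gateway or in a shared store like Redis, and the threshold here is arbitrary:

import time
from collections import defaultdict, deque
from functools import wraps
from flask import request

WINDOW_SECONDS = 1
MAX_REQUESTS = 50                 # arbitrary per-user threshold
_history = defaultdict(deque)     # user id -> recent request timestamps

def rate_limited(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        user_id = request.headers.get("X-User-ID", "anonymous")
        now = time.monotonic()
        window = _history[user_id]
        # Evict timestamps that have aged out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_REQUESTS:
            return {"error": "Too Many Requests"}, 429
        window.append(now)
        return f(*args, **kwargs)
    return wrapper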

All of these are standard HTTP practices. Your DevOps team knows how to implement them. Your compliance auditor expects them.

Passing SOC 2 and ISO 27001 Audits

If you are pursuing SOC 2 Type II or ISO 27001 compliance, your transport layer matters.

Stdio: Not compliant. No audit trails, no access control, no encryption. Auditors will reject it.

SSE: Partially compliant. You can add HTTPS and logging, but the half-duplex nature and lack of standard patterns make it awkward. Auditors will ask questions.

HTTP: Fully compliant. Standard patterns, well-understood security, comprehensive logging. Auditors recognize and approve it.

If you are using Vanta or a similar compliance platform to manage your audit readiness, HTTP is the only transport that integrates well. Vanta can monitor your HTTP services, collect logs, verify encryption, and generate audit reports. It cannot do this for stdio.

At PADISO, we help teams architect systems that pass security audits on the first try. We use HTTP transports, implement comprehensive logging, and design for compliance from day one. This saves months of remediation work.


Real-World Implementation Patterns {#implementation-patterns}

Pattern 1: Local Development with Stdio

Setup:

You write an MCP server in Python or Node.js. It reads JSON-RPC messages from stdin, processes them, and writes responses to stdout.

import json
import sys

class MCPServer:
    def __init__(self):
        # Map tool names to handler functions.
        self.tools = {
            "get_weather": self.get_weather,
            "search": self.search,
        }

    def get_weather(self, location):
        return {"temperature": 72, "condition": "sunny"}

    def search(self, query):
        return {"results": ["Result 1", "Result 2"]}

    def run(self):
        # Read one JSON-RPC message per line from stdin until the pipe closes.
        # (A simplified dispatch loop, not a full MCP implementation.)
        while True:
            line = sys.stdin.readline()
            if not line:
                break
            request = json.loads(line)
            tool_name = request.get("method", "").split("/")[-1]
            params = request.get("params", {})
            if tool_name not in self.tools:
                response = {
                    "jsonrpc": "2.0",
                    "id": request.get("id"),
                    "error": {"code": -32601, "message": f"Unknown tool: {tool_name}"},
                }
            else:
                response = {
                    "jsonrpc": "2.0",
                    "id": request.get("id"),
                    "result": self.tools[tool_name](**params),
                }
            # Write one response per line and flush so the client sees it immediately.
            print(json.dumps(response))
            sys.stdout.flush()

if __name__ == "__main__":
    server = MCPServer()
    server.run()

You configure Claude Desktop to run this server:

{
  "mcpServers": {
    "my_tools": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    }
  }
}

Claude Desktop spawns the server, and you can use your tools in conversations. This works immediately. No infrastructure, no configuration.

Pattern 2: Scaling to HTTP with Docker and Kubernetes

Setup:

You refactor your server to listen on an HTTP port. In production you would typically use an official MCP SDK’s HTTP transport rather than hand-rolling one; the minimal Flask version below illustrates the shape of the change.

from flask import Flask, request, jsonify

app = Flask(__name__)

class MCPServer:
    def __init__(self):
        # Same tool registry as the stdio version; only the transport changes.
        self.tools = {
            "get_weather": self.get_weather,
            "search": self.search,
        }

    def get_weather(self, location):
        return {"temperature": 72, "condition": "sunny"}

    def search(self, query):
        return {"results": ["Result 1", "Result 2"]}

server = MCPServer()

@app.route("/mcp", methods=["POST"])
def handle_mcp():
    # Each JSON-RPC message arrives as an independent HTTP POST.
    request_data = request.get_json()
    tool_name = request_data.get("method", "").split("/")[-1]
    params = request_data.get("params", {})
    if tool_name not in server.tools:
        return jsonify({
            "jsonrpc": "2.0",
            "id": request_data.get("id"),
            "error": {"code": -32601, "message": f"Unknown tool: {tool_name}"},
        })
    return jsonify({
        "jsonrpc": "2.0",
        "id": request_data.get("id"),
        "result": server.tools[tool_name](**params),
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
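
Before containerizing, you can smoke-test the endpoint from a second terminal (the request shape matches this article’s simplified handler, not the full MCP protocol):

curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/get_weather", "params": {"location": "Sydney"}}'

The server replies with the JSON-RPC result from get_weather.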

You containerize this:

FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "mcp_server.py"]

You deploy to Kubernetes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
      - name: mcp-server
        image: my-registry/mcp-server:latest
        ports:
        - containerPort: 8000
        env:
        - name: LOG_LEVEL
          value: "INFO"
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-service
spec:
  selector:
    app: mcp-server
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer

Kubernetes automatically manages 3 replicas, load balances requests, and restarts failed pods. You now handle 1,000 concurrent users instead of 1.
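
To make auto-scaling concrete, a HorizontalPodAutoscaler can grow the Deployment under load (thresholds are illustrative; utilization-based scaling also requires CPU requests on the container spec, which the Deployment above would need to declare):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70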

Pattern 3: Adding Compliance and Observability

Logging:

Every request is logged with timestamps, user IDs, and results.

import logging
import uuid
from datetime import datetime, timezone

# Note: the "extra" fields below only appear in output if your logging
# config uses a structured (e.g., JSON) formatter that includes them.
logger = logging.getLogger(__name__)

@app.route("/mcp", methods=["POST"])
def handle_mcp():
    # Correlation ID for tracing this request through logs and downstream calls.
    request_id = str(uuid.uuid4())
    user_id = request.headers.get("X-User-ID")
    request_data = request.get_json()
    tool_name = request_data.get("method", "").split("/")[-1]
    params = request_data.get("params", {})

    logger.info(
        "MCP request",
        extra={
            "request_id": request_id,
            "user_id": user_id,
            "tool": tool_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )

    result = server.tools[tool_name](**params)
    response = {
        "jsonrpc": "2.0",
        "id": request_data.get("id"),
        "result": result,
    }

    logger.info(
        "MCP response",
        extra={
            "request_id": request_id,
            "status": "success",
        },
    )

    return jsonify(response)

You send logs to a centralized platform (CloudWatch, Datadog, Splunk, etc.). Auditors can query logs, verify that only authorized users called specific tools, and trace the entire audit trail.

Authentication:

You add JWT or API key validation:

from functools import wraps
from flask import request

def require_auth(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        token = request.headers.get("Authorization", "").replace("Bearer ", "")
        # verify_token is your own validation logic: check a JWT signature,
        # look the key up in a database, or call your identity provider.
        if not verify_token(token):
            return {"error": "Unauthorized"}, 401
        return f(*args, **kwargs)
    return decorated_function

@app.route("/mcp", methods=["POST"])
@require_auth
def handle_mcp():
    # ... rest of handler

Every request must include a valid token. You can revoke tokens, audit who accessed what, and enforce role-based access control.

With these patterns, you have a production-grade MCP server that is secure, scalable, and audit-ready. This is what enterprise teams and Series B startups deploy.


Conclusion and Next Steps {#conclusion}

Key Takeaways

  1. Stdio is for local development. Ship your first MCP server with stdio. It is fast, simple, and sufficient for prototyping. Do not overthink it.

  2. HTTP is for production. When you have more than a handful of concurrent users, scale to Streamable HTTP. It is the industry standard, scales horizontally, and integrates with compliance frameworks.

  3. Skip SSE. SSE is a historical artifact. If you are building something new, go straight to HTTP. If you have existing SSE infrastructure, plan a migration to HTTP within 6 months.

  4. Plan your migration early. When you realize stdio is becoming a bottleneck, refactor immediately. Do not wait until you have 100 users and performance is degraded. A proactive migration takes 2-4 weeks. A reactive one takes 8-12 weeks and introduces bugs.

  5. Compliance is a transport-layer decision. If you need to pass SOC 2 or ISO 27001 audits, use HTTP. Stdio and SSE do not provide the audit trails and access control that auditors expect. HTTP does.

Next Steps

If you are just starting:

  1. Build your MCP server with stdio.
  2. Test it locally with Claude Desktop.
  3. Prove your concept: “Our AI agent can call our tools, and it works.”
  4. Plan a migration to HTTP for when you have 10+ concurrent users.

If you are hitting stdio limits:

  1. Refactor your server to listen on an HTTP port.
  2. Deploy it to a cloud platform (AWS, GCP, Azure, or a VPS).
  3. Point your client to the HTTP endpoint.
  4. Monitor performance and add more instances if needed.

If you need compliance:

  1. Ensure your MCP server uses Streamable HTTP, not stdio or SSE.
  2. Implement comprehensive logging and request tracing.
  3. Add authentication and authorization.
  4. Set up centralized log aggregation and alerting.
  5. Work with a partner like PADISO to prepare for audits and pass SOC 2 or ISO 27001 on the first try.

Working with PADISO

At PADISO, we help Sydney startups, enterprises, and teams architect and scale AI systems. Whether you are building your first MCP server, scaling from stdio to HTTP, or preparing for a security audit, we have the operational expertise to get you there faster.

We understand the full stack: from MCP transport mechanics to Kubernetes deployment to compliance frameworks. We have helped teams pass SOC 2 audits in 8 weeks, scale AI services from 10 to 10,000 concurrent users, and architect systems that are both fast and secure.

If you are a founder or engineering leader building AI products, we can help. We offer fractional CTO leadership, co-build partnerships, and AI strategy and readiness consulting. We work with seed-stage startups and Series B companies, mid-market operators, and enterprises modernising with AI.

Get in touch to discuss your MCP architecture, transport strategy, or compliance roadmap. We are based in Sydney and work with teams globally.

