Enterprise MCP Servers: A Reference Architecture
Table of Contents
- What Enterprise MCP Servers Are
- Why MCP Matters for Large Organisations
- Core Architecture Principles
- Multi-Tenant Design Patterns
- Governance and Access Control
- Security and Compliance
- Implementation Reference Architecture
- Real-World Deployment Patterns
- Monitoring, Observability, and SLAs
- Common Pitfalls and How to Avoid Them
- Getting Started: Your Next Steps
What Enterprise MCP Servers Are
The Model Context Protocol (MCP) is a standardised interface that lets AI agents safely and reliably access your internal tools, databases, and services. Think of it as a translation layer between Claude (or other AI models) and your company’s backend systems. Instead of giving an agent direct access to your infrastructure—which would be dangerous—you create an MCP server that sits in the middle, enforcing governance rules, rate limits, and audit trails.
At its core, the MCP architecture defines a client-server relationship. The AI agent is the client; your MCP server is the endpoint that exposes tools, resources, and prompts. This separation of concerns is critical for enterprises. It means you can upgrade your AI models, change your backend systems, or adjust governance rules without breaking the integration.
When you’re running multiple teams, business units, or customer instances across your organisation, a single monolithic MCP server won’t cut it. You need a multi-tenant reference architecture that isolates data, enforces role-based access control (RBAC), logs every action, and scales horizontally. That’s what this guide covers.
Enterprise MCP servers differ fundamentally from single-use integrations. They’re infrastructure. They need to be reliable, auditable, and maintainable. They must support teams building AI-driven workflows, automations, and decision-support systems without creating security or compliance nightmares. If you’re running agentic AI across your organisation, governance is non-negotiable.
Why MCP Matters for Large Organisations
Large organisations face a specific problem: they have dozens of internal tools, legacy systems, and data sources. Employees and AI agents need access to these systems, but uncontrolled access creates risk. Every tool integration is a potential attack surface. Every data access is a compliance liability.
Traditional approaches—API keys scattered across Slack, direct database connections, custom integrations for each tool—don’t scale and create audit nightmares. They also lock you into specific AI vendors. If you decide to switch from Claude to another model, you’ve rebuilt everything.
MCP solves this by establishing a standard protocol. Your MCP server becomes the single source of truth for what tools are available, who can access them, and what they can do. This is especially valuable when you’re pursuing SOC 2 or ISO 27001 compliance. Auditors want to see governed access, clear audit trails, and enforced controls. MCP servers, when built correctly, provide exactly that.
Consider the scale challenge: a mid-market company might have 50 teams, each wanting to build AI workflows. A large enterprise might have 500+. Without a reference architecture, you end up with 50 or 500 ad-hoc integrations, each with different security models, logging approaches, and failure modes. An enterprise MCP server framework lets you build once and reuse 500 times.
The business case is compelling. Teams ship faster because they don’t rebuild authentication and governance for each workflow. Security teams sleep better because access is logged and enforced centrally. Compliance teams pass audits because the audit trail is comprehensive. And when you need to modernise with agentic AI, you have a proven, scalable foundation.
Core Architecture Principles
Before diving into specifics, let’s establish the principles that should guide your enterprise MCP server design.
Principle 1: Isolation by Design
Tenants (teams, business units, or customers) must be isolated at every layer. If one tenant’s workflow crashes, it shouldn’t affect others. If one tenant’s data is compromised, others remain protected. This means:
- Data isolation: Each tenant’s data is stored separately, never mixed in shared tables.
- Compute isolation: Ideally, each tenant runs on isolated compute (separate containers, processes, or even servers).
- Network isolation: Tenants’ traffic is segregated; one tenant cannot sniff another’s requests.
This sounds expensive, but modern container orchestration and cloud platforms make it feasible. The alternative—shared infrastructure with logical isolation—is cheaper upfront but riskier and harder to audit.
Principle 2: Least Privilege Access
Every agent, user, and service should have the minimum permissions required to do their job. If a workflow only needs to read customer data, it shouldn’t have write access. If a team only needs access to one database, it shouldn’t see others.
This requires:
- Granular RBAC: Roles defined at the resource level, not just the tool level.
- Time-bound permissions: Credentials expire; access is revoked automatically.
- Audit-first design: Every permission grant and revocation is logged.
Principle 3: Observability as a First-Class Concern
You can’t govern what you can’t see. Your MCP server must emit comprehensive logs, metrics, and traces. This means:
- Structured logging: Every request, response, and error is logged in a machine-readable format.
- Metrics: Request latency, error rates, token usage, cost—all tracked.
- Tracing: Distributed traces show the full path of a request from agent to backend.
This isn’t optional. It’s how you debug issues, detect anomalies, and prove compliance.
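As a minimal sketch of the structured-logging requirement, the standard library is enough: a custom formatter that emits each record as one JSON line, with structured fields merged in. The `fields` convention here is an assumption for illustration, not a standard `logging` feature.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single machine-readable JSON line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via `extra={"fields": {...}}`
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


logger = logging.getLogger("mcp")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Each call produces one JSON object per line, ready for a log pipeline
logger.info("tool_call", extra={"fields": {"tool": "read_customers", "duration_ms": 42}})
```

In production you would swap the stream handler for your logging service's handler; the formatter stays the same.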
Principle 4: Explicit Governance Models
Governance shouldn’t be implicit or buried in code. It should be explicit, reviewable, and version-controlled. Your MCP server should enforce policies defined in configuration files or policy-as-code frameworks. Examples:
- Tool policies: Which teams can call which tools, under what conditions.
- Data policies: Which teams can access which datasets.
- Cost policies: Rate limits, quotas, and spending caps per team or workflow.
Multi-Tenant Design Patterns
Multi-tenancy in MCP servers comes in several flavours. Each has trade-offs.
Pattern 1: Shared Infrastructure, Logical Isolation
All tenants run on the same MCP server instance. Access control is enforced in code. This is the simplest to deploy but requires careful implementation.
Pros:
- Easier to deploy and operate.
- Lower infrastructure costs.
- Simpler to manage upgrades and patches.
Cons:
- A bug in access control affects all tenants.
- Performance issues in one tenant can affect others (noisy neighbour problem).
- Harder to audit data isolation.
When to use: Early-stage startups, low-risk internal tools, or proof-of-concepts.
Pattern 2: Containerised Isolation (Recommended)
Each tenant runs in its own container (Docker, Kubernetes pod). The MCP server is containerised, and orchestration (Kubernetes, ECS) manages the lifecycle. A reverse proxy or API gateway routes requests to the correct tenant container.
Pros:
- Strong isolation; a crash in one tenant doesn’t affect others.
- Easy to scale; add more containers as needed.
- Simpler to audit; each tenant’s logs are separate.
- Easier to enforce resource limits (CPU, memory) per tenant.
Cons:
- Higher infrastructure costs (more containers = more resources).
- More complex to operate (orchestration, networking).
- Slightly higher latency (routing overhead).
When to use: Most mid-market and enterprise deployments. This is the sweet spot for governance and scale.
Pattern 3: Serverless Isolation
Each tenant’s MCP server runs as a serverless function (AWS Lambda, Google Cloud Functions). A state store (Redis, DynamoDB) holds shared configuration.
Pros:
- Pay only for what you use.
- Automatic scaling; no capacity planning.
- Built-in isolation (each invocation is separate).
Cons:
- Cold start latency (first invocation is slow).
- Harder to maintain persistent connections.
- Vendor lock-in.
When to use: Low-frequency, bursty workloads. Workflows that run once a day or less frequently.
Pattern 4: Hybrid Isolation
Combine patterns. High-traffic tenants get dedicated containers. Low-traffic tenants share infrastructure. A control plane monitors traffic and auto-scales.
Pros:
- Cost-efficient for mixed workloads.
- Flexible; adjust isolation level as needs change.
Cons:
- Complex to implement and operate.
- Harder to reason about performance and costs.
When to use: Large enterprises with mixed tenant profiles (some high-volume, some low-volume).
Governance and Access Control
Governance is the heart of enterprise MCP servers. Without it, you have an access control problem. With it, you have a compliance asset.
Role-Based Access Control (RBAC)
Define roles at multiple levels:
Tool level: Which roles can call which tools.
Role: DataAnalyst
- read:customers
- read:transactions
- run:analytics_query
Role: SalesOps
- read:customers
- write:customer_notes
- read:deals
- write:deals
Resource level: Which roles can access which resources (databases, APIs, files).
Role: DataAnalyst
- resource:prod_analytics_db (read-only)
- resource:customer_data_warehouse (read-only)
Role: SalesOps
- resource:salesforce_api (write)
- resource:customer_crm_db (write)
Attribute level: Fine-grained access based on data attributes.
Role: RegionalManager
- read:customers where region = ${user.region}
- write:deals where region = ${user.region}
Implement RBAC as a policy engine within your MCP server. Enterprise MCP architecture patterns show that policy engines should be separate from business logic, making them easier to test, audit, and update.
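A minimal sketch of such a policy engine, kept apart from business logic so it can be tested in isolation. The role and permission names are illustrative; in production the role map would load from version-controlled policy files rather than being declared inline.

```python
class PolicyEngine:
    """Evaluates tool-level permissions from a declarative role map."""

    def __init__(self, role_permissions):
        # role -> set of permitted tool/permission names
        self.role_permissions = role_permissions

    def can_call_tool(self, user_roles, tool_name):
        # Allow if any of the user's roles grants the tool
        return any(
            tool_name in self.role_permissions.get(role, set())
            for role in user_roles
        )


# Illustrative policy data mirroring the roles above
engine = PolicyEngine({
    "data_analyst": {"read:customers", "read:transactions", "run:analytics_query"},
    "sales_ops": {"read:customers", "write:customer_notes"},
})

engine.can_call_tool(["data_analyst"], "run:analytics_query")  # allowed
engine.can_call_tool(["data_analyst"], "write:customer_notes")  # denied
```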
Audit Logging
Every action must be logged. This includes:
- Who: The user, agent, or service making the request.
- What: The tool or resource being accessed.
- When: Timestamp of the action.
- Why: The reason (workflow ID, request context).
- How: The outcome (success, failure, partial success).
- Impact: What data was read, modified, or deleted.
Store logs immutably. Use a dedicated logging service (CloudWatch, Splunk, ELK stack) that tenants cannot tamper with. Structure logs as JSON for easy querying.
```json
{
  "timestamp": "2026-01-15T14:23:45Z",
  "tenant_id": "acme-corp",
  "user_id": "alice@acme.com",
  "action": "tool_call",
  "tool_name": "read_customer_database",
  "resource": "customers_table",
  "status": "success",
  "records_returned": 42,
  "duration_ms": 234,
  "cost_usd": 0.05,
  "workflow_id": "wf_12345",
  "request_id": "req_67890"
}
```
Policy as Code
Store governance policies in version-controlled files (YAML, HCL, or JSON). Examples:
```yaml
# policies/acme-corp.yaml
tenants:
  - id: acme-corp
    name: ACME Corporation
    policies:
      - name: data-analyst-access
        roles: [data-analyst]
        resources:
          - analytics_db
          - data_warehouse
        permissions: [read]
      - name: sales-ops-access
        roles: [sales-ops]
        resources:
          - salesforce_api
          - customer_crm_db
        permissions: [read, write]
        rate_limit: 1000/hour
      - name: regional-isolation
        roles: [regional-manager]
        resources:
          - customer_data
        permissions: [read, write]
        conditions:
          - region == ${user.region}
    quotas:
      - role: data-analyst
        monthly_api_calls: 1000000
        monthly_cost_usd: 5000
```
Version control these policies. Review changes through a change management process. Audit who changed what, when, and why.
Security and Compliance
Enterprise MCP servers must be secure by design. This section covers the essentials.
Authentication and Authorisation
Authentication verifies identity. Authorisation verifies permissions.
For agents, use service account authentication:
- API keys: Simple but limited. Rotate frequently, store securely.
- OAuth 2.0: More complex but more flexible. Supports delegation and consent.
- Mutual TLS (mTLS): Client and server authenticate each other using certificates.
For human users (if your MCP server has a UI or API), use:
- SAML 2.0 or OIDC: Integrate with your identity provider (Okta, Azure AD, Google Workspace).
- Multi-factor authentication (MFA): Require MFA for sensitive operations.
Authorisation should be checked at every layer:
- Transport layer: Is the request from a known agent?
- API layer: Does the agent have permission to call this tool?
- Resource layer: Does the agent have permission to access this specific data?
- Row layer: Does the agent have permission to access these specific records?
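The four layers above can be sketched as a short-circuiting chain of checks: the request proceeds only if every layer approves. The `agent_registry` and `policy_engine` interfaces here are assumptions for illustration.

```python
def authorize(request, agent_registry, policy_engine):
    """Defence in depth: every layer must approve before a tool runs."""
    checks = [
        # Transport layer: is this a known agent?
        lambda: request.agent_id in agent_registry,
        # API layer: may this agent call this tool?
        lambda: policy_engine.can_call_tool(request.agent_id, request.tool),
        # Resource layer: may it touch this specific resource?
        lambda: policy_engine.can_access_resource(request.agent_id, request.resource),
        # Row layer: may it see these specific records?
        lambda: policy_engine.row_filter_allows(request.agent_id, request.rows),
    ]
    # all() short-circuits, so a transport failure never reaches the row check
    return all(check() for check in checks)
```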
Encryption
Encrypt data in transit and at rest.
In transit:
- Use TLS 1.3 for all connections.
- Implement certificate pinning to prevent man-in-the-middle attacks.
- Use mTLS for service-to-service communication.
At rest:
- Encrypt sensitive data in your database (e.g., customer PII, API keys).
- Use your cloud provider’s encryption service (AWS KMS, Google Cloud KMS).
- Manage encryption keys separately from data.
- Rotate keys regularly.
Secret Management
Never hardcode secrets. Use a secret manager:
- AWS Secrets Manager: For AWS deployments.
- HashiCorp Vault: For multi-cloud or on-premises.
- Azure Key Vault: For Azure deployments.
Your MCP server should fetch secrets at runtime, not at startup. This allows key rotation without restarting the server.
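A sketch of the runtime-fetch pattern: secrets are fetched on demand and re-fetched after a TTL, so a rotated key is picked up without restarting the server. The `fetch` callable is an assumption; in a real deployment it would wrap a secret manager client (AWS Secrets Manager, Vault, or Key Vault).

```python
import time


class SecretCache:
    """Fetch secrets at runtime, re-fetching after a TTL so key
    rotation takes effect without a server restart."""

    def __init__(self, fetch, ttl_seconds=300):
        self.fetch = fetch          # callable: secret name -> secret value
        self.ttl = ttl_seconds
        self._cache = {}            # name -> (value, fetched_at)

    def get(self, name):
        value, fetched_at = self._cache.get(name, (None, 0))
        # Re-fetch on first use or once the cached copy is stale
        if value is None or time.time() - fetched_at > self.ttl:
            value = self.fetch(name)
            self._cache[name] = (value, time.time())
        return value
```

Keep the TTL short enough that rotation propagates quickly, but long enough that the secret manager is not hit on every request.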
Rate Limiting and DDoS Protection
Protect your MCP server from abuse:
- Per-tenant rate limits: E.g., 1000 requests/hour per tenant.
- Per-user rate limits: E.g., 100 requests/hour per user.
- Per-tool rate limits: E.g., 50 concurrent calls to the database tool.
- Cost limits: Stop accepting requests once a tenant hits its monthly budget.
Implement rate limiting at the API gateway level (reverse proxy) and in the MCP server itself. Use distributed rate limiting (Redis) if running multiple instances.
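A fixed-window limiter is the simplest of these to sketch. The in-memory dict below stands in for Redis: in a multi-instance deployment the same logic maps onto `INCR` plus `EXPIRE` on a per-window key.

```python
import time


class FixedWindowRateLimiter:
    """Fixed-window rate limiter; the counter dict stands in for Redis."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        bucket = (key, window_index)
        # Equivalent to INCR on a Redis key that EXPIREs with the window
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```

Fixed windows allow short bursts at window boundaries; if that matters, a sliding-window or token-bucket variant is the usual refinement.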
Compliance: SOC 2 and ISO 27001
If you’re pursuing SOC 2 or ISO 27001 compliance, your MCP server is critical infrastructure. Auditors will scrutinise:
- Access control: Is access logged and enforced?
- Data isolation: Are tenants’ data truly isolated?
- Change management: How are code and policy changes reviewed and deployed?
- Incident response: What’s your process for detecting and responding to security incidents?
- Disaster recovery: Can you recover from data loss or service failure?
Build these controls into your MCP server from day one. Vanta-integrated security audits can help you map your MCP server to compliance requirements and close gaps.
Implementation Reference Architecture
Here’s a concrete, deployable architecture suitable for most mid-market and enterprise organisations.
Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│            Claude Agent (or other AI model)             │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ HTTP/WebSocket (JSON-RPC)
                     ▼
┌─────────────────────────────────────────────────────────┐
│       API Gateway (Authentication, Rate Limiting)       │
│   - Validates API keys / OAuth tokens                   │
│   - Enforces rate limits per tenant                     │
│   - Routes to correct tenant instance                   │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌───────┐    ┌───────┐    ┌───────┐
    │Tenant │    │Tenant │    │Tenant │
    │  MCP  │    │  MCP  │    │  MCP  │
    │Server │    │Server │    │Server │
    │(ACME) │    │(BETA) │    │(GAMMA)│
    └───┬───┘    └───┬───┘    └───┬───┘
        │            │            │
        │  (Policy Engine, Audit Logging, Tool Registry)
        │
    ┌───┴─────────────────────────────┐
    ▼                                 ▼
┌──────────────┐              ┌──────────────────┐
│ Tool Runtime │              │ Audit Log Store  │
│              │              │   (Immutable)    │
│ - Database   │              │                  │
│ - APIs       │              │ CloudWatch / ELK │
│ - Services   │              └──────────────────┘
└──────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│  Internal Services & Data Sources    │
│  - Customer DB                       │
│  - Salesforce API                    │
│  - Analytics warehouse               │
│  - Internal APIs                     │
└──────────────────────────────────────┘
```
Layer 1: API Gateway
The API gateway is your first line of defence. It should:
- Validate requests: Check that the request is well-formed JSON-RPC.
- Authenticate: Verify the API key or token.
- Route: Direct the request to the correct tenant’s MCP server.
- Rate limit: Enforce quotas per tenant, user, and tool.
- Log: Record the request (without sensitive data).
Use a production-grade API gateway:
- Kong: Open-source, widely used, excellent plugin ecosystem.
- AWS API Gateway: If you’re on AWS, integrates with IAM and CloudWatch.
- Envoy Proxy: High-performance, used by major cloud providers.
- nginx: Lightweight, battle-tested, good for on-premises.
Layer 2: Tenant MCP Servers
Each tenant runs its own MCP server instance. The server should:
- Authenticate requests: Double-check authentication (defence in depth).
- Enforce RBAC: Check that the user has permission to call the tool.
- Manage tools and resources: Register available tools, manage their lifecycle.
- Execute tools: Call the actual tool (database query, API call, etc.).
- Log everything: Structured logging of every action.
- Handle errors gracefully: Return clear error messages without leaking sensitive data.
Implement the MCP server in a language suited to your infrastructure:
- Python: Easy to write, good libraries, slower at scale.
- Go: Fast, good concurrency, simple deployment.
- TypeScript/Node.js: Good for teams already using JavaScript.
- Rust: Maximum performance and safety, steeper learning curve.
Example structure (pseudocode):
```python
from mcp.server import MCPServer
from mcp.types import Tool, Resource

from policy_engine import PolicyEngine
from audit_logger import AuditLogger


class TenantMCPServer(MCPServer):
    def __init__(self, tenant_id, config):
        super().__init__()  # initialise the base MCP server
        self.tenant_id = tenant_id
        self.policy_engine = PolicyEngine(tenant_id)
        self.audit_logger = AuditLogger(tenant_id)
        self.tools = {}
        self.load_tools(config)

    def load_tools(self, config):
        # Register available tools from config
        for tool_config in config['tools']:
            tool = Tool(
                name=tool_config['name'],
                description=tool_config['description'],
                handler=self.create_tool_handler(tool_config)
            )
            self.tools[tool.name] = tool

    def call_tool(self, user_id, tool_name, args):
        # 1. Check RBAC before touching the tool
        if not self.policy_engine.can_call_tool(user_id, tool_name):
            self.audit_logger.log_denied_access(user_id, tool_name)
            raise PermissionError(f"User {user_id} cannot call {tool_name}")
        # 2. Execute the tool, logging the outcome either way
        try:
            result = self.tools[tool_name].handler(args)
            self.audit_logger.log_success(user_id, tool_name, result)
            return result
        except Exception as e:
            self.audit_logger.log_error(user_id, tool_name, str(e))
            raise
```
Layer 3: Policy Engine
The policy engine is the brain of access control. It should:
- Load policies: From configuration files or a policy store.
- Evaluate policies: Given a user, tool, and context, determine if access is allowed.
- Support conditions: E.g., “allow read access to customers where region = ${user.region}”.
- Cache decisions: For performance, cache policy decisions (with TTL).
Implement using a policy-as-code framework:
- Open Policy Agent (OPA): Industry standard, excellent for complex policies.
- Rego: OPA’s policy language, very expressive.
- AWS IAM: If using AWS, leverage native IAM policies.
Example OPA policy:
```rego
package mcp.authz

# Deny by default
default allow = false

# Allow data analysts to read analytics data
allow {
    input.user.role == "data_analyst"
    input.tool == "read_analytics"
}

# Allow regional managers to access their region's data
allow {
    input.user.role == "regional_manager"
    input.tool == "read_customer_data"
    input.customer.region == input.user.region
}
```
Layer 4: Audit Logging
Log everything, immutably. Use a dedicated logging service:
```python
from datetime import datetime, timezone


class AuditLogger:
    def __init__(self, tenant_id):
        self.tenant_id = tenant_id
        self.logger = get_cloudwatch_logger(tenant_id)

    def log_success(self, user_id, tool_name, result):
        self.logger.info({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tenant_id": self.tenant_id,
            "user_id": user_id,
            "action": "tool_call",
            "tool_name": tool_name,
            "status": "success",
            "result_size": len(result),
            "cost_usd": self.estimate_cost(tool_name, result)
        })

    def log_denied_access(self, user_id, tool_name):
        self.logger.warning({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tenant_id": self.tenant_id,
            "user_id": user_id,
            "action": "denied_access",
            "tool_name": tool_name,
            "status": "denied",
            "reason": "permission_check_failed"
        })
```
Real-World Deployment Patterns
Now let’s look at how to actually deploy this architecture.
Deployment Option 1: Kubernetes (Recommended for Scale)
Deploy each tenant’s MCP server as a Kubernetes pod. Use a Helm chart to manage deployments.
```yaml
# helm/mcp-server/values.yaml
tenants:
  acme-corp:
    enabled: true
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 2000m
        memory: 2Gi
  beta-inc:
    enabled: true
    replicas: 2
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 1000m
        memory: 1Gi

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: mcp.yourcompany.com
      paths:
        - path: /
          pathType: Prefix

logging:
  enabled: true
  provider: cloudwatch
  region: us-east-1
```
Use a service mesh (Istio, Linkerd) for traffic management, observability, and security.
Deployment Option 2: AWS Lambda + API Gateway (Cost-Optimised)
For lower-traffic deployments, use serverless:
```python
# lambda_handler.py
import json

from mcp_server import TenantMCPServer

servers = {}  # Cache server instances across warm invocations


def lambda_handler(event, context):
    # Extract tenant from request path
    tenant_id = event['pathParameters']['tenant_id']

    # Get or create the server for this tenant
    if tenant_id not in servers:
        servers[tenant_id] = TenantMCPServer(tenant_id)
    server = servers[tenant_id]

    # Parse request
    body = json.loads(event['body'])

    # Call tool
    try:
        result = server.call_tool(
            user_id=event['headers'].get('x-user-id'),
            tool_name=body['method'],
            args=body['params']
        )
        return {
            'statusCode': 200,
            'body': json.dumps({'result': result})
        }
    except Exception as e:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': str(e)})
        }
```
Pair with DynamoDB for state (policies, tenant config) and S3 for audit logs.
Deployment Option 3: Docker Compose (Development)
For local development and testing:
```yaml
# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    image: kong:latest
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: postgres
    ports:
      - "8000:8000"
    depends_on:
      - postgres

  mcp-server-acme:
    build: ./mcp-server
    environment:
      TENANT_ID: acme-corp
      LOG_LEVEL: debug
    ports:
      - "3001:3000"

  mcp-server-beta:
    build: ./mcp-server
    environment:
      TENANT_ID: beta-inc
      LOG_LEVEL: debug
    ports:
      - "3002:3000"

  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data

  audit-logger:
    image: elasticsearch:8.0
    environment:
      discovery.type: single-node
    ports:
      - "9200:9200"

volumes:
  postgres_data:
```
Monitoring, Observability, and SLAs
You can’t run enterprise infrastructure without visibility. Here’s what you need to monitor.
Key Metrics
Availability:
- Uptime per tenant (target: 99.9%)
- Error rate per tool (target: < 0.1%)
- Latency percentiles (p50, p95, p99)
Performance:
- Request latency (tool-specific)
- Throughput (requests/second)
- Queue depth (pending requests)
Business:
- Cost per request (API calls, compute, storage)
- Cost per tenant (monthly)
- Token usage (for language models)
Security:
- Failed authentication attempts
- Denied access attempts (policy violations)
- Rate limit violations
- Anomalous access patterns
Implementation
Use a monitoring stack:
```yaml
# monitoring/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mcp-servers'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: '/metrics'
  - job_name: 'api-gateway'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
Visualize with Grafana:
```json
{
  "dashboard": "MCP Server Health",
  "panels": [
    {
      "title": "Request Latency (p95)",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, rate(mcp_request_duration_seconds_bucket[5m]))"
        }
      ]
    },
    {
      "title": "Error Rate",
      "targets": [
        {
          "expr": "rate(mcp_errors_total[5m])"
        }
      ]
    },
    {
      "title": "Cost per Tenant",
      "targets": [
        {
          "expr": "sum(mcp_cost_usd) by (tenant_id)"
        }
      ]
    }
  ]
}
```
SLAs
Define and commit to service level agreements:
Service Level Agreement (SLA) for Enterprise MCP Servers
1. Availability
- Target: 99.9% uptime per calendar month
- Measurement: Successful requests / Total requests
- Exclusions: Scheduled maintenance (4 hours/month), customer-caused outages
2. Latency
- Target: p95 latency < 500ms for tool calls
- Measurement: Time from request receipt to response sent
- Excludes time spent in backend systems
3. Error Rate
- Target: < 0.1% error rate (excluding customer errors)
- Measurement: 5xx errors / Total requests
4. Support Response Time
- P1 (service down): 15 minutes
- P2 (degraded): 1 hour
- P3 (minor issue): 4 hours
5. Credits
- 99.0–99.9% uptime: 10% monthly credit
- 95.0–99.0% uptime: 25% monthly credit
- < 95.0% uptime: 100% monthly credit
Track SLA compliance in your monitoring system. Alert when you’re at risk of missing an SLA.
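The uptime and credit calculations are simple enough to encode directly; a sketch using the availability definition and credit tiers from the SLA above (function names are illustrative):

```python
def monthly_uptime(successful_requests, total_requests):
    """Uptime as defined in the SLA: successful requests / total requests."""
    return 100.0 * successful_requests / total_requests


def sla_credit_percent(uptime):
    # Credit tiers from the SLA above
    if uptime >= 99.9:
        return 0
    if uptime >= 99.0:
        return 10
    if uptime >= 95.0:
        return 25
    return 100
```

Running this over the previous month's request counts per tenant gives you both the SLA report and the credit owed, straight from the audit log.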
Common Pitfalls and How to Avoid Them
Pitfall 1: Insufficient Audit Logging
Problem: You log some actions but not others. When a security incident occurs, you can’t reconstruct what happened.
Solution: Log everything. Every request, response, and error. Make logging non-optional. If a request completes without being logged, that’s a bug.
```python
import time


# Wrap all tool calls with logging so no request escapes the audit trail
def call_tool_with_logging(user_id, tool_name, args):
    start_time = time.time()
    request_id = generate_request_id()
    try:
        result = call_tool(user_id, tool_name, args)
        duration = time.time() - start_time
        log_event({
            'request_id': request_id,
            'status': 'success',
            'duration_ms': duration * 1000,
            'result_size': len(result)
        })
        return result
    except Exception as e:
        duration = time.time() - start_time
        log_event({
            'request_id': request_id,
            'status': 'error',
            'duration_ms': duration * 1000,
            'error': str(e)
        })
        raise
```
Pitfall 2: Shared Secrets
Problem: API keys are shared across teams or hardcoded in repositories. When one team’s key is compromised, you have to rotate keys for everyone.
Solution: Each agent/team gets its own API key. Rotate keys regularly. Use a secret manager.
Pitfall 3: No Rate Limiting
Problem: One agent goes haywire and makes 10 million requests, exhausting your budget and bringing down the service for others.
Solution: Implement rate limiting at multiple layers. Set quotas per tenant, per user, per tool. Monitor for anomalies.
Pitfall 4: Insufficient Testing
Problem: You deploy a policy change and accidentally lock out half your users.
Solution: Test policies in a staging environment. Use policy simulation tools (OPA has a REPL). Require code review for policy changes. Gradually roll out changes (canary deployments).
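One way to catch lockouts before rollout is a simulation harness: run a candidate policy against a fixed suite of (request, expected) cases and block the deploy if anything regresses. The policy shape below is illustrative, mirroring the policy-as-code files earlier in this guide.

```python
def evaluate(policy, request):
    """Tiny policy evaluator used to simulate changes before rollout."""
    for rule in policy["rules"]:
        if request["role"] in rule["roles"] and request["tool"] in rule["tools"]:
            return True
    return False


def simulate(policy, cases):
    """Run (request, expected) pairs; return the cases that regress."""
    return [
        (request, expected)
        for request, expected in cases
        if evaluate(policy, request) != expected
    ]
```

Wire `simulate` into CI so a policy change that empties the regression list is deployable and anything else fails the build.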
Pitfall 5: Tight Coupling to Specific Tools
Problem: Your MCP server is tightly integrated with Salesforce API v1. When Salesforce upgrades to v2, you have to rewrite the server.
Solution: Decouple tool implementations from the MCP server. Use adapters. Define tool interfaces abstractly. Make it easy to swap tool implementations.
```python
class ToolAdapter:
    """Abstract interface for tools."""

    def call(self, args) -> dict:
        raise NotImplementedError


class SalesforceAdapter(ToolAdapter):
    def __init__(self, api_version):
        self.api_version = api_version

    def call(self, args):
        # Implementation specific to this API version
        pass


# Easy to swap implementations
tools = {
    'read_deals': SalesforceAdapter(api_version='v2'),
    'write_customer_notes': SalesforceAdapter(api_version='v2')
}
```
Pitfall 6: Ignoring Cost
Problem: You don’t track costs. Suddenly you’re spending $50k/month on API calls and you don’t know why.
Solution: Track cost for every action. Attribute costs to tenants and tools. Set budgets and alert when approaching limits. Optimize expensive operations.
```python
def call_tool_with_cost_tracking(user_id, tool_name, args):
    # Estimate cost before calling
    estimated_cost = estimate_cost(tool_name, args)

    # Check the tenant's remaining budget
    remaining_budget = get_tenant_budget(user_id) - get_tenant_spend(user_id)
    if estimated_cost > remaining_budget:
        raise BudgetExceededError(
            f"Estimated cost ${estimated_cost} exceeds remaining budget ${remaining_budget}"
        )

    # Call the tool and track actual cost
    result = call_tool(user_id, tool_name, args)
    actual_cost = result.get('cost_usd', estimated_cost)
    log_cost(user_id, tool_name, actual_cost)
    return result
```
Getting Started: Your Next Steps
You now have a comprehensive reference architecture for enterprise MCP servers. Here’s how to implement it.
Step 1: Assess Your Current State
Before building, understand where you are:
- How many teams need AI agent access?
- What internal tools and data sources need to be exposed?
- What are your compliance requirements (SOC 2, ISO 27001)?
- What’s your current infrastructure (cloud provider, on-premises, hybrid)?
- What’s your team’s expertise (infrastructure, security, AI)?
Document this in a brief architecture decision record (ADR).
Step 2: Choose Your Deployment Model
Based on your assessment:
- Small team, low complexity: Start with shared infrastructure (Pattern 1). Easy to deploy, sufficient for proof-of-concept.
- Multiple teams, growth trajectory: Use containerised isolation (Pattern 2). Scalable, good governance, manageable complexity.
- Large enterprise, high volume: Hybrid isolation (Pattern 4) or dedicated infrastructure per tenant. Maximum control and auditability.
If you’re pursuing SOC 2 or ISO 27001 compliance, skip Pattern 1. Auditors expect isolation and comprehensive logging, which are easier with containerisation.
Step 3: Build a Proof-of-Concept
Start small. Pick one internal tool (e.g., a read-only database query tool) and build an MCP server that exposes it safely to Claude.
```python
# poc/simple_mcp_server.py
from mcp.server import MCPServer
from mcp.types import Tool

import psycopg2

server = MCPServer("poc")


@server.tool(name="query_analytics")
def query_analytics(query: str):
    """Run a read-only query against the analytics database."""
    # Reject obvious write operations. Note: a keyword blocklist is a
    # PoC-level check, not real protection -- the read-only database
    # role below is what actually enforces it.
    if any(keyword in query.upper() for keyword in ['INSERT', 'UPDATE', 'DELETE', 'DROP']):
        raise ValueError("Write operations not allowed")

    # Execute the query over a read-only connection
    conn = psycopg2.connect("dbname=analytics user=readonly")
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        results = cursor.fetchall()
        cursor.close()
    finally:
        conn.close()
    return {"rows": results, "count": len(results)}


if __name__ == "__main__":
    server.run()
```
Test this with Claude. Ask it to run analytics queries. Verify that it works and that access is logged.
Step 4: Implement Governance
Once the PoC works, add governance:
- Define roles for your organisation (DataAnalyst, SalesOps, etc.).
- Write policies (who can access what).
- Implement RBAC in your MCP server.
- Set up audit logging.
- Test that policies are enforced.
Step 5: Scale to Production
When you’re confident in the architecture:
- Choose your deployment platform (Kubernetes, Lambda, etc.).
- Set up infrastructure (API gateway, logging, monitoring).
- Implement security controls (TLS, secret management, rate limiting).
- Write runbooks for common operations (adding a tenant, rotating keys, responding to incidents).
- Plan for compliance audits (SOC 2, ISO 27001).
Step 6: Optimise and Iterate
Production is where learning happens:
- Monitor costs. Identify expensive tools and optimise them.
- Monitor latency. Identify bottlenecks and fix them.
- Monitor errors. Identify failure modes and add resilience.
- Gather feedback from teams using the MCP server. Iterate on the design.
Conclusion
Enterprise MCP servers are not a nice-to-have. If you’re deploying agentic AI across your organisation, they’re essential. They’re how you govern access, ensure compliance, and maintain security at scale.
This reference architecture gives you a proven foundation. It’s been battle-tested by teams at PADISO and other Sydney-based and international organisations building AI-driven platforms. The patterns—containerised isolation, policy-as-code, immutable audit logging—are industry best practices.
The key insight is this: governance is not a constraint; it’s an enabler. When you have clear policies, comprehensive logging, and enforced access control, teams move faster. They don’t waste time on ad-hoc security reviews. They don’t worry about accidentally accessing data they shouldn’t. They build with confidence.
If you’re running agentic AI across your organisation, or planning to, start building your enterprise MCP server now. The sooner you establish governance, the easier it is to scale. And if you’re navigating AI strategy and readiness, an enterprise MCP server is a foundational piece of your AI infrastructure.
For teams in Sydney or Australia looking for hands-on support, PADISO’s CTO as a Service and platform engineering services can help you design, build, and operate enterprise MCP servers. We’ve helped 50+ clients across seed-stage startups and mid-market enterprises implement governed AI infrastructure. If you’re ready to move from proof-of-concept to production-grade enterprise MCP servers, let’s talk.
Start with the PoC. Build the governance layer. Scale to production. Measure everything. Iterate relentlessly. That’s how you build enterprise-grade AI infrastructure that teams trust and auditors approve.