Enterprise MCP Servers: A Reference Architecture
Table of Contents
- What Enterprise MCP Servers Are
- Why MCP Matters for Large Organisations
- Core Architecture Principles
- Multi-Tenant Design Patterns
- Governance and Access Control
- Security and Compliance
- Implementation Reference Architecture
- Real-World Deployment Patterns
- Monitoring, Observability, and SLAs
- Common Pitfalls and How to Avoid Them
- Getting Started: Your Next Steps
What Enterprise MCP Servers Are
The Model Context Protocol (MCP) is a standardised interface that lets AI agents safely and reliably access your internal tools, databases, and services. Think of it as a translation layer between Claude (or other AI models) and your company’s backend systems. Instead of giving an agent direct access to your infrastructure—which would be dangerous—you create an MCP server that sits in the middle, enforcing governance rules, rate limits, and audit trails.
At its core, the MCP architecture defines a client-server relationship. The AI agent is the client; your MCP server is the endpoint that exposes tools, resources, and prompts. This separation of concerns is critical for enterprises. It means you can upgrade your AI models, change your backend systems, or adjust governance rules without breaking the integration.
When you’re running multiple teams, business units, or customer instances across your organisation, a single monolithic MCP server won’t cut it. You need a multi-tenant reference architecture that isolates data, enforces role-based access control (RBAC), logs every action, and scales horizontally. That’s what this guide covers.
Enterprise MCP servers differ fundamentally from single-use integrations. They’re infrastructure. They need to be reliable, auditable, and maintainable. They must support teams building AI-driven workflows, automations, and decision-support systems without creating security or compliance nightmares. If you’re running agentic AI across your organisation, governance is non-negotiable.
Why MCP Matters for Large Organisations
Large organisations face a specific problem: they have dozens of internal tools, legacy systems, and data sources. Employees and AI agents need access to these systems, but uncontrolled access creates risk. Every tool integration is a potential attack surface. Every data access is a compliance liability.
Traditional approaches—API keys scattered across Slack, direct database connections, custom integrations for each tool—don’t scale and create audit nightmares. They also lock you into specific AI vendors. If you decide to switch from Claude to another model, you’ve rebuilt everything.
MCP solves this by establishing a standard protocol. Your MCP server becomes the single source of truth for what tools are available, who can access them, and what they can do. This is especially valuable when you’re pursuing SOC 2 or ISO 27001 compliance. Auditors want to see governed access, clear audit trails, and enforced controls. MCP servers, when built correctly, provide exactly that.
Consider the scale challenge: a mid-market company might have 50 teams, each wanting to build AI workflows. A large enterprise might have 500+. Without a reference architecture, you end up with 50 or 500 ad-hoc integrations, each with different security models, logging approaches, and failure modes. An enterprise MCP server framework lets you build once and reuse 500 times.
The business case is compelling. Teams ship faster because they don’t rebuild authentication and governance for each workflow. Security teams sleep better because access is logged and enforced centrally. Compliance teams pass audits because the audit trail is comprehensive. And when you need to modernise with agentic AI, you have a proven, scalable foundation.
Core Architecture Principles
Before diving into specifics, let’s establish the principles that should guide your enterprise MCP server design.
Principle 1: Isolation by Design
Tenants (teams, business units, or customers) must be isolated at every layer. If one tenant’s workflow crashes, it shouldn’t affect others. If one tenant’s data is compromised, others remain protected. This means:
- Data isolation: Each tenant’s data is stored separately, never mixed in shared tables.
- Compute isolation: Ideally, each tenant runs on isolated compute (separate containers, processes, or even servers).
- Network isolation: Tenants’ traffic is segregated; one tenant cannot sniff another’s requests.
This sounds expensive, but modern container orchestration and cloud platforms make it feasible. The alternative—shared infrastructure with logical isolation—is cheaper upfront but riskier and harder to audit.
Principle 2: Least Privilege Access
Every agent, user, and service should have the minimum permissions required to do their job. If a workflow only needs to read customer data, it shouldn’t have write access. If a team only needs access to one database, it shouldn’t see others.
This requires:
- Granular RBAC: Roles defined at the resource level, not just the tool level.
- Time-bound permissions: Credentials expire; access is revoked automatically.
- Audit-first design: Every permission grant and revocation is logged.
Principle 3: Observability as a First-Class Concern
You can’t govern what you can’t see. Your MCP server must emit comprehensive logs, metrics, and traces. This means:
- Structured logging: Every request, response, and error is logged in a machine-readable format.
- Metrics: Request latency, error rates, token usage, cost—all tracked.
- Tracing: Distributed traces show the full path of a request from agent to backend.
This isn’t optional. It’s how you debug issues, detect anomalies, and prove compliance.
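As a minimal sketch of the structured-logging requirement, the standard library is enough: a custom formatter that emits each record as one JSON line, with structured fields merged in. The `fields` convention here is an assumption for illustration, not a standard `logging` feature.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single machine-readable JSON line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via `extra={"fields": {...}}`
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


logger = logging.getLogger("mcp")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Each call produces one JSON object per line, ready for a log pipeline
logger.info("tool_call", extra={"fields": {"tool": "read_customers", "duration_ms": 42}})
```

In production you would swap the stream handler for your logging service's handler; the formatter stays the same.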
Principle 4: Explicit Governance Models
Governance shouldn’t be implicit or buried in code. It should be explicit, reviewable, and version-controlled. Your MCP server should enforce policies defined in configuration files or policy-as-code frameworks. Examples:
- Tool policies: Which teams can call which tools, under what conditions.
- Data policies: Which teams can access which datasets.
- Cost policies: Rate limits, quotas, and spending caps per team or workflow.
Multi-Tenant Design Patterns
Multi-tenancy in MCP servers comes in several flavours. Each has trade-offs.
Pattern 1: Shared Infrastructure, Logical Isolation
All tenants run on the same MCP server instance. Access control is enforced in code. This is the simplest to deploy but requires careful implementation.
Pros:
- Easier to deploy and operate.
- Lower infrastructure costs.
- Simpler to manage upgrades and patches.
Cons:
- A bug in access control affects all tenants.
- Performance issues in one tenant can affect others (noisy neighbour problem).
- Harder to audit data isolation.
When to use: Early-stage startups, low-risk internal tools, or proof-of-concepts.
Pattern 2: Containerised Isolation (Recommended)
Each tenant runs in its own container (Docker, Kubernetes pod). The MCP server is containerised, and orchestration (Kubernetes, ECS) manages the lifecycle. A reverse proxy or API gateway routes requests to the correct tenant container.
Pros:
- Strong isolation; a crash in one tenant doesn’t affect others.
- Easy to scale; add more containers as needed.
- Simpler to audit; each tenant’s logs are separate.
- Easier to enforce resource limits (CPU, memory) per tenant.
Cons:
- Higher infrastructure costs (more containers = more resources).
- More complex to operate (orchestration, networking).
- Slightly higher latency (routing overhead).
When to use: Most mid-market and enterprise deployments. This is the sweet spot for governance and scale.
Pattern 3: Serverless Isolation
Each tenant’s MCP server runs as a serverless function (AWS Lambda, Google Cloud Functions). A state store (Redis, DynamoDB) holds shared configuration.
Pros:
- Pay only for what you use.
- Automatic scaling; no capacity planning.
- Built-in isolation (each invocation is separate).
Cons:
- Cold start latency (first invocation is slow).
- Harder to maintain persistent connections.
- Vendor lock-in.
When to use: Low-frequency, bursty workloads. Workflows that run once a day or less frequently.
Pattern 4: Hybrid Isolation
Combine patterns. High-traffic tenants get dedicated containers. Low-traffic tenants share infrastructure. A control plane monitors traffic and auto-scales.
Pros:
- Cost-efficient for mixed workloads.
- Flexible; adjust isolation level as needs change.
Cons:
- Complex to implement and operate.
- Harder to reason about performance and costs.
When to use: Large enterprises with mixed tenant profiles (some high-volume, some low-volume).
Governance and Access Control
Governance is the heart of enterprise MCP servers. Without it, you have an access control problem. With it, you have a compliance asset.
Role-Based Access Control (RBAC)
Define roles at multiple levels:
Tool level: Which roles can call which tools.
Role: DataAnalyst
- read:customers
- read:transactions
- run:analytics_query
Role: SalesOps
- read:customers
- write:customer_notes
- read:deals
- write:deals
Resource level: Which roles can access which resources (databases, APIs, files).
Role: DataAnalyst
- resource:prod_analytics_db (read-only)
- resource:customer_data_warehouse (read-only)
Role: SalesOps
- resource:salesforce_api (write)
- resource:customer_crm_db (write)
Attribute level: Fine-grained access based on data attributes.
Role: RegionalManager
- read:customers where region = ${user.region}
- write:deals where region = ${user.region}
Implement RBAC as a policy engine within your MCP server. Enterprise MCP architecture patterns show that policy engines should be separate from business logic, making them easier to test, audit, and update.
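A minimal sketch of such a policy engine, kept apart from business logic so it can be tested in isolation. The role and permission names are illustrative; in production the role map would load from version-controlled policy files rather than being declared inline.

```python
class PolicyEngine:
    """Evaluates tool-level permissions from a declarative role map."""

    def __init__(self, role_permissions):
        # role -> set of permitted tool/permission names
        self.role_permissions = role_permissions

    def can_call_tool(self, user_roles, tool_name):
        # Allow if any of the user's roles grants the tool
        return any(
            tool_name in self.role_permissions.get(role, set())
            for role in user_roles
        )


# Illustrative policy data mirroring the roles above
engine = PolicyEngine({
    "data_analyst": {"read:customers", "read:transactions", "run:analytics_query"},
    "sales_ops": {"read:customers", "write:customer_notes"},
})

engine.can_call_tool(["data_analyst"], "run:analytics_query")  # allowed
engine.can_call_tool(["data_analyst"], "write:customer_notes")  # denied
```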
Audit Logging
Every action must be logged. This includes:
- Who: The user, agent, or service making the request.
- What: The tool or resource being accessed.
- When: Timestamp of the action.
- Why: The reason (workflow ID, request context).
- How: The outcome (success, failure, partial success).
- Impact: What data was read, modified, or deleted.
Store logs immutably. Use a dedicated logging service (CloudWatch, Splunk, ELK stack) that tenants cannot tamper with. Structure logs as JSON for easy querying.
```json
{
  "timestamp": "2026-01-15T14:23:45Z",
  "tenant_id": "acme-corp",
  "user_id": "alice@acme.com",
  "action": "tool_call",
  "tool_name": "read_customer_database",
  "resource": "customers_table",
  "status": "success",
  "records_returned": 42,
  "duration_ms": 234,
  "cost_usd": 0.05,
  "workflow_id": "wf_12345",
  "request_id": "req_67890"
}
```
Policy as Code
Store governance policies in version-controlled files (YAML, HCL, or JSON). Examples:
```yaml
# policies/acme-corp.yaml
tenants:
  - id: acme-corp
    name: ACME Corporation
    policies:
      - name: data-analyst-access
        roles: [data-analyst]
        resources:
          - analytics_db
          - data_warehouse
        permissions: [read]
      - name: sales-ops-access
        roles: [sales-ops]
        resources:
          - salesforce_api
          - customer_crm_db
        permissions: [read, write]
        rate_limit: 1000/hour
      - name: regional-isolation
        roles: [regional-manager]
        resources:
          - customer_data
        permissions: [read, write]
        conditions:
          - region == ${user.region}
    quotas:
      - role: data-analyst
        monthly_api_calls: 1000000
        monthly_cost_usd: 5000
```
Version control these policies. Review changes through a change management process. Audit who changed what, when, and why.
Security and Compliance
Enterprise MCP servers must be secure by design. This section covers the essentials.
Authentication and Authorisation
Authentication verifies identity. Authorisation verifies permissions.
For agents, use service account authentication:
- API keys: Simple but limited. Rotate frequently, store securely.
- OAuth 2.0: More complex but more flexible. Supports delegation and consent.
- Mutual TLS (mTLS): Client and server authenticate each other using certificates.
For human users (if your MCP server has a UI or API), use:
- SAML 2.0 or OIDC: Integrate with your identity provider (Okta, Azure AD, Google Workspace).
- Multi-factor authentication (MFA): Require MFA for sensitive operations.
Authorisation should be checked at every layer:
- Transport layer: Is the request from a known agent?
- API layer: Does the agent have permission to call this tool?
- Resource layer: Does the agent have permission to access this specific data?
- Row layer: Does the agent have permission to access these specific records?
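The four layers above can be sketched as a short-circuiting chain of checks: the request proceeds only if every layer approves. The `agent_registry` and `policy_engine` interfaces here are assumptions for illustration.

```python
def authorize(request, agent_registry, policy_engine):
    """Defence in depth: every layer must approve before a tool runs."""
    checks = [
        # Transport layer: is this a known agent?
        lambda: request.agent_id in agent_registry,
        # API layer: may this agent call this tool?
        lambda: policy_engine.can_call_tool(request.agent_id, request.tool),
        # Resource layer: may it touch this specific resource?
        lambda: policy_engine.can_access_resource(request.agent_id, request.resource),
        # Row layer: may it see these specific records?
        lambda: policy_engine.row_filter_allows(request.agent_id, request.rows),
    ]
    # all() short-circuits, so a transport failure never reaches the row check
    return all(check() for check in checks)
```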
Encryption
Encrypt data in transit and at rest.
In transit:
- Use TLS 1.3 for all connections.
- Implement certificate pinning to prevent man-in-the-middle attacks.
- Use mTLS for service-to-service communication.
At rest:
- Encrypt sensitive data in your database (e.g., customer PII, API keys).
- Use your cloud provider’s encryption service (AWS KMS, Google Cloud KMS).
- Manage encryption keys separately from data.
- Rotate keys regularly.
Secret Management
Never hardcode secrets. Use a secret manager:
- AWS Secrets Manager: For AWS deployments.
- HashiCorp Vault: For multi-cloud or on-premises.
- Azure Key Vault: For Azure deployments.
Your MCP server should fetch secrets at runtime, not at startup. This allows key rotation without restarting the server.
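A sketch of the runtime-fetch pattern: secrets are fetched on demand and re-fetched after a TTL, so a rotated key is picked up without restarting the server. The `fetch` callable is an assumption; in a real deployment it would wrap a secret manager client (AWS Secrets Manager, Vault, or Key Vault).

```python
import time


class SecretCache:
    """Fetch secrets at runtime, re-fetching after a TTL so key
    rotation takes effect without a server restart."""

    def __init__(self, fetch, ttl_seconds=300):
        self.fetch = fetch          # callable: secret name -> secret value
        self.ttl = ttl_seconds
        self._cache = {}            # name -> (value, fetched_at)

    def get(self, name):
        value, fetched_at = self._cache.get(name, (None, 0))
        # Re-fetch on first use or once the cached copy is stale
        if value is None or time.time() - fetched_at > self.ttl:
            value = self.fetch(name)
            self._cache[name] = (value, time.time())
        return value
```

Keep the TTL short enough that rotation propagates quickly, but long enough that the secret manager is not hit on every request.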
Rate Limiting and DDoS Protection
Protect your MCP server from abuse:
- Per-tenant rate limits: E.g., 1000 requests/hour per tenant.
- Per-user rate limits: E.g., 100 requests/hour per user.
- Per-tool rate limits: E.g., 50 concurrent calls to the database tool.
- Cost limits: Stop accepting requests once a tenant hits its monthly budget.
Implement rate limiting at the API gateway level (reverse proxy) and in the MCP server itself. Use distributed rate limiting (Redis) if running multiple instances.
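A fixed-window limiter is the simplest of these to sketch. The in-memory dict below stands in for Redis: in a multi-instance deployment the same logic maps onto `INCR` plus `EXPIRE` on a per-window key.

```python
import time


class FixedWindowRateLimiter:
    """Fixed-window rate limiter; the counter dict stands in for Redis."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        bucket = (key, window_index)
        # Equivalent to INCR on a Redis key that EXPIREs with the window
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```

Fixed windows allow short bursts at window boundaries; if that matters, a sliding-window or token-bucket variant is the usual refinement.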
Compliance: SOC 2 and ISO 27001
If you’re pursuing SOC 2 or ISO 27001 compliance, your MCP server is critical infrastructure. Auditors will scrutinise:
- Access control: Is access logged and enforced?
- Data isolation: Are tenants’ data truly isolated?
- Change management: How are code and policy changes reviewed and deployed?
- Incident response: What’s your process for detecting and responding to security incidents?
- Disaster recovery: Can you recover from data loss or service failure?
Build these controls into your MCP server from day one. Vanta-integrated security audits can help you map your MCP server to compliance requirements and close gaps.
Implementation Reference Architecture
Here’s a concrete, deployable architecture suitable for most mid-market and enterprise organisations.
Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│            Claude Agent (or other AI model)             │
└────────────────────┬────────────────────────────────────┘
                     │
                     │ HTTP/WebSocket (JSON-RPC)
                     ▼
┌─────────────────────────────────────────────────────────┐
│       API Gateway (Authentication, Rate Limiting)       │
│   - Validates API keys / OAuth tokens                   │
│   - Enforces rate limits per tenant                     │
│   - Routes to correct tenant instance                   │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    ┌───────┐    ┌───────┐    ┌───────┐
    │Tenant │    │Tenant │    │Tenant │
    │  MCP  │    │  MCP  │    │  MCP  │
    │Server │    │Server │    │Server │
    │(ACME) │    │(BETA) │    │(GAMMA)│
    └───┬───┘    └───┬───┘    └───┬───┘
        │            │            │
        │  (Policy Engine, Audit Logging, Tool Registry)
        │
    ┌───┴─────────────────────────────┐
    ▼                                 ▼
┌──────────────┐              ┌──────────────────┐
│ Tool Runtime │              │ Audit Log Store  │
│              │              │   (Immutable)    │
│ - Database   │              │                  │
│ - APIs       │              │ CloudWatch / ELK │
│ - Services   │              └──────────────────┘
└──────────────┘
        │
        ▼
┌──────────────────────────────────────┐
│  Internal Services & Data Sources    │
│  - Customer DB                       │
│  - Salesforce API                    │
│  - Analytics warehouse               │
│  - Internal APIs                     │
└──────────────────────────────────────┘
```
Layer 1: API Gateway
The API gateway is your first line of defence. It should:
- Validate requests: Check that the request is well-formed JSON-RPC.
- Authenticate: Verify the API key or token.
- Route: Direct the request to the correct tenant’s MCP server.
- Rate limit: Enforce quotas per tenant, user, and tool.
- Log: Record the request (without sensitive data).
Use a production-grade API gateway:
- Kong: Open-source, widely used, excellent plugin ecosystem.
- AWS API Gateway: If you’re on AWS, integrates with IAM and CloudWatch.
- Envoy Proxy: High-performance, used by major cloud providers.
- nginx: Lightweight, battle-tested, good for on-premises.
Layer 2: Tenant MCP Servers
Each tenant runs its own MCP server instance. The server should:
- Authenticate requests: Double-check authentication (defence in depth).
- Enforce RBAC: Check that the user has permission to call the tool.
- Manage tools and resources: Register available tools, manage their lifecycle.
- Execute tools: Call the actual tool (database query, API call, etc.).
- Log everything: Structured logging of every action.
- Handle errors gracefully: Return clear error messages without leaking sensitive data.
Implement the MCP server in a language suited to your infrastructure:
- Python: Easy to write, good libraries, slower at scale.
- Go: Fast, good concurrency, simple deployment.
- TypeScript/Node.js: Good for teams already using JavaScript.
- Rust: Maximum performance and safety, steeper learning curve.
Example structure (pseudocode):
```python
from mcp.server import MCPServer
from mcp.types import Tool, Resource

from policy_engine import PolicyEngine
from audit_logger import AuditLogger


class TenantMCPServer(MCPServer):
    def __init__(self, tenant_id, config):
        super().__init__()  # initialise the base MCP server
        self.tenant_id = tenant_id
        self.policy_engine = PolicyEngine(tenant_id)
        self.audit_logger = AuditLogger(tenant_id)
        self.tools = {}
        self.load_tools(config)

    def load_tools(self, config):
        # Register available tools from config
        for tool_config in config['tools']:
            tool = Tool(
                name=tool_config['name'],
                description=tool_config['description'],
                handler=self.create_tool_handler(tool_config)
            )
            self.tools[tool.name] = tool

    def call_tool(self, user_id, tool_name, args):
        # 1. Check RBAC before touching the tool
        if not self.policy_engine.can_call_tool(user_id, tool_name):
            self.audit_logger.log_denied_access(user_id, tool_name)
            raise PermissionError(f"User {user_id} cannot call {tool_name}")
        # 2. Execute the tool, logging the outcome either way
        try:
            result = self.tools[tool_name].handler(args)
            self.audit_logger.log_success(user_id, tool_name, result)
            return result
        except Exception as e:
            self.audit_logger.log_error(user_id, tool_name, str(e))
            raise
```
Layer 3: Policy Engine
The policy engine is the brain of access control. It should:
- Load policies: From configuration files or a policy store.
- Evaluate policies: Given a user, tool, and context, determine if access is allowed.
- Support conditions: E.g., “allow read access to customers where region = ${user.region}”.
- Cache decisions: For performance, cache policy decisions (with TTL).
Implement using a policy-as-code framework:
- Open Policy Agent (OPA): Industry standard, excellent for complex policies.
- Rego: OPA’s policy language, very expressive.
- AWS IAM: If using AWS, leverage native IAM policies.
Example OPA policy:
```rego
package mcp.authz

# Deny by default
default allow = false

# Allow data analysts to read analytics data
allow {
    input.user.role == "data_analyst"
    input.tool == "read_analytics"
}

# Allow regional managers to access their region's data
allow {
    input.user.role == "regional_manager"
    input.tool == "read_customer_data"
    input.customer.region == input.user.region
}
```
Layer 4: Audit Logging
Log everything, immutably. Use a dedicated logging service:
```python
from datetime import datetime, timezone


class AuditLogger:
    def __init__(self, tenant_id):
        self.tenant_id = tenant_id
        self.logger = get_cloudwatch_logger(tenant_id)

    def log_success(self, user_id, tool_name, result):
        self.logger.info({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tenant_id": self.tenant_id,
            "user_id": user_id,
            "action": "tool_call",
            "tool_name": tool_name,
            "status": "success",
            "result_size": len(result),
            "cost_usd": self.estimate_cost(tool_name, result)
        })

    def log_denied_access(self, user_id, tool_name):
        self.logger.warning({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tenant_id": self.tenant_id,
            "user_id": user_id,
            "action": "denied_access",
            "tool_name": tool_name,
            "status": "denied",
            "reason": "permission_check_failed"
        })
```
Real-World Deployment Patterns
Now let’s look at how to actually deploy this architecture.
Deployment Option 1: Kubernetes (Recommended for Scale)
Deploy each tenant’s MCP server as a Kubernetes pod. Use a Helm chart to manage deployments.
```yaml
# helm/mcp-server/values.yaml
tenants:
  acme-corp:
    enabled: true
    replicas: 3
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 2000m
        memory: 2Gi
  beta-inc:
    enabled: true
    replicas: 2
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 1000m
        memory: 1Gi

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: mcp.yourcompany.com
      paths:
        - path: /
          pathType: Prefix

logging:
  enabled: true
  provider: cloudwatch
  region: us-east-1
```
Use a service mesh (Istio, Linkerd) for traffic management, observability, and security.
Deployment Option 2: AWS Lambda + API Gateway (Cost-Optimised)
For lower-traffic deployments, use serverless:
```python
# lambda_handler.py
import json

from mcp_server import TenantMCPServer

servers = {}  # Cache server instances across warm invocations


def lambda_handler(event, context):
    # Extract tenant from request path
    tenant_id = event['pathParameters']['tenant_id']

    # Get or create the server for this tenant
    if tenant_id not in servers:
        servers[tenant_id] = TenantMCPServer(tenant_id)
    server = servers[tenant_id]

    # Parse request
    body = json.loads(event['body'])

    # Call tool
    try:
        result = server.call_tool(
            user_id=event['headers'].get('x-user-id'),
            tool_name=body['method'],
            args=body['params']
        )
        return {
            'statusCode': 200,
            'body': json.dumps({'result': result})
        }
    except Exception as e:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': str(e)})
        }
```
Pair with DynamoDB for state (policies, tenant config) and S3 for audit logs.
Deployment Option 3: Docker Compose (Development)
For local development and testing:
```yaml
# docker-compose.yml
version: '3.8'

services:
  api-gateway:
    image: kong:latest
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: postgres
    ports:
      - "8000:8000"
    depends_on:
      - postgres

  mcp-server-acme:
    build: ./mcp-server
    environment:
      TENANT_ID: acme-corp
      LOG_LEVEL: debug
    ports:
      - "3001:3000"

  mcp-server-beta:
    build: ./mcp-server
    environment:
      TENANT_ID: beta-inc
      LOG_LEVEL: debug
    ports:
      - "3002:3000"

  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data

  audit-logger:
    image: elasticsearch:8.0
    environment:
      discovery.type: single-node
    ports:
      - "9200:9200"

volumes:
  postgres_data:
```
Monitoring, Observability, and SLAs
You can’t run enterprise infrastructure without visibility. Here’s what you need to monitor.
Key Metrics
Availability:
- Uptime per tenant (target: 99.9%)
- Error rate per tool (target: < 0.1%)
- Latency percentiles (p50, p95, p99)
Performance:
- Request latency (tool-specific)
- Throughput (requests/second)
- Queue depth (pending requests)
Business:
- Cost per request (API calls, compute, storage)
- Cost per tenant (monthly)
- Token usage (for language models)
Security:
- Failed authentication attempts
- Denied access attempts (policy violations)
- Rate limit violations
- Anomalous access patterns
Implementation
Use a monitoring stack:
```yaml
# monitoring/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mcp-servers'
    static_configs:
      - targets: ['localhost:9090']
    metrics_path: '/metrics'
  - job_name: 'api-gateway'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
Visualize with Grafana:
```json
{
  "dashboard": "MCP Server Health",
  "panels": [
    {
      "title": "Request Latency (p95)",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, rate(mcp_request_duration_seconds_bucket[5m]))"
        }
      ]
    },
    {
      "title": "Error Rate",
      "targets": [
        {
          "expr": "rate(mcp_errors_total[5m])"
        }
      ]
    },
    {
      "title": "Cost per Tenant",
      "targets": [
        {
          "expr": "sum(mcp_cost_usd) by (tenant_id)"
        }
      ]
    }
  ]
}
```
SLAs
Define and commit to service level agreements:
Service Level Agreement (SLA) for Enterprise MCP Servers
1. Availability
- Target: 99.9% uptime per calendar month
- Measurement: Successful requests / Total requests
- Exclusions: Scheduled maintenance (4 hours/month), customer-caused outages
2. Latency
- Target: p95 latency < 500ms for tool calls
- Measurement: Time from request receipt to response sent
- Excludes time spent in backend systems
3. Error Rate
- Target: < 0.1% error rate (excluding customer errors)
- Measurement: 5xx errors / Total requests
4. Support Response Time
- P1 (service down): 15 minutes
- P2 (degraded): 1 hour
- P3 (minor issue): 4 hours
5. Credits
- 99.0–99.9% uptime: 10% monthly credit
- 95.0–99.0% uptime: 25% monthly credit
- < 95.0% uptime: 100% monthly credit
Track SLA compliance in your monitoring system. Alert when you’re at risk of missing an SLA.
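The uptime and credit calculations are simple enough to encode directly; a sketch using the availability definition and credit tiers from the SLA above (function names are illustrative):

```python
def monthly_uptime(successful_requests, total_requests):
    """Uptime as defined in the SLA: successful requests / total requests."""
    return 100.0 * successful_requests / total_requests


def sla_credit_percent(uptime):
    # Credit tiers from the SLA above
    if uptime >= 99.9:
        return 0
    if uptime >= 99.0:
        return 10
    if uptime >= 95.0:
        return 25
    return 100
```

Running this over the previous month's request counts per tenant gives you both the SLA report and the credit owed, straight from the audit log.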
Common Pitfalls and How to Avoid Them
Pitfall 1: Insufficient Audit Logging
Problem: You log some actions but not others. When a security incident occurs, you can’t reconstruct what happened.
Solution: Log everything. Every request, response, and error. Make logging non-optional. If a request completes without being logged, that’s a bug.
```python
import time


# Wrap all tool calls with logging so no request escapes the audit trail
def call_tool_with_logging(user_id, tool_name, args):
    start_time = time.time()
    request_id = generate_request_id()
    try:
        result = call_tool(user_id, tool_name, args)
        duration = time.time() - start_time
        log_event({
            'request_id': request_id,
            'status': 'success',
            'duration_ms': duration * 1000,
            'result_size': len(result)
        })
        return result
    except Exception as e:
        duration = time.time() - start_time
        log_event({
            'request_id': request_id,
            'status': 'error',
            'duration_ms': duration * 1000,
            'error': str(e)
        })
        raise
```
Pitfall 2: Shared Secrets
Problem: API keys are shared across teams or hardcoded in repositories. When one team’s key is compromised, you have to rotate keys for everyone.
Solution: Each agent/team gets its own API key. Rotate keys regularly. Use a secret manager.
Pitfall 3: No Rate Limiting
Problem: One agent goes haywire and makes 10 million requests, exhausting your budget and bringing down the service for others.
Solution: Implement rate limiting at multiple layers. Set quotas per tenant, per user, per tool. Monitor for anomalies.
Pitfall 4: Insufficient Testing
Problem: You deploy a policy change and accidentally lock out half your users.
Solution: Test policies in a staging environment. Use policy simulation tools (OPA has a REPL). Require code review for policy changes. Gradually roll out changes (canary deployments).
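One way to catch lockouts before rollout is a simulation harness: run a candidate policy against a fixed suite of (request, expected) cases and block the deploy if anything regresses. The policy shape below is illustrative, mirroring the policy-as-code files earlier in this guide.

```python
def evaluate(policy, request):
    """Tiny policy evaluator used to simulate changes before rollout."""
    for rule in policy["rules"]:
        if request["role"] in rule["roles"] and request["tool"] in rule["tools"]:
            return True
    return False


def simulate(policy, cases):
    """Run (request, expected) pairs; return the cases that regress."""
    return [
        (request, expected)
        for request, expected in cases
        if evaluate(policy, request) != expected
    ]
```

Wire `simulate` into CI so a policy change that empties the regression list is deployable and anything else fails the build.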
Pitfall 5: Tight Coupling to Specific Tools
Problem: Your MCP server is tightly integrated with Salesforce API v1. When Salesforce upgrades to v2, you have to rewrite the server.
Solution: Decouple tool implementations from the MCP server. Use adapters. Define tool interfaces abstractly. Make it easy to swap tool implementations.
```python
class ToolAdapter:
    """Abstract interface for tools."""

    def call(self, args) -> dict:
        raise NotImplementedError


class SalesforceAdapter(ToolAdapter):
    def __init__(self, api_version):
        self.api_version = api_version

    def call(self, args):
        # Implementation specific to this API version
        pass


# Easy to swap implementations
tools = {
    'read_deals': SalesforceAdapter(api_version='v2'),
    'write_customer_notes': SalesforceAdapter(api_version='v2')
}
```
Pitfall 6: Ignoring Cost
Problem: You don’t track costs. Suddenly you’re spending $50k/month on API calls and you don’t know why.
Solution: Track cost for every action. Attribute costs to tenants and tools. Set budgets and alert when approaching limits. Optimize expensive operations.
```python
def call_tool_with_cost_tracking(user_id, tool_name, args):
    # Estimate cost before calling
    estimated_cost = estimate_cost(tool_name, args)

    # Check the tenant's remaining budget
    remaining_budget = get_tenant_budget(user_id) - get_tenant_spend(user_id)
    if estimated_cost > remaining_budget:
        raise BudgetExceededError(
            f"Estimated cost ${estimated_cost} exceeds remaining budget ${remaining_budget}"
        )

    # Call the tool and track actual cost
    result = call_tool(user_id, tool_name, args)
    actual_cost = result.get('cost_usd', estimated_cost)
    log_cost(user_id, tool_name, actual_cost)
    return result
```
Getting Started: Your Next Steps
You now have a comprehensive reference architecture for enterprise MCP servers. Here’s how to implement it.
Step 1: Assess Your Current State
Before building, understand where you are:
- How many teams need AI agent access?
- What internal tools and data sources need to be exposed?
- What are your compliance requirements (SOC 2, ISO 27001)?
- What’s your current infrastructure (cloud provider, on-premises, hybrid)?
- What’s your team’s expertise (infrastructure, security, AI)?
Document this in a brief architecture decision record (ADR).
Step 2: Choose Your Deployment Model
Based on your assessment:
- Small team, low complexity: Start with shared infrastructure (Pattern 1). Easy to deploy, sufficient for proof-of-concept.
- Multiple teams, growth trajectory: Use containerised isolation (Pattern 2). Scalable, good governance, manageable complexity.
- Large enterprise, high volume: Hybrid isolation (Pattern 4) or dedicated infrastructure per tenant. Maximum control and auditability.
If you’re pursuing SOC 2 or ISO 27001 compliance, skip Pattern 1. Auditors expect isolation and comprehensive logging, which are easier with containerisation.
Step 3: Build a Proof-of-Concept
Start small. Pick one internal tool (e.g., a read-only database query tool) and build an MCP server that exposes it safely to Claude.
```python
# poc/simple_mcp_server.py
from mcp.server import MCPServer
from mcp.types import Tool

import psycopg2

server = MCPServer("poc")


@server.tool(name="query_analytics")
def query_analytics(query: str):
    """Run a read-only query against the analytics database."""
    # Reject obvious write operations. Note: a keyword blocklist is a
    # PoC-level check, not real protection -- the read-only database
    # role below is what actually enforces it.
    if any(keyword in query.upper() for keyword in ['INSERT', 'UPDATE', 'DELETE', 'DROP']):
        raise ValueError("Write operations not allowed")

    # Execute the query over a read-only connection
    conn = psycopg2.connect("dbname=analytics user=readonly")
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        results = cursor.fetchall()
        cursor.close()
    finally:
        conn.close()
    return {"rows": results, "count": len(results)}


if __name__ == "__main__":
    server.run()
```
Test this with Claude. Ask it to run analytics queries. Verify that it works and that access is logged.
Step 4: Implement Governance
Once the PoC works, add governance:
- Define roles for your organisation (DataAnalyst, SalesOps, etc.).
- Write policies (who can access what).
- Implement RBAC in your MCP server.
- Set up audit logging.
- Test that policies are enforced.
Step 5: Scale to Production
When you’re confident in the architecture:
- Choose your deployment platform (Kubernetes, Lambda, etc.).
- Set up infrastructure (API gateway, logging, monitoring).
- Implement security controls (TLS, secret management, rate limiting).
- Write runbooks for common operations (adding a tenant, rotating keys, responding to incidents).
- Plan for compliance audits (SOC 2, ISO 27001).
Step 6: Optimise and Iterate
Production is where learning happens:
- Monitor costs. Identify expensive tools and optimise them.
- Monitor latency. Identify bottlenecks and fix them.
- Monitor errors. Identify failure modes and add resilience.
- Gather feedback from teams using the MCP server. Iterate on the design.
Conclusion
Enterprise MCP servers are not a nice-to-have. If you’re deploying agentic AI across your organisation, they’re essential. They’re how you govern access, ensure compliance, and maintain security at scale.
This reference architecture gives you a proven foundation. It’s been battle-tested by teams at PADISO and other Sydney-based and international organisations building AI-driven platforms. The patterns—containerised isolation, policy-as-code, immutable audit logging—are industry best practices.
The key insight is this: governance is not a constraint; it’s an enabler. When you have clear policies, comprehensive logging, and enforced access control, teams move faster. They don’t waste time on ad-hoc security reviews. They don’t worry about accidentally accessing data they shouldn’t. They build with confidence.
If you’re running agentic AI across your organisation, or planning to, start building your enterprise MCP server now. The sooner you establish governance, the easier it is to scale. And if you’re navigating AI strategy and readiness, an enterprise MCP server is a foundational piece of your AI infrastructure.
For teams in Sydney or Australia looking for hands-on support, PADISO’s CTO as a Service and platform engineering services can help you design, build, and operate enterprise MCP servers. We’ve helped 50+ clients across seed-stage startups and mid-market enterprises implement governed AI infrastructure. If you’re ready to move from proof-of-concept to production-grade enterprise MCP servers, let’s talk.
Start with the PoC. Build the governance layer. Scale to production. Measure everything. Iterate relentlessly. That’s how you build enterprise-grade AI infrastructure that teams trust and auditors approve.