Orchestrating Subagents Across MCP Boundaries
Master multi-agent orchestration across MCP boundaries. Learn auth scoping, rate-limit isolation, and trace correlation for production systems.
Table of Contents
- Why Subagent Orchestration Matters
- Shared vs. Isolated MCP Server Architecture
- Authentication and Authorization Scoping
- Rate-Limit Isolation and Resource Governance
- Trace Correlation and Observability
- Production Patterns for Multi-Agent Systems
- Common Pitfalls and How to Avoid Them
- Implementation Checklist
Why Subagent Orchestration Matters
When you move beyond single-agent systems into production multi-agent architectures, the stakes change fundamentally. A single agent querying a database is one problem. An orchestrator agent spawning five subagents—each with its own responsibilities, permissions, and resource constraints—is another beast entirely.
The core challenge isn’t new: how do you coordinate work across distributed components? But in the context of AI agents and the Model Context Protocol (MCP), the answer is non-trivial. You need to decide whether each subagent maintains its own MCP server connection, or whether they share a single server instance. That choice cascades into decisions about authentication, rate limiting, observability, and failure recovery.
At PADISO, we’ve built AI systems for Sydney startups and enterprises that need this orchestration to work reliably. We’ve also helped teams modernise legacy automation systems into agentic architectures. The pattern that emerges—shared vs. isolated MCP servers—determines whether your system scales cleanly or becomes a debugging nightmare.
The stakes are high. If your orchestrator spawns subagents without proper isolation, one runaway agent can starve others of resources, obscure audit trails, and break compliance. If you isolate too aggressively, you’ll pay a latency and complexity tax that makes the system unmaintainable. This guide cuts through the theory and gives you the operational patterns that work in production.
Shared vs. Isolated MCP Server Architecture
The Shared Server Model
In a shared server model, all subagents—and the orchestrator—connect to the same MCP server instance. The server exposes a set of tools, resources, and prompts that any agent can invoke.
Advantages:
- Lower overhead: One server process, one set of connections, minimal resource footprint.
- Simpler deployment: Deploy once, wire up all agents to the same endpoint.
- Faster iteration: Changes to tool definitions or server logic apply instantly to all agents.
- Easier state sharing: If agents need to read shared context (e.g., a customer record fetched by one agent and used by another), a shared server makes this natural.
Disadvantages:
- Blast radius: A bug in the server code affects all agents simultaneously.
- Rate-limit contention: If one subagent hammers the server, others wait.
- Auth complexity: You need subagent-level permission scoping within the server logic, not at the transport layer.
- Observability fog: Requests from different agents blend together unless you tag them carefully.
The Isolated Server Model
In the isolated model, each subagent (or small cluster of agents) gets its own MCP server instance. The orchestrator routes requests to the appropriate server based on the task.
Advantages:
- Fault isolation: One server crash doesn’t take down the whole system.
- Per-agent rate limits: Each server has its own quota, so one agent can’t starve others.
- Clear audit trails: Every request is associated with a specific server instance, making compliance easier.
- Independent scaling: Scale servers separately based on demand.
Disadvantages:
- Operational complexity: More servers to deploy, monitor, and patch.
- State synchronisation: Shared context requires explicit synchronisation between servers.
- Higher latency: Cross-server communication adds round-trip time.
- Deployment overhead: Each server needs its own configuration, secrets, and lifecycle management.
The Hybrid Approach
Most production systems use a hybrid: a shared server for common, low-risk operations (e.g., read-only queries, metadata lookups), and isolated servers for high-risk or high-load operations (e.g., database writes, external API calls, billing operations).
For example, you might have:
- Shared metadata server: Exposes read-only schemas, enumerations, and system state.
- Isolated write servers: One per critical domain (accounts, payments, inventory).
- Isolated external API servers: One per third-party integration (Stripe, Salesforce, Slack).
This approach balances simplicity with safety. The orchestrator knows which server to route to based on the task type, and each server enforces its own constraints.
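To make the routing concrete, here's a minimal TypeScript sketch of a task-to-server routing table for the hybrid model. The task categories and endpoint URLs are illustrative assumptions, not part of MCP itself:

```typescript
// Hypothetical routing table for the hybrid model. Task categories and
// endpoint URLs are illustrative; substitute your own deployment topology.
const SERVER_ROUTES: Record<string, string> = {
  metadata_read: "https://mcp-shared.internal/metadata", // shared, read-only
  account_write: "https://mcp-accounts.internal",        // isolated, critical writes
  payment_write: "https://mcp-payments.internal",        // isolated, critical writes
  stripe_call: "https://mcp-stripe.internal",            // isolated, external API
};

function resolveServer(taskCategory: string): string {
  const endpoint = SERVER_ROUTES[taskCategory];
  if (!endpoint) {
    throw new Error(`No MCP server registered for task category: ${taskCategory}`);
  }
  return endpoint;
}
```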
Authentication and Authorization Scoping
The Problem: Sub-Agent Identity
When your orchestrator spawns a subagent, that subagent needs to authenticate to the MCP server. But who is it? Is it the orchestrator? Is it a service account? Is it the end user on whose behalf the orchestrator is acting?
If you get this wrong, you’ll either grant too much permission (security risk) or too little (the agent can’t do its job). You’ll also lose accountability—if a subagent deletes a critical record, you need to know which subagent did it, not just that “something” happened.
Identity Propagation
The cleanest pattern is identity propagation: the orchestrator receives an initial identity (user, service account, or API key), and passes that identity—or a derived identity—to each subagent.
For example:
User logs in → Orchestrator receives JWT with user ID and role
↓
Orchestrator spawns SubagentA
↓
Orchestrator passes (derived token for SubagentA, original user context) to MCP server
↓
MCP server validates token, checks SubagentA's role, enforces row-level security based on user ID
The key insight: the MCP server sees both the subagent’s identity (for role-based access control) and the original user context (for audit and data isolation).
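As a sketch, here's what a derived subagent credential might look like in TypeScript. The claim names loosely follow OAuth 2.0 token exchange (RFC 8693's actor claim); your identity provider's token shape will differ:

```typescript
// Sketch of a derived subagent credential. Claim names loosely follow
// OAuth 2.0 token exchange (RFC 8693's "act" claim); adapt to whatever
// your identity provider actually issues.
interface SubagentToken {
  sub: string;        // original user the orchestrator is acting for
  act: {
    agentId: string;  // e.g. "subagent_a"
    role: string;     // e.g. "read_customer"
  };
  scope: string[];    // minimal permissions for this specific task
  exp: number;        // short expiry (unix seconds); see Token Lifecycle below
}

function deriveSubagentToken(
  userToken: { sub: string },
  agentId: string,
  role: string,
  scope: string[],
  ttlSeconds = 600, // 10 minutes, within the 5-15 minute guidance below
): SubagentToken {
  return {
    sub: userToken.sub,     // preserve the original user context for audit
    act: { agentId, role }, // record which subagent is acting
    scope,
    exp: Math.floor(Date.now() / 1000) + ttlSeconds,
  };
}
```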
Scope Limitation
Each subagent should have minimal scope. If SubagentA is responsible for reading customer records, it shouldn’t have permission to delete them. If SubagentB is responsible for processing payments, it shouldn’t be able to modify user profiles.
Implement this at the MCP server level:
if (subagent_role == "read_customer") {
  allow("SELECT * FROM customers WHERE user_id = ?");
  deny("DELETE FROM customers");
  deny("UPDATE customers SET ...");
} else if (subagent_role == "process_payment") {
  allow("SELECT * FROM payments WHERE user_id = ?");
  allow("INSERT INTO payments ...");
  deny("SELECT * FROM customers");
}
This ensures that even if a subagent is compromised or misbehaves, it can only access what it needs.
Token Lifecycle
Tokens passed to subagents should be short-lived. If an orchestrator spawns a subagent with a 1-hour token, and the subagent is compromised after 30 minutes, the attacker has 30 minutes of access. Instead, use tokens with 5–15 minute lifetimes.
For long-running workflows, the orchestrator should refresh tokens periodically, or the MCP server should support token refresh without re-authentication.
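Here's one way to keep a short-lived token fresh across a long-running workflow, sketched in TypeScript. The issueToken function is a hypothetical stand-in for your identity provider's token endpoint:

```typescript
// Sketch: keep a subagent's short-lived token fresh for a long-running
// workflow. `issueToken` is a hypothetical stand-in for your identity
// provider's token endpoint.
async function withFreshToken(
  issueToken: () => Promise<{ token: string; expiresInMs: number }>,
  work: (getToken: () => string) => Promise<void>,
): Promise<void> {
  let current = await issueToken();
  // Refresh at ~80% of the token lifetime so it never expires mid-request.
  const timer = setInterval(() => {
    issueToken().then((t) => (current = t));
  }, current.expiresInMs * 0.8);
  try {
    await work(() => current.token); // the subagent reads the latest token per call
  } finally {
    clearInterval(timer);
  }
}
```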
Audit Logging
Every authentication attempt and every permission check should be logged. Include:
- Timestamp
- Subagent ID and role
- Original user context
- Resource accessed
- Permission granted or denied
- Reason for denial (if applicable)
This is non-negotiable for SOC 2 and ISO 27001 compliance. When you’re pursuing audit-readiness via Vanta, these logs are your evidence.
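As a sketch, an audit record can be as simple as an append-only JSON line per permission check, covering the fields above. The field names, file path, and helper here are illustrative assumptions:

```typescript
import { appendFileSync } from "node:fs";

// Sketch: one append-only audit record per permission check. Field names
// and file path are illustrative; adapt them to your log store.
interface AuditRecord {
  timestamp: string;
  agentId: string;
  agentRole: string;
  userId: string;        // original user context
  resource: string;
  decision: "granted" | "denied";
  denialReason?: string;
}

function auditLog(record: AuditRecord): void {
  // Append-only JSON lines; immutability should also be enforced at the
  // storage layer (e.g. object-lock or a write-once log store).
  appendFileSync("/var/log/mcp/audit.jsonl", JSON.stringify(record) + "\n");
}
```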
Rate-Limit Isolation and Resource Governance
The Contention Problem
Imagine your orchestrator spawns five subagents to process a batch of customer records. SubagentA is fast and finishes its work quickly. SubagentB gets stuck in a loop, making the same request over and over. If all five agents share the same MCP server with a global rate limit of 100 requests per second, SubagentB’s runaway requests will starve SubagentA, SubagentC, SubagentD, and SubagentE.
In a shared server model, you need per-agent rate limits to prevent this.
Per-Agent Rate Limits
Implement token-bucket or sliding-window rate limiting at the subagent level:
rate_limit_config = {
  "subagent_a": {
    "requests_per_second": 10,
    "burst_size": 20
  },
  "subagent_b": {
    "requests_per_second": 5,
    "burst_size": 10
  },
  "subagent_c": {
    "requests_per_second": 20,
    "burst_size": 50
  }
}
When SubagentB hits its limit, it gets a 429 (Too Many Requests) response. The orchestrator can then back off, retry, or escalate. Critically, SubagentA continues running unaffected.
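On the orchestrator side, that back-off-and-retry logic might look like the following TypeScript sketch. It honours the Retry-After header when present; the callSubagentTool wrapper is a hypothetical stand-in for your MCP client call:

```typescript
// Sketch: orchestrator-side handling of 429s. `callSubagentTool` is a
// hypothetical wrapper around the actual MCP tool invocation.
async function callWithBackoff(
  callSubagentTool: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await callSubagentTool();
    if (res.status !== 429) return res;
    // Honour Retry-After when the server sends it; otherwise back off exponentially.
    const retryAfterSeconds = Number(res.headers.get("Retry-After"));
    const delayMs = retryAfterSeconds > 0 ? retryAfterSeconds * 1000 : 2 ** attempt * 500;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("Rate limit persisted after retries; escalate or shed load");
}
```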
Resource Quotas
Rate limiting is just one dimension. You also need to limit:
- Memory: Each subagent’s context window and intermediate state shouldn’t exceed a budget.
- CPU time: Long-running operations (e.g., complex data transformations) should have timeouts.
- Concurrent connections: Limit how many simultaneous requests a subagent can make.
- Storage: If agents cache data, cap the cache size per agent.
Implement these as hard limits in the MCP server:
if (subagent_memory_usage > quota) {
  return error("Memory quota exceeded");
}
if (subagent_concurrent_requests > max_concurrent) {
  return error("Too many concurrent requests");
}
if (operation_duration > timeout) {
  cancel_operation();
  return error("Operation timeout");
}
Backpressure and Graceful Degradation
When a subagent hits a limit, it shouldn’t just fail silently. Implement backpressure:
- Return a 429 response with a Retry-After header.
- Log the limit breach (for observability).
- Optionally, alert the orchestrator to slow down or prioritise.
For non-critical operations, the orchestrator might gracefully degrade: if SubagentC can’t get results in time, fall back to a cached or approximate answer instead of blocking.
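Server-side, the backpressure reply can be assembled as plain data so it stays framework-agnostic. This is a sketch; the reply shape and event names are assumptions:

```typescript
// Sketch: a framework-agnostic backpressure reply. Shape and event names
// are illustrative assumptions, not a specific HTTP server's API.
interface RateLimitReply {
  status: 429;
  headers: { "Retry-After": string };
  body: { error: string; retryAfterSeconds: number };
}

function buildRateLimitReply(agentId: string, retryAfterSeconds: number): RateLimitReply {
  // Log the breach so dashboards and alerts can pick it up.
  console.log(JSON.stringify({ level: "WARN", event: "rate_limit_breach", agent_id: agentId }));
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfterSeconds) },
    body: { error: "rate_limited", retryAfterSeconds },
  };
}
```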
Burst Handling
Real-world workloads have bursts. A batch job might create a spike of requests. Allow for short-term bursts via a token bucket:
token_bucket = {
  "tokens": 20,
  "max_tokens": 20,
  "refill_rate_per_second": 10
}

if (tokens >= cost_of_request) {
  tokens -= cost_of_request;
  allow_request();
} else {
  deny_request("Rate limit exceeded");
}
This allows a subagent to burst up to 20 requests (if tokens are full), then refill at 10 per second. Over time, the agent settles to an average rate.
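Here is a runnable TypeScript version of that bucket, assuming continuous refill. One instance per subagent gives you the per-agent isolation described earlier:

```typescript
// A runnable version of the token bucket above: bursts up to `maxTokens`,
// then refills continuously at `refillPerSecond`.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private maxTokens = 20, private refillPerSecond = 10) {
    this.tokens = maxTokens;
  }

  tryConsume(cost = 1): boolean {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.maxTokens, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true; // allow request
    }
    return false;  // deny: rate limit exceeded
  }
}

// One bucket per subagent gives per-agent isolation:
const buckets = new Map<string, TokenBucket>([["subagent_b", new TokenBucket(10, 5)]]);
```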
Trace Correlation and Observability
The Observability Challenge
When an orchestrator spawns five subagents, and something goes wrong, how do you trace the issue? If you log at the agent level, you see five separate threads of execution. If you log at the server level, you see 50+ requests and no clear connection between them.
Trace correlation solves this: you assign a unique ID to the entire orchestration workflow, and propagate it through every subagent and every request. When you query logs, you can reconstruct the entire execution flow.
Trace IDs and Context Propagation
When the orchestrator starts, generate a root trace ID:
root_trace_id = generate_uuid(); // e.g., "550e8400-e29b-41d4-a716-446655440000"
When the orchestrator spawns a subagent, pass the trace ID:
subagent_a.execute(
  task="fetch_customer_records",
  trace_id=root_trace_id,
  span_id=generate_uuid()  // unique ID for this subagent's work
);
The subagent includes the trace ID in every request to the MCP server:
GET /api/customers?user_id=123
Headers: {
  "X-Trace-ID": "550e8400-e29b-41d4-a716-446655440000",
  "X-Span-ID": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "X-Parent-Span-ID": "6ba7b811-9dad-11d1-80b4-00c04fd430c8"
}
The MCP server logs the trace ID with every operation. Now, when you query logs for the root trace ID, you see the entire execution graph: orchestrator → subagent A → server request, orchestrator → subagent B → server request, etc.
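Server-side, extracting and binding that trace context might look like this sketch. The header names match the example above; logWithTrace is a plain stand-in for whatever logging library you use:

```typescript
import { randomUUID } from "node:crypto";

// Sketch: pull trace context from incoming headers (matching the header
// names above) and bind it to every log line for this request.
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

function traceFromHeaders(headers: Record<string, string | undefined>): TraceContext {
  return {
    traceId: headers["x-trace-id"] ?? randomUUID(), // fall back to a new root trace
    spanId: randomUUID(),                           // new span for this server-side operation
    parentSpanId: headers["x-span-id"],             // caller's span becomes our parent
  };
}

function logWithTrace(ctx: TraceContext, fields: Record<string, unknown>): void {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
    parent_span_id: ctx.parentSpanId,
    ...fields,
  }));
}
```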
Structured Logging
Don’t log free-form text. Use structured logs (JSON) with consistent fields:
{
  "timestamp": "2025-01-15T10:30:45.123Z",
  "level": "INFO",
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "span_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "parent_span_id": "6ba7b811-9dad-11d1-80b4-00c04fd430c8",
  "agent_id": "subagent_a",
  "operation": "fetch_customer_records",
  "user_id": "user_123",
  "status": "success",
  "duration_ms": 145,
  "rows_returned": 42
}
With structured logs, you can filter, aggregate, and correlate easily. You can ask: “Show me all requests from subagent_a in trace 550e8400-e29b-41d4-a716-446655440000 that took more than 1 second.”
Distributed Tracing
For complex systems, use a distributed tracing tool like Jaeger, Datadog, or Honeycomb. These tools ingest trace data and visualise the execution flow:
Orchestrator (root_span)
├── SubagentA (span_1)
│ ├── MCP Server: fetch_customers (span_1.1)
│ └── MCP Server: validate_data (span_1.2)
├── SubagentB (span_2)
│ ├── MCP Server: process_payment (span_2.1)
│ └── MCP Server: send_notification (span_2.2)
└── SubagentC (span_3)
└── MCP Server: log_audit_event (span_3.1)
You can see at a glance which subagents ran in parallel, which were serial, where the bottlenecks are, and where errors occurred.
Metrics and Alerting
Beyond logs, track metrics:
- Latency: P50, P95, P99 latencies per subagent and per MCP endpoint.
- Error rates: Percentage of requests that fail, grouped by agent and operation.
- Rate-limit violations: How often subagents hit rate limits.
- Resource usage: CPU, memory, and connection count per agent.
Set up alerts:
- If error rate > 5% for a subagent, page on-call.
- If P99 latency > 5 seconds, investigate.
- If rate-limit violations spike, check for runaway agents.
Production Patterns for Multi-Agent Systems
Pattern 1: Orchestrator with Isolated Task Servers
The orchestrator is the brain. Each subagent is a specialist with its own MCP server.
Orchestrator
├── Subagent A → MCP Server (Customer Data)
├── Subagent B → MCP Server (Payment Processing)
├── Subagent C → MCP Server (Notifications)
└── Subagent D → MCP Server (Audit Logging)
When to use: When subagents have distinct responsibilities and different scaling needs. The orchestrator can be stateless and horizontally scalable.
Trade-offs: More operational complexity, but clear isolation and auditability. This is the pattern we recommend for AI Strategy & Readiness engagements at scale.
Pattern 2: Shared Server with Sub-Agent Roles
All subagents connect to the same MCP server, but the server enforces role-based access control.
Orchestrator
├── Subagent A (role: reader) \
├── Subagent B (role: writer) ├→ Shared MCP Server
├── Subagent C (role: admin) /
└── Subagent D (role: auditor)
When to use: When subagents need to share context and the server is simple enough to manage centrally. Good for small teams and early-stage startups.
Trade-offs: Simpler deployment, but tighter coupling. One server bug affects all agents. This works well when you’re moving fast and iterating quickly.
Pattern 3: Hierarchical Orchestration
The root orchestrator spawns mid-level orchestrators, which spawn leaf subagents. Useful for deeply nested workflows.
Root Orchestrator
├── Mid-level Orchestrator A
│ ├── Subagent A1 → MCP Server
│ └── Subagent A2 → MCP Server
└── Mid-level Orchestrator B
├── Subagent B1 → MCP Server
└── Subagent B2 → MCP Server
When to use: When you have large, complex workflows with many subagents. The hierarchy keeps the system organised and allows for recursive delegation.
Trade-offs: More moving parts, but better composability. Trace correlation becomes crucial here.
Pattern 4: Fan-Out with Aggregation
The orchestrator spawns many subagents in parallel, then aggregates results.
Orchestrator
├── Subagent A → Task 1
├── Subagent B → Task 2
├── Subagent C → Task 3
└── Subagent D → Task 4
↓
Aggregator (waits for all results)
↓
Return combined result
When to use: When you need to parallelise independent tasks. Common in data processing pipelines.
Trade-offs: Orchestrator must manage concurrency, timeouts, and partial failures. If one subagent is slow, it blocks the entire aggregation (unless you implement partial aggregation).
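A sketch of fan-out with partial aggregation: each task gets its own timeout, so one slow subagent can't block the rest. The tasks array stands in for your real subagent calls:

```typescript
// Sketch: fan out to independent subagent tasks in parallel, tolerate
// partial failure, and enforce a per-task timeout. In production you'd
// also clear the timeout timers; they're left running here for brevity.
async function fanOut<T>(tasks: Array<() => Promise<T>>, timeoutMs = 30_000) {
  const withTimeout = (p: Promise<T>) =>
    Promise.race([
      p,
      new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error("subagent timeout")), timeoutMs)),
    ]);

  const settled = await Promise.allSettled(tasks.map((t) => withTimeout(t())));
  return {
    results: settled
      .filter((s): s is PromiseFulfilledResult<T> => s.status === "fulfilled")
      .map((s) => s.value),
    failures: settled.filter((s) => s.status === "rejected").length,
  };
}
```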
Pattern 5: Circuit Breaker and Fallback
If a subagent or MCP server fails, fall back to a cached or degraded response.
Orchestrator
↓
Call Subagent A
├─ Success → Return result
├─ Timeout → Try fallback (cached data)
└─ Error → Try Subagent B (alternative)
When to use: When reliability is critical. E-commerce, payments, customer support.
Trade-offs: More complex logic, but much better user experience. You need to manage cache staleness and fallback accuracy.
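A minimal circuit-breaker sketch: after a threshold of consecutive failures, calls skip the failing subagent and go straight to the fallback until a cooldown elapses. The threshold and cooldown values here are illustrative:

```typescript
// Minimal circuit breaker + fallback sketch. After `threshold` consecutive
// failures the breaker opens and calls go straight to the fallback until
// `cooldownMs` has passed. Values are illustrative, not recommendations.
class CircuitBreaker<T> {
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 3, private cooldownMs = 30_000) {}

  async call(primary: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
    const open =
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.cooldownMs;
    if (open) return fallback(); // breaker open: skip the failing agent
    try {
      const result = await primary();
      this.failures = 0;         // success closes the breaker
      return result;
    } catch {
      this.failures++;
      this.openedAt = Date.now();
      return fallback();         // degrade to cached data or an alternative agent
    }
  }
}
```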
Common Pitfalls and How to Avoid Them
Pitfall 1: No Trace Correlation
The problem: You deploy a multi-agent system, something breaks, and you can’t figure out which subagent caused it.
The solution: Implement trace IDs from day one. It takes 30 minutes to add, and saves hours of debugging later.
Pitfall 2: Shared Rate Limits
The problem: One subagent hammers the server, and all others starve.
The solution: Implement per-agent rate limits. Use token buckets to allow bursts but enforce average rates.
Pitfall 3: Overly Permissive Subagent Roles
The problem: A subagent is compromised, and the attacker can access anything.
The solution: Apply the principle of least privilege. Each subagent should have minimal scope. Review permissions regularly, especially before compliance audits.
Pitfall 4: No Timeout Handling
The problem: A subagent gets stuck, and the orchestrator waits forever.
The solution: Set timeouts on every subagent call. If a subagent doesn’t respond within the timeout, cancel and escalate.
try {
  result = await subagent_a.execute(task, { timeout_ms: 30_000 });
} catch (err) {
  if (err instanceof TimeoutError) {
    log_error("Subagent A timeout");
    // Implement fallback or escalation
  } else {
    throw err; // unexpected errors should surface, not be swallowed
  }
}
Pitfall 5: Losing Audit Trail
The problem: When pursuing SOC 2 or ISO 27001 compliance, you can’t prove who did what.
The solution: Log every authentication, every permission check, and every data access. Include the subagent ID, user context, and timestamp. Make logs immutable (append-only).
Pitfall 6: Tight Coupling Between Agents
The problem: Subagent A’s output format is hardcoded into Subagent B’s input parser. When you change A, B breaks.
The solution: Use schemas and versioning. Define a clear contract between agents. Use JSON Schema or Protocol Buffers to make the contract explicit.
// Subagent A output schema
{
  "version": "1.0",
  "customer_id": "string",
  "name": "string",
  "email": "string"
}

// Subagent B validates input against schema
if (!validate_schema(input, CUSTOMER_SCHEMA_V1)) {
  return error("Invalid input format");
}
Implementation Checklist
Use this checklist to ensure your multi-agent system is production-ready:
Architecture
- Decide on shared vs. isolated MCP servers (or hybrid).
- Document the decision and rationale.
- Design the subagent roles and responsibilities.
- Define the orchestration workflow (fan-out, hierarchical, etc.).
Authentication & Authorization
- Implement identity propagation from orchestrator to subagents.
- Define per-subagent roles and scopes.
- Implement role-based access control in the MCP server.
- Set token lifetimes (recommend 5–15 minutes).
- Log every authentication attempt and permission check.
Rate Limiting & Resource Governance
- Implement per-agent rate limits (token bucket or sliding window).
- Set memory, CPU, and connection quotas per subagent.
- Implement timeouts on all subagent calls.
- Test rate-limit enforcement under load.
- Set up alerts for rate-limit violations.
Observability
- Implement trace IDs and context propagation.
- Use structured logging (JSON).
- Set up distributed tracing (Jaeger, Datadog, etc.).
- Define key metrics (latency, error rate, resource usage).
- Set up dashboards and alerts.
Resilience
- Implement timeouts and retries.
- Implement circuit breakers for failing subagents.
- Implement fallback strategies (cached data, alternative agents).
- Test failure scenarios (agent crash, MCP server down, network partition).
- Document runbook for common failures.
Compliance & Audit
- Ensure all logs are immutable (append-only).
- Implement audit logging for sensitive operations.
- Review logs regularly for anomalies.
- Document data flows and access patterns.
- Prepare for SOC 2 / ISO 27001 audits (consider Vanta for audit-readiness).
Testing
- Unit test each subagent in isolation.
- Integration test orchestrator with all subagents.
- Load test with realistic concurrency and request rates.
- Chaos test (kill agents, introduce network delays, etc.).
- Security test (try to exceed rate limits, bypass auth, escalate privileges).
Real-World Example: E-Commerce Order Processing
Let’s walk through a concrete example to tie everything together. Imagine you’re building an order processing system for an e-commerce platform. When a customer places an order, the orchestrator needs to:
- Validate the order (inventory check).
- Process payment (payment processor).
- Create shipment (logistics).
- Send confirmation email (notifications).
- Log audit event (compliance).
You decide on an isolated server model:
Orchestrator (receives order from API)
├── Subagent A (Inventory) → MCP Server A
├── Subagent B (Payment) → MCP Server B
├── Subagent C (Logistics) → MCP Server C
├── Subagent D (Notifications) → MCP Server D
└── Subagent E (Audit) → MCP Server E
Trace flow:
- API receives order, generates root trace ID: order-12345-trace.
- Orchestrator spawns Subagent A with trace ID.
- Subagent A calls MCP Server A to check inventory, includes trace ID in request header.
- Server A logs: {trace_id: "order-12345-trace", span_id: "...", operation: "check_inventory", status: "success"}.
- Subagent A returns result to orchestrator.
- Orchestrator spawns Subagent B (payment) with trace ID.
- … and so on.
Rate limiting:
- Subagent A (Inventory): 100 requests/sec (bursty, reads only).
- Subagent B (Payment): 10 requests/sec (critical, writes to ledger).
- Subagent C (Logistics): 50 requests/sec (external API, moderate load).
- Subagent D (Notifications): 200 requests/sec (fire-and-forget).
- Subagent E (Audit): 1000 requests/sec (append-only, high volume).
Auth scoping:
- Subagent A: read-only access to inventory table.
- Subagent B: read-only access to customer and payment tables, write access to ledger.
- Subagent C: read-only access to order and customer tables, write access to shipment table.
- Subagent D: read-only access to order table, write access to email queue.
- Subagent E: write-only access to audit log table.
Resilience:
- If Subagent A times out, fail the order (inventory check is mandatory).
- If Subagent B fails, retry up to 3 times, then escalate to manual review.
- If Subagent C fails, queue the shipment for later (asynchronous).
- If Subagent D fails, log and continue (notifications are non-critical).
- If Subagent E fails, write to a local buffer and retry later (audit is critical but not blocking).
With this design, you have clear isolation, observable flows, and resilience to failures. When an auditor asks “Who accessed what data and when?”, you can pull logs by trace ID and answer confidently.
This is the kind of architecture we help teams build at PADISO. Whether you’re a Sydney startup scaling your first AI product or an enterprise modernising legacy systems with agentic AI, the principles remain the same: isolation, observability, and resilience.
Moving Forward: Build, Test, Iterate
Orchestrating subagents across MCP boundaries is not a one-time design decision. It’s an iterative process:
- Start simple: Begin with a shared server and one or two subagents. Get the basics right (auth, logging, rate limits).
- Measure: Instrument your system with traces and metrics. Understand where the bottlenecks are.
- Scale selectively: As load grows, isolate the critical path (e.g., payment processing) into its own server. Keep read-only operations shared.
- Audit regularly: Before compliance deadlines, review logs and access patterns. Fix gaps.
- Iterate: New features, new subagents, new MCP servers. Each iteration should improve observability and resilience.
If you’re building a production multi-agent system and need guidance on architecture, compliance, or implementation, PADISO’s AI & Agents Automation service can help. We’ve built these systems for Sydney startups and enterprises, and we know the pitfalls.
For teams pursuing SOC 2 or ISO 27001 compliance, orchestration and observability are key evidence. A well-designed multi-agent system with comprehensive logging and audit trails is audit-ready from day one. We’ve helped teams pass Vanta audits by implementing the patterns in this guide.
Start with the checklist. Pick one pattern that fits your use case. Build, test, and measure. You’ll be surprised how much clarity comes from good trace correlation and structured logging.
Summary
Orchestrating subagents across MCP boundaries requires careful thought on three fronts:
- Architecture: Decide between shared and isolated MCP servers based on your scale, isolation needs, and operational capacity.
- Auth & governance: Propagate identity, enforce minimal scope, and log everything.
- Observability: Implement trace IDs, structured logging, and distributed tracing from day one.
The patterns and checklist in this guide have been battle-tested in production. Start with the simpler shared-server model if you’re early-stage, and graduate to isolated servers as you scale. Always prioritise observability—it’s the difference between a debuggable system and a black box.
Your multi-agent system is only as reliable as its weakest link. By isolating rate limits, scoping permissions, and correlating traces, you ensure that one runaway subagent doesn’t take down the whole system. You also ensure that when an auditor asks “What happened?”, you have the data to answer.
Build confidently. Orchestrate deliberately. Observe relentlessly.