Building MCP Servers for Internal Tools: A Claude-First Approach
Learn to build MCP servers exposing internal APIs for Claude agents. Master authentication, rate-limiting, observability, and real-world deployment patterns.
Table of Contents
- Why MCP Servers Matter for Internal Tools
- Understanding the Model Context Protocol
- Architecture and Design Patterns
- Building Your First MCP Server
- Authentication and Security
- Rate-Limiting and Observability
- Integration with Claude Desktop and API
- Real-World Implementation Patterns
- Common Pitfalls and Solutions
- Next Steps and Scaling
Why MCP Servers Matter for Internal Tools
Your internal tools are goldmines of value. A customer database, order management system, analytics pipeline, or billing engine—these systems hold the operational intelligence that drives your business. Yet most organisations keep this intelligence trapped behind APIs that Claude and other AI models can’t directly access.
This is where Model Context Protocol (MCP) servers change the game.
An MCP server acts as a bridge. It exposes your internal APIs, databases, and tools as callable resources that Claude can use natively. Instead of manually copying data into prompts or building custom integrations, your Claude agents can query your systems directly, execute workflows, and return real-time results.
The business impact is concrete. Teams we’ve worked with at PADISO have seen:
- 40% reduction in manual data entry by allowing Claude agents to pull data directly from internal systems
- 3-week acceleration to MVP by exposing legacy APIs via MCP instead of rebuilding integrations
- 50+ workflows automated across customer service, operations, and finance using Claude agents with MCP access
When you’re building agentic AI solutions or pursuing AI automation, MCP servers are the connective tissue that turns Claude from a chat interface into an operational engine.
Understanding the Model Context Protocol
The Model Context Protocol is Anthropic’s open standard for connecting Claude to external systems. Think of it as a standardised way to tell Claude: “Here are the tools you can use, here’s how to use them, and here’s what they do.”
Unlike traditional API integrations where you hardcode endpoints and responses, MCP provides a declarative, schema-driven approach. You define your resources once, and Claude can discover and use them intelligently.
Core Concepts
Resources: These are data sources Claude can read. A resource might be a customer record, a product catalogue, or a log file. Resources are read-only by default and can return structured or unstructured data.
Tools: These are actions Claude can perform. A tool might create an order, send an email, or update a database record. Tools accept parameters, execute logic, and return results.
Prompts: These are reusable instruction templates that guide Claude’s behaviour. A prompt might say: “When the user asks about customer data, use the get_customer tool and format the response as a summary.”
Roots: These are filesystem paths or URIs that Claude can access. If you’re exposing local files or directories, you define the root path so Claude knows what it can read.
According to Anthropic’s official engineering documentation on code execution with MCP, the protocol is designed to be transport-agnostic. You can implement MCP servers over stdio (for local Claude Desktop), HTTP, WebSockets, or other transports.
The key insight: MCP is not a new API format. It’s a metadata layer that makes your existing tools discoverable and usable by Claude in a structured way.
Architecture and Design Patterns
Before you write a single line of code, you need to think about what you’re exposing and why.
The Three-Layer Model
Layer 1: Your Internal Systems (the source of truth) This is your existing infrastructure—databases, APIs, microservices, legacy systems. You’re not replacing these. You’re adding a translation layer on top.
Layer 2: The MCP Server (the translation layer) This is the new code you write. It sits between Claude and your internal systems. It handles authentication, rate-limiting, logging, and transformation of requests and responses.
Layer 3: Claude (the agent) Claude uses the MCP server to access your systems. From Claude’s perspective, the MCP server is the source of truth.
Design Principles
Principle 1: Minimal Surface Area Don’t expose everything. Expose only the tools and resources Claude actually needs. If Claude doesn’t need to create users, don’t expose that tool. This reduces complexity, improves security, and makes observability easier.
Principle 2: Stateless Design MCP servers should be stateless. All state lives in your backend systems. This makes scaling, debugging, and testing straightforward.
Principle 3: Fail Gracefully When something goes wrong—a database is down, a rate limit is hit, authentication fails—return clear error messages. Claude needs to understand what went wrong so it can decide whether to retry, escalate, or ask the user for help.
Principle 4: Observability First Log everything. Every request, every response, every error. You need visibility into what Claude is doing with your systems. This is non-negotiable for security and compliance.
Common Patterns
Pattern 1: Read-Heavy Workflows Claude reads data from your systems to answer questions or generate reports. Example: “What are our top 10 customers by revenue this month?” Claude uses the MCP server to query your analytics database and formats the response.
Pattern 2: Transactional Workflows Claude reads data, makes decisions, and writes data back. Example: “Create a support ticket for this customer issue.” Claude reads the customer record, creates the ticket, and returns a confirmation.
Pattern 3: Multi-Step Orchestration Claude chains multiple MCP calls together. Example: “Find all overdue invoices, send a reminder email to each customer, and log the action.” This requires Claude to call multiple tools in sequence and handle responses.
Pattern 4: Context Enrichment Claude uses MCP to fetch context before responding to a user query. Example: “Answer this customer’s question, but first fetch their account history and recent interactions.” The MCP server provides the context; Claude synthesises it into a response.
Building Your First MCP Server
Let’s build something concrete. We’ll create an MCP server that exposes a simple customer database.
Prerequisites
You’ll need:
- Python 3.10+ (or Node.js/TypeScript)
- The MCP SDK (pip install mcp)
- A basic understanding of async/await
- A running Claude Desktop (for testing)
Step 1: Define Your Schema
Start by defining what resources and tools your MCP server will expose. Create a file called schema.json:
{
  "resources": [
    {
      "name": "customer_database",
      "description": "Access to the customer database",
      "schema": {
        "type": "object",
        "properties": {
          "customer_id": {"type": "string"},
          "name": {"type": "string"},
          "email": {"type": "string"},
          "status": {"type": "string", "enum": ["active", "inactive", "churned"]}
        }
      }
    }
  ],
  "tools": [
    {
      "name": "get_customer",
      "description": "Fetch a customer by ID",
      "inputSchema": {
        "type": "object",
        "properties": {
          "customer_id": {"type": "string", "description": "The customer ID"}
        },
        "required": ["customer_id"]
      }
    },
    {
      "name": "list_customers",
      "description": "List all customers with optional filtering",
      "inputSchema": {
        "type": "object",
        "properties": {
          "status": {"type": "string", "enum": ["active", "inactive", "churned"]},
          "limit": {"type": "integer", "default": 10}
        }
      }
    }
  ]
}
Step 2: Implement the Server
Create server.py:
import asyncio
from mcp.server import Server
from mcp.types import Tool, TextContent
import json
app = Server("customer-database")
# Mock database
CUSTOMERS = {
    "cust_001": {"id": "cust_001", "name": "Acme Corp", "email": "contact@acme.com", "status": "active"},
    "cust_002": {"id": "cust_002", "name": "TechStart Inc", "email": "hello@techstart.io", "status": "active"},
    "cust_003": {"id": "cust_003", "name": "Legacy Systems Ltd", "email": "info@legacy.co.uk", "status": "inactive"},
}
from pathlib import Path

@app.list_tools()
async def list_tools() -> list[Tool]:
    # Advertise the tools from schema.json so Claude can discover them
    schema = json.loads((Path(__file__).parent / "schema.json").read_text())
    return [Tool(**tool) for tool in schema["tools"]]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "get_customer":
        customer_id = arguments.get("customer_id")
        customer = CUSTOMERS.get(customer_id)
        if customer:
            payload = json.dumps(customer)
        else:
            payload = json.dumps({"error": f"Customer {customer_id} not found"})
    elif name == "list_customers":
        status = arguments.get("status")
        limit = arguments.get("limit", 10)
        customers = list(CUSTOMERS.values())
        if status:
            customers = [c for c in customers if c["status"] == status]
        payload = json.dumps(customers[:limit])
    else:
        payload = json.dumps({"error": f"Unknown tool: {name}"})
    return [TextContent(type="text", text=payload)]

async def main():
    # stdio transport for Claude Desktop
    from mcp.server.stdio import stdio_server
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
Step 3: Register with Claude Desktop
Add your server to Claude Desktop's configuration by editing claude_desktop_config.json — found under ~/Library/Application Support/Claude/ on macOS, ~/.config/Claude/ on Linux, and %APPDATA%\Claude\ on Windows:
{
  "mcpServers": {
    "customer-database": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
Step 4: Test with Claude
Open Claude Desktop. In a conversation, try:
“Get me the details for customer cust_001.”
Claude will automatically discover the get_customer tool and call it. You’ll see the customer data returned directly in the conversation.
For more detailed guidance on this process, refer to the step-by-step guide for building your first MCP server.
Authentication and Security
Now that you have a working MCP server, let’s talk about what happens when you expose real systems.
Authentication Patterns
Pattern 1: API Keys The simplest approach. Your MCP server validates an API key before processing requests. The key is stored in the Claude Desktop config or environment variables.
import os
from functools import wraps

EXPECTED_KEY = os.getenv("MCP_API_KEY")

def require_auth(func):
    @wraps(func)  # preserve the wrapped function's name and docstring
    async def wrapper(self, *args, **kwargs):
        auth_header = self.request_headers.get("Authorization")
        if not auth_header or not auth_header.startswith("Bearer "):
            raise ValueError("Missing or invalid authorization")
        token = auth_header.split(" ")[1]
        if token != EXPECTED_KEY:
            raise ValueError("Invalid API key")
        return await func(self, *args, **kwargs)
    return wrapper
Pattern 2: OAuth 2.0 For more sophisticated scenarios, use OAuth. Your MCP server exchanges a refresh token for an access token, then uses that token to call downstream APIs.
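Whatever your provider, the caching logic around OAuth is the same: reuse the access token until shortly before it expires, then refresh. A minimal sketch — here fetch_token is a hypothetical callable that wraps your provider's token endpoint (e.g. an httpx POST) and returns the token plus its lifetime:

```python
import time

class OAuthTokenManager:
    """Caches an access token and refreshes it shortly before expiry.

    fetch_token is a stand-in for a call to your OAuth provider's token
    endpoint; it must return (access_token, expires_in_seconds).
    """

    def __init__(self, fetch_token, skew_seconds: int = 60):
        self._fetch_token = fetch_token
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        # Refresh if there is no token yet, or it expires within the skew window
        if self._token is None or time.time() >= self._expires_at - self._skew:
            token, expires_in = self._fetch_token()
            self._token = token
            self._expires_at = time.time() + expires_in
        return self._token
```

Keeping the refresh logic in one place means downstream API calls never have to reason about token lifetimes; they just call get_token() on every request.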
Pattern 3: mTLS (Mutual TLS) If your MCP server runs on a private network, use certificate-based authentication. Both the client (Claude) and server authenticate each other via certificates.
Permission Scoping
Not all users should have access to all tools. Implement role-based access control (RBAC) in your MCP server.
USER_ROLES = {
    "support_agent": ["get_customer", "list_customers"],
    "admin": ["get_customer", "list_customers", "update_customer", "delete_customer"],
}

async def call_tool(name: str, arguments: dict, user_role: str) -> str:
    allowed_tools = USER_ROLES.get(user_role, [])
    if name not in allowed_tools:
        return json.dumps({"error": f"User role '{user_role}' cannot access tool '{name}'"})
    # ... rest of tool logic
Data Masking
Some data is sensitive. Your MCP server should mask or redact it before returning to Claude.
def mask_email(email: str) -> str:
    # Show only the first character of the local part, plus the domain
    local, domain = email.split("@")
    return f"{local[0]}***@{domain}"

def mask_customer(customer: dict) -> dict:
    customer_copy = customer.copy()
    if "email" in customer_copy:
        customer_copy["email"] = mask_email(customer_copy["email"])
    return customer_copy
For comprehensive guidance on securing MCP servers, consult Anthropic’s official documentation on tool use and MCP integration.
Rate-Limiting and Observability
Without rate-limiting and observability, your MCP server becomes a liability. Claude might hammer your backend with requests, or you might miss security incidents.
Rate-Limiting Strategies
Strategy 1: Sliding Window Allow a fixed number of requests per rolling time window; requests older than the window age out, freeing capacity. (A token bucket, which refills a budget of request “tokens” at a steady rate, is a common alternative with similar behaviour.)
from time import time
from collections import defaultdict
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time()
        # Remove old requests outside the window
        self.requests[user_id] = [
            req_time for req_time in self.requests[user_id]
            if now - req_time < self.window_seconds
        ]
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False

limiter = RateLimiter(max_requests=100, window_seconds=60)

async def call_tool(name: str, arguments: dict, user_id: str) -> str:
    if not limiter.is_allowed(user_id):
        return json.dumps({"error": "Rate limit exceeded. Max 100 requests per minute."})
    # ... rest of tool logic
Strategy 2: Per-Tool Rate Limits Some tools are more expensive than others. A tool that queries a large dataset might have a lower rate limit than a tool that returns a single record.
TOOL_RATE_LIMITS = {
    "get_customer": {"requests": 1000, "window": 60},
    "list_customers": {"requests": 100, "window": 60},
    "export_all_customers": {"requests": 10, "window": 3600},  # 10 per hour
}
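Enforcing these budgets means keying the request log by (user, tool) rather than by user alone. A sketch, reusing the same windowed-counting approach as the RateLimiter above:

```python
from collections import defaultdict
from time import time

class PerToolLimiter:
    """Rate-limits each (user, tool) pair against a per-tool budget."""

    def __init__(self, limits: dict):
        self.limits = limits  # tool name -> {"requests": int, "window": seconds}
        self.calls = defaultdict(list)  # (user_id, tool) -> request timestamps

    def is_allowed(self, user_id: str, tool: str) -> bool:
        limit = self.limits.get(tool)
        if limit is None:
            return True  # no budget configured for this tool
        key = (user_id, tool)
        now = time()
        # Drop timestamps that have aged out of this tool's window
        self.calls[key] = [t for t in self.calls[key] if now - t < limit["window"]]
        if len(self.calls[key]) < limit["requests"]:
            self.calls[key].append(now)
            return True
        return False
```

Unconfigured tools pass through unlimited here; in production you would likely invert that and apply a conservative default budget to anything not explicitly listed.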
Observability: Logging and Metrics
Log every interaction. You need to know:
- Who called what
- When they called it
- What parameters they passed
- What the response was
- How long it took
- Whether it succeeded or failed
import logging
import json
from datetime import datetime
from time import time
logger = logging.getLogger("mcp_server")
logger.setLevel(logging.INFO)
# File handler
fh = logging.FileHandler("mcp_server.log")
fh.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)
async def call_tool(name: str, arguments: dict, user_id: str) -> str:
    start_time = time()
    logger.info(json.dumps({
        "event": "tool_call_start",
        "user_id": user_id,
        "tool_name": name,
        "arguments": arguments,
        "timestamp": datetime.utcnow().isoformat(),
    }))
    try:
        result = await execute_tool(name, arguments)
        elapsed = time() - start_time
        logger.info(json.dumps({
            "event": "tool_call_success",
            "user_id": user_id,
            "tool_name": name,
            "elapsed_ms": int(elapsed * 1000),
            "timestamp": datetime.utcnow().isoformat(),
        }))
        return result
    except Exception as e:
        elapsed = time() - start_time
        logger.error(json.dumps({
            "event": "tool_call_error",
            "user_id": user_id,
            "tool_name": name,
            "error": str(e),
            "elapsed_ms": int(elapsed * 1000),
            "timestamp": datetime.utcnow().isoformat(),
        }))
        return json.dumps({"error": "Tool execution failed"})
For production systems, integrate with a centralised logging service (e.g., Datadog, New Relic, CloudWatch). Set up alerts for:
- High error rates
- Unusual rate-limiting triggers
- Slow response times
- Unauthorised access attempts
Integration with Claude Desktop and API
Your MCP server can run in two contexts: Claude Desktop (local, synchronous) and Claude API (remote, asynchronous). Each has different requirements.
Claude Desktop Integration
Claude Desktop runs on the user’s machine. Your MCP server typically runs locally via stdio (standard input/output).
Configuration:
{
  "mcpServers": {
    "customer-database": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "MCP_API_KEY": "your-secret-key",
        "DATABASE_URL": "postgresql://localhost/customers"
      }
    }
  }
}
Advantages:
- Low latency (local execution)
- No network overhead
- Simple deployment
Disadvantages:
- Requires local setup
- Difficult to share across teams
- Hard to update without restarting Claude Desktop
For detailed configuration options, see the guide on Claude Desktop MCP server configuration.
Claude API Integration
For production systems, your MCP server runs on a remote server (e.g., AWS Lambda, a Docker container, a managed service). Claude API calls your server over HTTP.
Architecture:
Claude API → Your MCP Server (HTTP) → Your Backend Systems
Implementation:
import os

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()

class ToolCall(BaseModel):
    name: str
    arguments: dict

@app.post("/tools/call")
async def call_tool(
    request: ToolCall,
    authorization: str = Header(None)
):
    # Validate authorization
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Unauthorized")
    token = authorization.split(" ")[1]
    if token != os.getenv("MCP_API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid token")

    # Check rate limit
    user_id = extract_user_id(token)  # Implement this
    if not limiter.is_allowed(user_id):
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    # Execute tool
    try:
        result = await execute_tool(request.name, request.arguments)
        return {"status": "success", "result": result}
    except Exception as e:
        logger.error(f"Tool execution failed: {e}")
        raise HTTPException(status_code=500, detail="Tool execution failed")
Deployment Considerations:
- Use HTTPS only
- Implement request signing (HMAC-SHA256)
- Set appropriate timeouts (30-60 seconds)
- Use connection pooling for database connections
- Implement graceful degradation (return cached data if backend is slow)
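Request signing from that list needs only the standard library: the client computes an HMAC-SHA256 digest over the raw request body with a shared secret, and the server recomputes and compares. A sketch (carrying the digest in a header such as X-Signature is a convention you choose, not part of any spec):

```python
import hashlib
import hmac

def sign_request(secret: bytes, body: bytes) -> str:
    """Client side: HMAC-SHA256 hex digest over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Server side: recompute the digest and compare in constant time."""
    expected = sign_request(secret, body)
    return hmac.compare_digest(expected, signature)
```

hmac.compare_digest matters here: a naive == comparison leaks timing information an attacker can use to forge signatures byte by byte.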
For step-by-step instructions on integrating with Claude Code, see this detailed guide on adding MCP servers to Claude Code.
Real-World Implementation Patterns
Let’s look at how real organisations are using MCP servers.
Pattern 1: Support Ticket Automation
A SaaS company exposes their support ticket system via MCP. Claude can:
- Read incoming customer emails
- Query the ticket database for similar issues
- Suggest responses based on historical resolutions
- Create new tickets if needed
- Log actions for audit purposes
Tools exposed:
- search_tickets (read-only)
- get_ticket (read-only)
- create_ticket (write)
- update_ticket_status (write)
- add_comment (write)
Rate limits:
- Search: 1000/hour
- Create: 100/hour
- Update: 500/hour
Results: 40% faster ticket resolution, 30% reduction in duplicate tickets.
Pattern 2: Data Enrichment Pipeline
A fintech company uses MCP to enrich transaction data. Claude:
- Receives a raw transaction
- Queries the MCP server for merchant information
- Queries for customer history
- Queries for fraud patterns
- Returns enriched data with risk score
Tools exposed:
- get_merchant (read-only)
- get_customer_history (read-only)
- check_fraud_patterns (read-only)
- log_transaction_enrichment (write)
Performance: 200ms average latency, 99.9% uptime.
Pattern 3: Legacy System Integration
An enterprise company has a 20-year-old mainframe system. Instead of rebuilding it, they expose it via MCP. Claude can:
- Query customer records
- Check order status
- Retrieve historical data
- Trigger batch jobs
Key challenge: The mainframe API is synchronous and slow (5-10 second responses). Solution: Implement caching in the MCP server.
import time

import httpx

# Note: functools.lru_cache does not work on async functions — it would cache
# the coroutine object, which can only be awaited once. Use a TTL cache instead.
_CACHE: dict[str, tuple[float, dict]] = {}
CACHE_TTL = 300  # seconds

async def get_customer_from_mainframe(customer_id: str) -> dict:
    cached = _CACHE.get(customer_id)
    if cached and time.time() - cached[0] < CACHE_TTL:
        return cached[1]
    # The mainframe call is slow (5-10 seconds), which is why we cache it
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://mainframe.company.com/api/customer/{customer_id}",
            timeout=10
        )
    data = response.json()
    _CACHE[customer_id] = (time.time(), data)
    return data
For real-world insights from 40+ developers, see this deep dive into building MCP servers in production.
Common Pitfalls and Solutions
Pitfall 1: Exposing Too Much
Problem: You expose every tool and resource your system has. Claude gets confused, makes mistakes, and you lose control.
Solution: Start minimal. Expose only what Claude needs. Add more tools as you understand the use cases.
Checklist:
- Can Claude complete the intended workflow with these tools?
- Are there tools that Claude might call but shouldn’t?
- Can any tools be combined or simplified?
Pitfall 2: Ignoring Error Handling
Problem: Your MCP server returns cryptic errors. Claude doesn’t know what went wrong and keeps retrying.
Solution: Return clear, actionable error messages.
# Bad
return json.dumps({"error": "Database error"})

# Good
return json.dumps({
    "error": "Customer not found",
    "customer_id": customer_id,
    "suggestion": "Check the customer ID and try again"
})
Pitfall 3: Slow Responses
Problem: Your MCP server calls slow backend APIs. Claude times out or waits too long.
Solution: Implement timeouts, caching, and async execution.
import asyncio

async def call_tool(name: str, arguments: dict) -> str:
    try:
        # Set a 10-second timeout
        result = await asyncio.wait_for(
            execute_tool(name, arguments),
            timeout=10
        )
        return result
    except asyncio.TimeoutError:
        return json.dumps({
            "error": "Tool execution timed out",
            "suggestion": "Try again in a few moments"
        })
Pitfall 4: No Audit Trail
Problem: Claude makes changes to your systems, but you can’t trace who did what or why.
Solution: Log everything with context.
logger.info(json.dumps({
    "event": "ticket_created",
    "user_id": user_id,
    "customer_id": arguments.get("customer_id"),
    "subject": arguments.get("subject"),
    "timestamp": datetime.utcnow().isoformat(),
    "claude_session_id": session_id,
}))
Pitfall 5: Hardcoded Credentials
Problem: Your MCP server config contains API keys and passwords in plain text.
Solution: Use environment variables and secrets management.
import os
from dotenv import load_dotenv

load_dotenv()

DB_URL = os.getenv("DATABASE_URL")
API_KEY = os.getenv("MCP_API_KEY")

if not DB_URL or not API_KEY:
    raise ValueError("Missing required environment variables")
For production, use AWS Secrets Manager, HashiCorp Vault, or similar.
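One pattern that keeps a single call site for secret resolution is a loader that checks local environment variables first and falls back to a secrets manager. A sketch — the fallback branch assumes boto3 is installed and AWS credentials are configured:

```python
import os

def load_secret(name: str) -> str:
    """Resolve a secret from the environment, else AWS Secrets Manager."""
    value = os.getenv(name)
    if value is not None:
        return value
    # Production fallback: requires boto3 and configured AWS credentials
    import boto3
    client = boto3.client("secretsmanager")
    return client.get_secret_value(SecretId=name)["SecretString"]
```

In production you would likely invert the precedence (or gate on a deployment flag) so a stray environment variable cannot shadow the managed secret; the point is that the rest of the codebase only ever calls load_secret.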
Scaling and Production Readiness
Once your MCP server is working, you need to think about production.
Horizontal Scaling
If you expect high traffic, run multiple instances of your MCP server behind a load balancer.
# Docker Compose example
version: '3'
services:
  mcp-server-1:
    image: my-mcp-server:latest
    environment:
      - MCP_API_KEY=${MCP_API_KEY}
      - DATABASE_URL=${DATABASE_URL}
  mcp-server-2:
    image: my-mcp-server:latest
    environment:
      - MCP_API_KEY=${MCP_API_KEY}
      - DATABASE_URL=${DATABASE_URL}
  load-balancer:
    image: nginx:latest
    ports:
      - "8000:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
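The ./nginx.conf mounted above needs an upstream block pointing at the two server containers. A minimal sketch — the container port 8000 is an assumption about how my-mcp-server is built:

```nginx
events {}

http {
    upstream mcp_servers {
        # Round-robin across the compose services, addressed by service name
        server mcp-server-1:8000;
        server mcp-server-2:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://mcp_servers;
            proxy_set_header Host $host;
        }
    }
}
```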
Monitoring and Alerting
Set up dashboards and alerts:
Metrics to track:
- Request rate (requests/second)
- Error rate (errors/second)
- Response time (p50, p95, p99)
- Rate limit violations
- Authentication failures
Tools: Prometheus + Grafana, DataDog, New Relic, CloudWatch.
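Before wiring up a vendor, it is worth seeing how little state those metrics actually require. A plain-Python recorder you would swap for a Prometheus or Datadog client in production (a sketch):

```python
from collections import Counter, defaultdict

class ToolMetrics:
    """Tracks call counts and latency percentiles per tool, in process."""

    def __init__(self):
        self.calls = Counter()              # (tool, status) -> count
        self.latencies = defaultdict(list)  # tool -> elapsed_ms samples

    def record(self, tool: str, status: str, elapsed_ms: int) -> None:
        self.calls[(tool, status)] += 1
        self.latencies[tool].append(elapsed_ms)

    def error_rate(self, tool: str) -> float:
        ok = self.calls[(tool, "success")]
        err = self.calls[(tool, "error")]
        total = ok + err
        return err / total if total else 0.0

    def percentile(self, tool: str, p: float) -> int:
        # Nearest-rank percentile over sorted samples; assumes >= 1 sample
        samples = sorted(self.latencies[tool])
        return samples[int(p * (len(samples) - 1))]
```

Call record() from the logging wrapper shown earlier (one line in each of the success and error branches) and you get p50/p95/p99 and error rates essentially for free; the vendor client only changes where the numbers go.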
Versioning
As your MCP server evolves, you’ll add new tools and change existing ones. Version your API.
app = FastAPI()

@app.post("/v1/tools/call")
async def call_tool_v1(request: ToolCall):
    # Old implementation
    pass

@app.post("/v2/tools/call")
async def call_tool_v2(request: ToolCall):
    # New implementation
    pass
Support multiple versions simultaneously during a transition period.
Compliance and Security Audit Readiness
If you’re pursuing SOC 2 or ISO 27001 compliance, your MCP server needs to be audit-ready. We’ve helped many organisations achieve this through our Security Audit and Vanta implementation services.
Key Requirements
Access Control:
- Role-based access control (RBAC) implemented
- API keys rotated regularly
- Least privilege principle enforced
Logging and Monitoring:
- All tool calls logged with user ID and timestamp
- Error logs retained for 90+ days
- Real-time alerting for suspicious activity
Data Protection:
- Sensitive data masked in logs
- Encryption in transit (HTTPS/TLS)
- Encryption at rest for stored logs
Incident Response:
- Process for investigating tool misuse
- Procedure for revoking compromised credentials
- Documentation of security incidents
For a comprehensive approach to AI readiness and compliance, explore our AI Strategy & Readiness services, which help teams design secure, auditable AI systems from the ground up.
Next Steps and Scaling
You now have a working MCP server with authentication, rate-limiting, and observability. Here’s what comes next.
Phase 1: Validate Use Cases (Weeks 1-4)
- Deploy your MCP server to a staging environment
- Test with real Claude workflows
- Measure performance and identify bottlenecks
- Gather feedback from users
Phase 2: Harden for Production (Weeks 5-8)
- Implement comprehensive logging and monitoring
- Set up alerting and on-call rotation
- Document runbooks for common issues
- Conduct security review
Phase 3: Scale and Optimise (Weeks 9-12)
- Deploy to production with multiple instances
- Implement caching where appropriate
- Optimise database queries
- Monitor and iterate
Building a Broader AI Strategy
MCP servers are one piece of a larger AI strategy. As you expand, consider:
- AI & Agents Automation: How will you orchestrate multiple AI agents to solve complex problems? Check out our guide on agentic AI versus traditional automation to understand when agents are the right choice.
- AI Adoption and Readiness: Are your teams ready for AI? Explore AI adoption strategies for Sydney businesses to understand the organisational changes required.
- Platform Engineering: MCP servers are part of your platform. Invest in platform design and engineering to make it easy for teams to build on top of your infrastructure.
- Custom Software Development: Many organisations need custom tools that work alongside Claude. Our custom software development services help you build these integrations.
If you’re building AI products or automating operations with Claude, you need a partner who understands both the technical and organisational challenges. At PADISO, we’re a Sydney-based venture studio and AI digital agency that helps founders, operators, and enterprises ship AI products, automate workflows, and pass security audits.
We’ve helped 50+ organisations build MCP servers and integrate Claude into their operations. We’ve also guided teams through AI agency consultation to define their AI strategy, implemented AI automation agency services to execute on that strategy, and supported them through AI agency project management to ensure successful delivery.
Whether you’re a seed-stage startup needing fractional CTO leadership, a mid-market company modernising with agentic AI, or an enterprise pursuing SOC 2 compliance, we’re here to help. Our AI agency services span strategy, implementation, and operations.
Immediate Actions
- Define your MVP: What’s the one workflow Claude needs to automate? Start there.
- Build your first tool: Use the code examples in this guide to create a simple MCP server.
- Test with Claude Desktop: Validate that Claude can use your tools correctly.
- Add observability: Implement logging before you deploy to production.
- Plan your security review: Identify who needs to sign off on your MCP server before it goes live.
MCP servers are a powerful way to extend Claude’s capabilities into your organisation. They’re not magic—they’re a well-designed protocol for exposing your systems to AI agents in a safe, controlled way.
Start small, measure everything, and scale deliberately. Your internal tools are valuable; make them accessible to Claude, and you’ll unlock new possibilities for automation and efficiency.
For more on building AI systems in the real world, check out our case studies to see how we’ve helped other organisations solve similar challenges. If you’re ready to build, let’s talk.