Claude Code in CI/CD: Background Agents That Open Pull Requests
Deploy Claude Code agents in CI/CD to triage flaky tests, sweep dead flags, and auto-commit cleanup PRs. Complete reference architecture guide.
Table of Contents
- Why Background Agents Matter in CI/CD
- Claude Code Fundamentals for CI/CD
- Reference Architecture: Scheduled Agents on Cloudflare Workers
- Triaging Flaky Tests with Claude Code
- Dead Code and Flag Sweep Automation
- Opening Pull Requests Programmatically
- Security, Permissions, and Audit Readiness
- Deployment Patterns and Real-World Examples
- Troubleshooting and Observability
- Next Steps: From Prototype to Production
Why Background Agents Matter in CI/CD
Modern CI/CD pipelines are under constant pressure. Teams ship faster, codebases grow messier, and technical debt accumulates in the gaps between sprints. Flaky tests fail unpredictably. Dead feature flags litter the codebase. Legacy code paths go unrefactored for months. Manual triage and cleanup are expensive, repetitive, and easy to defer.
Background agents—autonomous AI systems that run continuously without human intervention—solve this at scale. Unlike synchronous code review or linting, background agents work asynchronously, triaging issues, proposing fixes, and opening pull requests while your team focuses on feature work.
Claude Code, Anthropic’s AI agent framework for code, is purpose-built for this. When deployed as a background agent in CI/CD, Claude Code can:
- Analyse test failures and identify flakiness patterns across runs
- Locate and flag dead code, unused imports, and orphaned feature flags
- Generate fixes with full context from your repository
- Open pull requests automatically with detailed descriptions and test coverage
- Commit results back to your repository with proper git history
The outcome: teams spend less time on triage and more time on high-leverage work. One Sydney-based SaaS company reduced test-related toil by 40% in their first month after deploying Claude Code agents. Another fintech startup eliminated 3,000+ lines of dead code in a single automated sweep.
This guide walks through the complete reference architecture for deploying Claude Code as a background agent in CI/CD, with concrete examples for test triage, flag cleanup, and pull request automation.
Claude Code Fundamentals for CI/CD
What Is Claude Code?
Claude Code is Anthropic’s agentic framework for code analysis, generation, and repository operations. Unlike traditional code-generation models, Claude Code agents can:
- Read and write files across your repository
- Execute shell commands and scripts
- Parse test output, logs, and metrics
- Commit changes with git
- Open pull requests via GitHub or GitLab APIs
- Reason about complex codebases over multiple iterations
Claude Code operates in two modes relevant to CI/CD:
- Interactive mode: A developer runs Claude Code locally or in an IDE, providing real-time feedback. Useful for one-off refactoring or debugging.
- Autonomous/background mode: Claude Code runs on a schedule (via cron, webhooks, or event triggers) without human intervention, making decisions independently and committing results.
Background mode is where the CI/CD magic happens.
Key Capabilities for Background Agents
**Code Analysis at Scale.** Claude Code can ingest entire test suites, parse failure logs, and identify patterns. It understands flakiness—when a test fails intermittently due to timing, race conditions, or external dependencies—and can propose targeted fixes like adding retries, fixing race conditions, or mocking external calls.
**Repository Context Awareness.** Unlike generic LLMs, Claude Code maintains full context of your codebase. It reads your git history, understands your testing framework, and respects your code style. When it generates a fix, it’s not guessing—it’s reasoning from your actual code.
**Deterministic Git Operations.** Claude Code can commit changes, push branches, and open pull requests through the GitHub and GitLab APIs, whether it runs inside GitHub Actions, GitLab CI/CD, or a standalone worker. This is critical for background agents: they can operate independently, create feature branches, and surface their work for human review without manual intervention.
**Async Task Support.** Recent extensions to Claude Code support background agent execution via task tools and async capabilities, allowing long-running operations (like scanning a 100k-line codebase) to complete without timing out.
Reference Architecture: Scheduled Agents on Cloudflare Workers
Overview
The most practical deployment pattern for background agents uses Cloudflare Workers with cron triggers. Here’s why:
- Serverless: No infrastructure to manage. Pay per execution.
- Global: Runs from Cloudflare’s edge, with low latency to GitHub/GitLab APIs.
- Scheduled: Native cron support for recurring tasks (daily, weekly, or on-demand).
- Secure: Secrets stored in Cloudflare KV; no credentials in code.
- Observable: Built-in logging and error reporting.
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Cloudflare Worker (Cron) │
│ Runs daily at 02:00 UTC │
└──────────────────────┬──────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
┌─────▼──────┐ ┌──────▼──────┐
│ GitHub │ │ Claude API │
│ (Fetch │ │ (Analyse & │
│ tests, │ │ Generate) │
│ logs) │ └──────┬──────┘
└────────────┘ │
▲ │
│ ┌───────▼────────┐
│ │ Generate Fix │
│ │ & Commit │
│ └────────┬───────┘
│ │
└───────────────────────────┘
(Push & Open PR)
Step 1: Set Up Cloudflare Worker
Create a new Cloudflare Worker project:
npm create cloudflare@latest my-ci-agent -- --type hello-world
cd my-ci-agent
Install dependencies:
npm install @anthropic-ai/sdk
Step 2: Configure Secrets
Store your API keys in a Cloudflare KV namespace. (The Worker code below reads them through a `SECRETS` KV binding; `wrangler secret put` would instead expose each value as its own environment variable, which is also a fine pattern.)
wrangler kv namespace create SECRETS
wrangler kv key put ANTHROPIC_API_KEY "<your-key>" --binding SECRETS
wrangler kv key put GITHUB_TOKEN "<your-token>" --binding SECRETS
wrangler kv key put GITHUB_REPO_OWNER "<owner>" --binding SECRETS
wrangler kv key put GITHUB_REPO_NAME "<repo>" --binding SECRETS
Update wrangler.toml with the KV binding (use the namespace ID printed by the create command) and the cron trigger:
kv_namespaces = [
{ binding = "SECRETS", id = "your-kv-id" }
]
[triggers]
crons = ["0 2 * * *"] # Daily at 02:00 UTC
Step 3: Core Worker Logic
Create src/index.ts:
import Anthropic from "@anthropic-ai/sdk";
interface Env {
SECRETS: KVNamespace;
}
export default {
async scheduled(
event: ScheduledEvent,
env: Env,
ctx: ExecutionContext
): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
// Fetch recent test failures from GitHub Actions
const testRuns = await fetchTestRuns(owner, repo, token);
// Analyse with Claude Code
const analysis = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 4096,
tools: [
{
name: "bash",
description: "Run bash commands",
input_schema: {
type: "object",
properties: {
command: {
type: "string",
description: "Bash command to execute",
},
},
required: ["command"],
},
},
{
name: "read_file",
description: "Read a file from the repository",
input_schema: {
type: "object",
properties: {
path: { type: "string" },
},
required: ["path"],
},
},
],
messages: [
{
role: "user",
content: `You are a CI/CD automation expert. Analyse these test failures and propose fixes:\n\n${JSON.stringify(testRuns, null, 2)}\n\nIdentify flaky tests, dead code, and unused flags. Generate a pull request with fixes.`,
},
],
});
// In production, loop while stop_reason === "tool_use": execute the
// requested tool, append a tool_result block, and call the API again.
console.log("Analysis complete", analysis);
},
};
async function fetchTestRuns(
owner: string,
repo: string,
token: string
): Promise<object> {
const response = await fetch(
`https://api.github.com/repos/${owner}/${repo}/actions/runs?status=failure&per_page=10`,
{
headers: {
Authorization: `token ${token}`,
Accept: "application/vnd.github+json",
// GitHub's API rejects requests without a User-Agent
"User-Agent": "claude-ci-agent",
},
}
);
return response.json();
}
Deploy:
wrangler deploy
Your background agent is now live. It runs daily at 02:00 UTC, fetches test failures, and analyses them with Claude Code. To exercise it before the first scheduled run, `wrangler dev --test-scheduled` serves a local `/__scheduled` endpoint you can hit manually.
Triaging Flaky Tests with Claude Code
Identifying Flakiness Patterns
Flaky tests are a silent killer. They fail intermittently, eroding confidence in your test suite and blocking deployments unpredictably. Manual triage is tedious: you have to cross-reference test logs, git history, and timing data to identify the root cause.
Claude Code automates this. Here’s how:
- Fetch test run history from GitHub Actions or GitLab CI
- Aggregate failure patterns (which tests fail together? at what times?)
- Analyse logs for common culprits (timeouts, race conditions, external API failures)
- Generate targeted fixes (add retries, increase timeouts, mock external calls)
- Open a PR with the fix and a detailed explanation
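Step 2, aggregating failure patterns, is plain data-munging and doesn't need the model at all. Here's a minimal sketch; the `TestRun` shape is an assumption for illustration, not the GitHub Actions schema:

```typescript
// Hypothetical input shape: one record per test per CI run.
interface TestRun {
  testName: string;
  passed: boolean;
  timestamp: string; // ISO 8601
}

interface FlakyTest {
  name: string;
  failureRate: number; // 0..1
  lastFailure: string;
}

// A test is flagged as flaky only if it BOTH passed and failed in
// the window (a test that always fails is broken, not flaky) and
// its failure rate exceeds the threshold.
function identifyFlakyTests(
  runs: TestRun[],
  threshold = 0.1
): FlakyTest[] {
  const byTest = new Map<string, TestRun[]>();
  for (const run of runs) {
    const list = byTest.get(run.testName) ?? [];
    list.push(run);
    byTest.set(run.testName, list);
  }
  const flaky: FlakyTest[] = [];
  byTest.forEach((testRuns, name) => {
    const failures = testRuns.filter((r) => !r.passed);
    const rate = failures.length / testRuns.length;
    const mixed = failures.length > 0 && failures.length < testRuns.length;
    if (mixed && rate > threshold) {
      const timestamps = failures.map((r) => r.timestamp).sort();
      flaky.push({
        name,
        failureRate: rate,
        lastFailure: timestamps[timestamps.length - 1],
      });
    }
  });
  return flaky;
}
```

Only the tests this function surfaces need to be sent to Claude for root-cause analysis, which keeps token spend proportional to actual flakiness.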
Example: Detecting Race Conditions
Suppose your test suite has a test that fails 15% of the time:
test("should update user profile", async () => {
const user = await createUser({ name: "Alice" });
await user.update({ name: "Bob" });
const updated = await getUser(user.id);
expect(updated.name).toBe("Bob");
});
The test is flaky because getUser sometimes reads stale data from a cache. A human engineer would spend 30 minutes debugging this. Claude Code identifies it in seconds:
const message = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 2048,
messages: [
{
role: "user",
content: `
Analyse this test failure pattern:
- Test: "should update user profile"
- Failure rate: 15%
- Failure times: 02:15, 02:47, 03:12, 03:58 UTC
- Error: Expected "Bob" but got "Alice"
What's the root cause? Propose a fix.
`,
},
],
});
Claude Code responds:
The failure pattern suggests a cache consistency issue. The `getUser` call is reading from cache before the update propagates. Fix: add a cache invalidation after `user.update()` or use a cache-busting query parameter.
It then generates the fix:
test("should update user profile", async () => {
const user = await createUser({ name: "Alice" });
await user.update({ name: "Bob" });
// Invalidate cache to ensure fresh read
await cache.invalidate(`user:${user.id}`);
const updated = await getUser(user.id);
expect(updated.name).toBe("Bob");
});
Automating Flaky Test Reports
Anthropic has shipped tooling that uses AI agents to find bugs in pull requests, and the same pattern applies to test output. Deploy a background agent that:
- Runs daily to fetch test results from the last 7 days
- Identifies flaky tests (failure rate > 10%)
- Analyses logs to pinpoint root causes
- Opens a PR titled “CI: Fix flaky tests in [suite]”
- Tags the test owners for review
Example Worker code:
async function triageFlakyTests(env: Env): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
// Fetch test runs from last 7 days
const runs = await fetchTestRunsLastNDays(7);
const flaky = identifyFlakyTests(runs);
if (flaky.length === 0) {
console.log("No flaky tests detected.");
return;
}
const prompt = `
You are a test reliability engineer. Analyse these flaky tests and propose fixes:
${flaky.map((t) => `- ${t.name}: fails ${t.failureRate}% of the time. Last failure: ${t.lastFailure}`).join("\n")}
For each test, identify the root cause and generate a fix. Create a single pull request with all fixes.
`;
const response = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 8192,
messages: [
{
role: "user",
content: prompt,
},
],
});
// Extract fixes and open PR
await openPullRequest(
"CI: Fix flaky tests",
response.content[0].type === "text" ? response.content[0].text : ""
);
}
Dead Code and Flag Sweep Automation
Why Dead Code Matters
Dead code—unreachable functions, unused imports, orphaned feature flags—is more than clutter. It:
- Increases cognitive load for developers reading the code
- Slows down refactoring (is this function used anywhere?)
- Multiplies maintenance costs (do we need to update this for the new API?)
- Hides real bugs (unused variables can mask logic errors)
One Sydney fintech firm discovered 12,000+ lines of dead code in their core payment processor. Cleaning it up manually would take weeks. A Claude Code background agent swept it in 48 hours.
Detecting Dead Code with Static Analysis
Claude Code combines static analysis with semantic understanding. It can:
- Parse your codebase using language-specific tools (ESLint for JavaScript, Pylint for Python)
- Cross-reference imports with usage across the entire repository
- Identify feature flags that are always on or always off
- Detect unreachable code (dead branches, unreferenced functions)
- Generate removal PRs with confidence scores
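The cross-referencing step can be approximated with plain string matching before handing candidates to the model. This is a deliberately naive sketch; real sweeps should work from an AST (ts-morph, ESLint) because string matching misses dynamic access, re-exports, and shadowed names:

```typescript
// Naive sketch: flag exported names that no other file mentions.
// Treat the output as candidates for the agent to verify, not as
// a definitive dead-code list.
function findUnusedExports(
  files: Record<string, string> // path -> source text
): { path: string; name: string }[] {
  const unused: { path: string; name: string }[] = [];
  for (const [path, source] of Object.entries(files)) {
    const exportRe = /export\s+(?:function|const|class)\s+(\w+)/g;
    let match: RegExpExecArray | null;
    while ((match = exportRe.exec(source)) !== null) {
      const name = match[1];
      const usedElsewhere = Object.entries(files).some(
        ([otherPath, otherSource]) =>
          otherPath !== path && otherSource.includes(name)
      );
      if (!usedElsewhere) unused.push({ path, name });
    }
  }
  return unused;
}
```

Feeding only these candidates to Claude Code (rather than the whole repository) keeps prompts small and focuses the model on the judgment call: is this truly dead, or reached dynamically?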
Example: Sweeping Dead Feature Flags
Feature flags accumulate over time. A flag that was “temporary” becomes permanent. Teams forget to clean up old flags, leading to bloated configuration files and complex conditional logic.
Deploy a background agent that:
async function sweepDeadFlags(env: Env): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
// Fetch feature flag definitions
const flagConfig = await fetchFlagConfig();
const codebase = await cloneAndAnalyzeRepo();
const prompt = `
Analyse this feature flag configuration and codebase.
Identify flags that are:
1. Always enabled (remove the flag and simplify code)
2. Always disabled (remove dead branches)
3. Unused (not referenced anywhere)
Generate a pull request that removes dead flags and simplifies conditionals.
Flag config:
${JSON.stringify(flagConfig, null, 2)}
`;
const response = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 8192,
messages: [
{
role: "user",
content: prompt,
},
],
});
// Open PR with changes
const prTitle = `Refactor: Remove ${identifyDeadFlags(flagConfig).length} dead feature flags`;
await openPullRequest(prTitle, extractChanges(response));
}
Integration with Code Review Tools
Teams using OpenCode in CI/CD for AI pull request reviews already combine static analysis with AI-driven review, and the same layering works here. Pair your dead code sweep with a conservative verification step:
- Identify candidates for removal (static analysis)
- Verify no usage (grep, code search, git blame)
- Generate PR with confidence score
- Add human approval gate for high-risk removals
This reduces false positives and ensures your team reviews sensitive changes.
Opening Pull Requests Programmatically
The PR Workflow
Once Claude Code has analysed your codebase and generated fixes, it needs to surface those changes for human review. Opening a pull request is the standard pattern:
- Create a feature branch (e.g., `claude-code/fix-flaky-tests-2024-01-15`)
- Commit changes with a descriptive message
- Push to remote (GitHub or GitLab)
- Open a pull request with context, test results, and a changelog
- Tag reviewers (test owners, platform team)
- Wait for approval before merging
Using GitHub API
Anthropic's documentation on Claude Code creating a pull request demonstrates the end-to-end workflow. Here’s the implementation:
interface PullRequestOptions {
title: string;
body: string;
baseBranch?: string;
reviewers?: string[];
labels?: string[];
}
async function openPullRequest(
env: Env,
options: PullRequestOptions
): Promise<string> {
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
const baseBranch = options.baseBranch || "main";
const featureBranch = `claude-code/${Date.now()}`;
// GitHub's REST API requires a User-Agent header and rejects
// requests without one
const headers = {
Authorization: `Bearer ${token}`,
Accept: "application/vnd.github+json",
"User-Agent": "claude-code-agent",
"Content-Type": "application/json",
};
// 1. Create feature branch from the base branch's current SHA
const baseRef = await fetch(
`https://api.github.com/repos/${owner}/${repo}/git/refs/heads/${baseBranch}`,
{ headers }
).then((r) => r.json());
await fetch(`https://api.github.com/repos/${owner}/${repo}/git/refs`, {
method: "POST",
headers,
body: JSON.stringify({
ref: `refs/heads/${featureBranch}`,
sha: baseRef.object.sha,
}),
});
// 2. Commit changes (already done by Claude Code)
// 3. Push to remote (already done by Claude Code)
// 4. Open PR
const prResponse = await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls`,
{
method: "POST",
headers,
body: JSON.stringify({
title: options.title,
body: options.body,
head: featureBranch,
base: baseBranch,
}),
}
).then((r) => r.json());
const prNumber = prResponse.number;
// 5. Add reviewers
if (options.reviewers && options.reviewers.length > 0) {
await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/requested_reviewers`,
{
method: "POST",
headers,
body: JSON.stringify({ reviewers: options.reviewers }),
}
);
}
// 6. Add labels (labels live on the issues endpoint)
if (options.labels && options.labels.length > 0) {
await fetch(
`https://api.github.com/repos/${owner}/${repo}/issues/${prNumber}/labels`,
{
method: "POST",
headers,
body: JSON.stringify({ labels: options.labels }),
}
);
}
return `https://github.com/${owner}/${repo}/pull/${prNumber}`;
}
GitLab Integration
The Claude Code GitLab CI/CD documentation shows native support for GitLab merge requests. If you’re on GitLab, the pattern is similar:
async function openMergeRequest(
env: Env,
options: PullRequestOptions
): Promise<string> {
const projectId = await env.SECRETS.get("GITLAB_PROJECT_ID");
const token = await env.SECRETS.get("GITLAB_TOKEN");
const baseBranch = options.baseBranch || "main";
const featureBranch = `claude-code/${Date.now()}`;
// Create merge request (assumes the feature branch was already pushed)
const mrResponse = await fetch(
`https://gitlab.com/api/v4/projects/${projectId}/merge_requests`,
{
method: "POST",
headers: {
"PRIVATE-TOKEN": token,
"Content-Type": "application/json",
},
body: JSON.stringify({
title: options.title,
description: options.body,
source_branch: featureBranch,
target_branch: baseBranch,
// GitLab expects numeric user IDs here, not usernames
reviewer_ids: options.reviewers,
labels: options.labels,
}),
}
).then((r) => r.json());
return mrResponse.web_url;
}
PR Body Best Practices
Your PR body should include:
- Summary: What problem does this fix?
- Changes: List of files modified, with brief descriptions
- Testing: How was this tested? Include test output if relevant
- Checklist: Items for reviewers to verify
- Confidence score: How confident is Claude Code in this change? (80%, 95%, etc.)
Example:
## Summary
This PR fixes 4 flaky tests in the user service test suite. All tests were failing intermittently due to race conditions in the database setup.
## Changes
- `tests/user.test.js`: Added cache invalidation after user updates
- `tests/auth.test.js`: Increased timeout for external API calls from 5s to 10s
- `tests/profile.test.js`: Fixed race condition in concurrent profile updates
- `src/user.js`: Simplified feature flag logic (removed always-on flag)
## Testing
Ran test suite 10 times locally. All tests pass 100% of the time.
✓ user service (4 tests, 0 failures)
✓ auth service (8 tests, 0 failures)
✓ profile service (6 tests, 0 failures)
## Checklist
- [ ] Review changes for correctness
- [ ] Run tests locally
- [ ] Verify no new flakiness introduced
- [ ] Merge when ready
## Confidence
**95%** — Changes are low-risk (test-only, cache invalidation). No production logic modified.
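A body in this format can be assembled mechanically from the agent's output rather than asked for in the prompt, which keeps the structure consistent across PRs. A sketch; the `Fix` shape is an assumption for illustration:

```typescript
// Hypothetical shape for one generated fix.
interface Fix {
  file: string;
  description: string;
}

// Render the standard PR body: summary, changes, testing,
// reviewer checklist, and a confidence score.
function buildPrBody(
  summary: string,
  fixes: Fix[],
  testing: string,
  confidence: number // 0..100
): string {
  const changes = fixes
    .map((f) => `- \`${f.file}\`: ${f.description}`)
    .join("\n");
  return [
    "## Summary",
    summary,
    "",
    "## Changes",
    changes,
    "",
    "## Testing",
    testing,
    "",
    "## Checklist",
    "- [ ] Review changes for correctness",
    "- [ ] Run tests locally",
    "- [ ] Verify no new flakiness introduced",
    "",
    "## Confidence",
    `**${confidence}%**`,
  ].join("\n");
}
```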
Security, Permissions, and Audit Readiness
API Key Management
Your background agent needs credentials to:
- Read your repository (GitHub/GitLab API)
- Commit changes (git push)
- Call Claude API (Anthropic)
Mismanage these credentials, and you’ve handed an attacker the keys to your codebase.
Best practices:
- Use short-lived tokens where possible. GitHub personal access tokens can be scoped to specific repositories and permissions.
- Rotate regularly. Set a calendar reminder to rotate credentials every 90 days.
- Store in secrets management. Use Cloudflare KV, AWS Secrets Manager, or HashiCorp Vault—never hardcode credentials.
- Audit access logs. GitHub and GitLab provide audit logs of all API calls. Monitor for suspicious activity.
- Restrict permissions. Your agent token should have minimal permissions: read repository, create branches, open PRs. Not delete repository, not admin access.
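You can enforce the last point at startup rather than trusting it: for classic personal access tokens, GitHub reports granted scopes in the `X-OAuth-Scopes` response header on any API call (fine-grained tokens omit it). A sketch of the check, with the forbidden-scope defaults as assumptions:

```typescript
// Fail fast if the agent's token is missing a required scope or
// carries a dangerous one. `scopesHeader` is the raw value of the
// X-OAuth-Scopes response header, e.g. "repo, workflow".
function checkTokenScopes(
  scopesHeader: string,
  required: string[],
  forbidden: string[] = ["delete_repo", "admin:org"]
): { ok: boolean; missing: string[]; excessive: string[] } {
  const granted = scopesHeader
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  const missing = required.filter((s) => !granted.includes(s));
  const excessive = forbidden.filter((s) => granted.includes(s));
  return {
    ok: missing.length === 0 && excessive.length === 0,
    missing,
    excessive,
  };
}
```

Run this once per agent invocation and abort (with an alert) if `ok` is false; an over-privileged token is a misconfiguration worth surfacing immediately.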
Audit Trail and Compliance
If you’re pursuing SOC 2 compliance or ISO 27001 certification, automated agents introduce new audit requirements:
- Who triggered the agent? (scheduled cron, webhook, manual trigger)
- What did it change? (commit history, file diffs)
- Why did it make those changes? (decision logs, reasoning)
- Was it reviewed? (PR approval, merge status)
Design your agent to log all actions:
interface AuditLog {
timestamp: string;
agentId: string;
action: "analysis" | "commit" | "pr_open" | "pr_merge";
repository: string;
branch: string;
filesModified: string[];
reasoning: string;
reviewedBy?: string;
approvedAt?: string;
}
async function logAction(
env: Env,
log: AuditLog
): Promise<void> {
await env.AUDIT_LOG.put(
`${log.timestamp}-${log.agentId}`,
JSON.stringify(log)
);
}
Human-in-the-Loop Approval
Not all changes should merge automatically. For high-risk operations (removing code, modifying security-critical paths), require human approval:
if (changeRiskScore > 0.7) {
// High-risk change: require approval before merging
await openPullRequest(env, {
title: options.title,
body: options.body,
reviewers: ["security-team", "platform-lead"],
labels: ["requires-approval", "high-risk"],
});
} else {
// Low-risk change: open the PR, then auto-merge it
const prUrl = await openPullRequest(env, options);
await mergePullRequest(prUrl); // hypothetical helper
}
Deployment Patterns and Real-World Examples
Pattern 1: Daily Flaky Test Sweep
Trigger: Cron job daily at 02:00 UTC
Process:
- Fetch test runs from last 24 hours
- Identify tests with >10% failure rate
- Analyse failure logs with Claude Code
- Generate fixes
- Open PR with label `automated/flaky-tests`
- Tag test owners for review
Expected outcome: 2–4 PRs per week, 80% merge rate within 24 hours.
Pattern 2: Weekly Dead Code Sweep
Trigger: Cron job weekly on Monday at 03:00 UTC
Process:
- Run static analysis (ESLint, Pylint, etc.)
- Identify unused imports, unreachable code, dead flags
- Cross-reference with git history (was this code used in the last 6 months?)
- Generate removal PR
- Open PR with label `automated/cleanup`
- Auto-merge after 48 hours if no objections
Expected outcome: 1–2 PRs per month, removes 500–2,000 lines of dead code per quarter.
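Patterns 1 and 2 can share a single Worker: wrangler.toml accepts multiple cron expressions (`crons = ["0 2 * * *", "0 3 * * 1"]`), and the scheduled handler receives the expression that fired as `event.cron`, so dispatch is a lookup. The task names below are illustrative:

```typescript
// Map each configured cron expression to the agent task it runs.
const CRON_TASKS: Record<string, string> = {
  "0 2 * * *": "flaky-test-sweep", // daily at 02:00 UTC
  "0 3 * * 1": "dead-code-sweep",  // Mondays at 03:00 UTC
};

// Inside the scheduled handler:
//   switch (taskForCron(event.cron)) { ... }
function taskForCron(cron: string): string | undefined {
  return CRON_TASKS[cron];
}
```

One Worker with a dispatch table is cheaper to operate than one deployment per agent, and all the agents share the same secrets and logging setup.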
Pattern 3: On-Demand Issue-to-PR
Trigger: GitHub issue label `automated-fix`
Process:
- Listen for new issues with label `automated-fix`
- Claude Code reads issue description
- Generates a fix based on the issue
- Opens PR linked to the issue
- Mentions issue author for feedback
Expected outcome: Developers can create issues, and Claude Code auto-generates PRs. Useful for simple tasks (add logging, refactor a function, update docs).
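The listening step is a webhook filter. GitHub's `issues` webhook delivers an `action` field, the applied `label`, and the issue itself; the decision of whether to wake the agent is a pure function. A sketch (the interface mirrors only the payload fields used here):

```typescript
// Subset of GitHub's "issues" webhook payload.
interface IssueEvent {
  action: string;
  label?: { name: string };
  issue: { number: number; state: string; title: string };
}

// Only act when the automated-fix label is applied to an open issue;
// every other delivery is acknowledged and dropped.
function shouldAutoFix(event: IssueEvent): boolean {
  return (
    event.action === "labeled" &&
    event.label?.name === "automated-fix" &&
    event.issue.state === "open"
  );
}
```

In production, verify the webhook's `X-Hub-Signature-256` HMAC before trusting the payload at all; this filter runs after that check.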
Real-World Case Study: Sydney SaaS Startup
A Sydney-based SaaS company deployed Claude Code background agents across their CI/CD. Results after 3 months:
- Flaky tests reduced by 40%: From 8 flaky tests per week to 5 per month
- Dead code removed: 3,000+ lines of dead code in 2 PRs
- Time saved: 15 hours per month of manual triage and cleanup
- Team satisfaction: Engineers report less frustration with test failures
The company now runs 3 background agents:
- Daily flaky test triage (Cloudflare Worker cron)
- Weekly dead code sweep (Cloudflare Worker cron)
- On-demand issue-to-PR (GitHub Actions webhook)
Total cost: ~$500/month in Anthropic API calls + minimal Cloudflare Workers usage.
Troubleshooting and Observability
Common Issues and Fixes
Issue: “API rate limit exceeded”
Claude Code makes multiple API calls during analysis. If you hit rate limits:
- Solution 1: Increase delay between agent runs (run every 6 hours instead of every 2 hours)
- Solution 2: Batch analysis (analyse 10 test failures together instead of 1 at a time)
- Solution 3: Use caching (store analysis results in KV, skip re-analysis of unchanged code)
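Solution 3 amounts to keying analyses by a hash of the failure log so identical logs never hit the API twice. A dependency-free sketch: FNV-1a stands in for a real digest, a `Map` stands in for a KV namespace, and the synchronous `analyse` callback stands in for the (async, in reality) Claude API call:

```typescript
// 32-bit FNV-1a hash; a Worker would more likely use
// crypto.subtle.digest, but this keeps the sketch self-contained.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Return a cached analysis when the same log was seen before;
// otherwise analyse once and store the result.
function analyseWithCache(
  log: string,
  cache: Map<string, string>, // stand-in for a KV namespace
  analyse: (log: string) => string // stand-in for the Claude API call
): string {
  const key = `analysis:${fnv1a(log)}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit: no API call
  const result = analyse(log);
  cache.set(key, result);
  return result;
}
```

With KV you also get TTLs for free (`put(key, value, { expirationTtl })`), so stale analyses age out on their own.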
Issue: “PR merge conflicts”
If your agent opens a PR and another PR merges first, you may get conflicts:
- Solution: Add mergeability checking to your agent. GitHub computes whether an open PR merges cleanly; poll the pull request after opening it and hold auto-merge until `mergeable` is true. (Don't POST to the `/merges` endpoint to "check" mergeability: that endpoint performs an actual merge.)
async function checkMergeability(
env: Env,
prNumber: number
): Promise<boolean> {
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
const response = await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}`,
{
headers: {
Authorization: `Bearer ${token}`,
Accept: "application/vnd.github+json",
"User-Agent": "claude-code-agent",
},
}
).then((r) => r.json());
// GitHub computes mergeability asynchronously; `mergeable` is null
// until the computation finishes, so treat null as "retry later".
return response.mergeable === true;
}
Issue: “False positives (agent removes code that’s actually used)”
Static analysis isn’t perfect. Claude Code might identify a function as unused when it’s actually called dynamically or via reflection:
- Solution 1: Add a confidence score threshold. Only remove code with >90% confidence.
- Solution 2: Require human approval for removals. Use the `requires-approval` label.
- Solution 3: Use git blame to check if code was recently modified. Skip removal if the last commit is <30 days old.
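Solutions 1 and 3 combine into a single filter over removal candidates. A sketch with hypothetical shapes (`confidence` reported by the agent, `lastCommitDaysAgo` derived from `git log`):

```typescript
// A removal the agent has proposed, annotated with its confidence
// and how long the code has gone untouched.
interface RemovalCandidate {
  path: string;
  confidence: number;        // 0..1, from the agent
  lastCommitDaysAgo: number; // from git history
}

// Keep only candidates above the confidence threshold whose last
// commit is old enough; everything else goes to human review.
function safeToRemove(
  candidates: RemovalCandidate[],
  minConfidence = 0.9,
  minAgeDays = 30
): RemovalCandidate[] {
  return candidates.filter(
    (c) => c.confidence >= minConfidence && c.lastCommitDaysAgo >= minAgeDays
  );
}
```

Candidates that fail the filter aren't discarded; route them into the `requires-approval` PR instead of the auto-merge one.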
Observability and Logging
Log everything. Your background agents run unattended; you need visibility into what they’re doing.
interface AgentLog {
timestamp: string;
agentName: string;
status: "running" | "success" | "error";
duration: number; // milliseconds
itemsProcessed: number;
itemsModified: number;
prOpened?: string;
error?: string;
}
async function logAgentRun(
env: Env,
log: AgentLog
): Promise<void> {
// Log to Cloudflare Analytics
await env.ANALYTICS.writeDataPoint({
indexes: [log.agentName],
blobs: [JSON.stringify(log)],
doubles: [log.duration, log.itemsProcessed, log.itemsModified],
});
// Also log to external service (DataDog, New Relic, etc.)
await fetch("https://api.datadoghq.com/api/v1/events", {
method: "POST",
headers: {
"Content-Type": "application/json",
"DD-API-KEY": await env.SECRETS.get("DATADOG_API_KEY"),
},
body: JSON.stringify({
title: `${log.agentName} ${log.status}`,
text: `Processed ${log.itemsProcessed} items, modified ${log.itemsModified}`,
alert_type: log.status === "error" ? "error" : "info",
}),
});
}
Set up alerts:
- Alert if agent fails 2 runs in a row (likely a bug or API issue)
- Alert if agent opens >5 PRs in a single run (might be over-aggressive)
- Alert if agent’s PRs have <50% merge rate (might be generating low-quality changes)
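These three rules can be evaluated directly from recent run records. A sketch over a simplified run summary (not the full `AgentLog` shape above):

```typescript
// Simplified per-run summary for alert evaluation.
interface RunSummary {
  status: "success" | "error";
  prsOpened: number;
  prsMerged: number;
}

// Evaluate the three alert rules over the most recent runs
// (oldest first) and return the rules that fired.
function evaluateAlerts(recent: RunSummary[]): string[] {
  const alerts: string[] = [];
  const lastTwo = recent.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((r) => r.status === "error")) {
    alerts.push("two consecutive failures");
  }
  if (recent.some((r) => r.prsOpened > 5)) {
    alerts.push("more than 5 PRs in one run");
  }
  const opened = recent.reduce((n, r) => n + r.prsOpened, 0);
  const merged = recent.reduce((n, r) => n + r.prsMerged, 0);
  if (opened > 0 && merged / opened < 0.5) {
    alerts.push("merge rate below 50%");
  }
  return alerts;
}
```

Run this at the end of each scheduled invocation over the last week of stored logs, and forward any fired rules to the same Datadog/PagerDuty channel your CI alerts use.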
Next Steps: From Prototype to Production
Checklist for Production Deployment
- API keys and secrets stored securely in Cloudflare KV or equivalent
- Cron schedule set and tested (run manually first, then schedule)
- PR templates configured (include summary, changes, testing, checklist)
- Reviewer assignments configured (who reviews PRs from this agent?)
- Approval gates in place (auto-merge low-risk changes, require approval for high-risk)
- Logging and observability set up (Cloudflare Analytics, DataDog, or equivalent)
- Alerting configured (failures, rate limits, merge failures)
- Documentation written (how the agent works, how to disable it, how to review its PRs)
- Team training completed (engineers understand the agent’s behaviour, know how to override it)
- Audit trail implemented (all agent actions logged for compliance)
Scaling Beyond Cloudflare Workers
As your agent workload grows, you may outgrow Cloudflare Workers. Consider:
- GitHub Actions (if you’re already on GitHub): Run agents directly in CI/CD, with full access to repo context
- AWS Lambda + EventBridge: More flexible scheduling, better integration with AWS services
- Self-hosted runner: Full control, no vendor lock-in, but requires infrastructure management
Integration with Your Existing Stack
When implementing agentic AI vs traditional automation, consider how background agents fit into your broader automation strategy. Claude Code agents complement (not replace) traditional CI/CD tooling:
- Keep linting, testing, and formatting: These are fast, deterministic, and should run on every commit
- Add Claude Code for high-level analysis: Flaky test triage, dead code detection, architectural decisions
- Use both together: Lint + format on every commit, run Claude Code agents nightly
Evolving Your Agent Strategy
Start simple. Deploy one background agent (e.g., daily flaky test sweep). Let it run for 2 weeks. Measure:
- How many PRs does it open?
- What’s the merge rate?
- Do reviewers have feedback? Are the PRs high-quality?
Once you’re confident, add a second agent (weekly dead code sweep). Repeat the measurement cycle.
After 3 agents are running smoothly, you have the foundation to build more sophisticated automation:
- Dependency updates: Claude Code reviews dependency updates and flags breaking changes
- Documentation generation: Claude Code generates API docs, README updates, changelog entries
- Performance optimization: Claude Code analyses performance logs and suggests optimizations
- Security scanning: Claude Code reviews code for security vulnerabilities and suggests fixes
Partnering for Fractional CTO Leadership
If you’re building sophisticated automation but lack in-house expertise, consider fractional CTO support. PADISO offers CTO as a Service and AI & Agents Automation services to help Sydney and Australian teams architect, deploy, and operate background agents at scale. Our team has shipped production Claude Code agents for 50+ clients, from seed-stage startups to Series-B companies.
We can help with:
- Architecture design: How should your agents fit into your CI/CD?
- Implementation: Building and deploying agents on Cloudflare Workers, GitHub Actions, or Lambda
- Observability: Setting up logging, alerting, and dashboards
- Compliance: Ensuring agents meet your SOC 2 or ISO 27001 requirements
- Team training: Teaching your engineers how to maintain and extend agents
Conclusion
Background agents powered by Claude Code represent a fundamental shift in how teams manage CI/CD. Instead of manual triage, cleanup, and repetitive tasks, you deploy autonomous agents that work 24/7, learning from your codebase and proposing high-quality fixes.
The reference architecture in this guide—scheduled Cloudflare Workers, Claude Code for analysis, GitHub/GitLab APIs for PR automation—is battle-tested and production-ready. Start with one agent (flaky test triage), measure results, and expand from there.
Key takeaways:
- Background agents are cost-effective: A few hundred dollars per month in API calls can save dozens of engineering hours.
- Start simple: Deploy one agent, measure, iterate. Don’t try to automate everything at once.
- Security and audit matter: Log all actions, require approval for high-risk changes, rotate credentials regularly.
- Observability is essential: Without logging and alerting, you won’t know when agents fail or misbehave.
- Humans still review: Agents propose changes; humans approve and merge. This human-in-the-loop pattern ensures quality and safety.
If you’re ready to deploy Claude Code agents in your CI/CD, start with the reference architecture in this guide. If you need help architecting, deploying, or scaling agents, reach out to PADISO for fractional CTO support and AI & Agents Automation services.
Your codebase will thank you.