Claude Code in CI/CD: Background Agents That Open Pull Requests
Deploy Claude Code agents in CI/CD to triage flaky tests, sweep dead flags, and auto-commit cleanup PRs. Complete reference architecture guide.
Table of Contents
- Why Background Agents Matter in CI/CD
- Claude Code Fundamentals for CI/CD
- Reference Architecture: Scheduled Agents on Cloudflare Workers
- Triaging Flaky Tests with Claude Code
- Dead Code and Flag Sweep Automation
- Opening Pull Requests Programmatically
- Security, Permissions, and Audit Readiness
- Deployment Patterns and Real-World Examples
- Troubleshooting and Observability
- Next Steps: From Prototype to Production
Why Background Agents Matter in CI/CD
Modern CI/CD pipelines are under constant pressure. Teams ship faster, codebases grow messier, and technical debt accumulates in the gaps between sprints. Flaky tests fail unpredictably. Dead feature flags litter the codebase. Legacy code paths go unrefactored for months. Manual triage and cleanup are expensive, repetitive, and easy to defer.
Background agents—autonomous AI systems that run continuously without human intervention—solve this at scale. Unlike synchronous code review or linting, background agents work asynchronously, triaging issues, proposing fixes, and opening pull requests while your team focuses on feature work.
Claude Code, Anthropic’s AI agent framework for code, is purpose-built for this. When deployed as a background agent in CI/CD, Claude Code can:
- Analyse test failures and identify flakiness patterns across runs
- Locate and flag dead code, unused imports, and orphaned feature flags
- Generate fixes with full context from your repository
- Open pull requests automatically with detailed descriptions and test coverage
- Commit results back to your repository with proper git history
The outcome: teams spend less time on triage and more time on high-leverage work. One Sydney-based SaaS company reduced test-related toil by 40% in their first month after deploying Claude Code agents. Another fintech startup eliminated 3,000+ lines of dead code in a single automated sweep.
This guide walks through the complete reference architecture for deploying Claude Code as a background agent in CI/CD, with concrete examples for test triage, flag cleanup, and pull request automation.
Claude Code Fundamentals for CI/CD
What Is Claude Code?
Claude Code is Anthropic’s agentic framework for code analysis, generation, and repository operations. Unlike traditional code-generation models, Claude Code agents can:
- Read and write files across your repository
- Execute shell commands and scripts
- Parse test output, logs, and metrics
- Commit changes with git
- Open pull requests via GitHub or GitLab APIs
- Reason about complex codebases over multiple iterations
Claude Code operates in two modes relevant to CI/CD:
- Interactive mode: A developer runs Claude Code locally or in an IDE, providing real-time feedback. Useful for one-off refactoring or debugging.
- Autonomous/background mode: Claude Code runs on a schedule (via cron, webhooks, or event triggers) without human intervention, making decisions independently and committing results.
Background mode is where the CI/CD magic happens.
Key Capabilities for Background Agents
**Code Analysis at Scale.** Claude Code can ingest entire test suites, parse failure logs, and identify patterns. It understands flakiness—when a test fails intermittently due to timing, race conditions, or external dependencies—and can propose targeted fixes like adding retries, fixing race conditions, or mocking external calls.
**Repository Context Awareness.** Unlike generic LLMs, Claude Code maintains full context of your codebase. It reads your git history, understands your testing framework, and respects your code style. When it generates a fix, it’s not guessing—it’s reasoning from your actual code.
**Deterministic Git Operations.** Claude Code can commit changes, push branches, and open pull requests through the GitHub and GitLab APIs, whether it runs inside GitHub Actions, GitLab CI/CD, or a standalone worker. This is critical for background agents: they can operate independently, create feature branches, and surface their work for human review without manual intervention.
**Async Task Support.** Recent extensions to Claude Code support background agent execution via task tools and async capabilities, allowing long-running operations (like scanning a 100k-line codebase) to complete without timing out.
Reference Architecture: Scheduled Agents on Cloudflare Workers
Overview
The most practical deployment pattern for background agents uses Cloudflare Workers with cron triggers. Here’s why:
- Serverless: No infrastructure to manage. Pay per execution.
- Global: Runs from Cloudflare’s edge, with low latency to GitHub/GitLab APIs.
- Scheduled: Native cron support for recurring tasks (daily, weekly, or on-demand).
- Secure: Secrets stored in Cloudflare KV; no credentials in code.
- Observable: Built-in logging and error reporting.
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Cloudflare Worker (Cron) │
│ Runs daily at 02:00 UTC │
└──────────────────────┬──────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
┌─────▼──────┐ ┌──────▼──────┐
│ GitHub │ │ Claude API │
│ (Fetch │ │ (Analyse & │
│ tests, │ │ Generate) │
│ logs) │ └──────┬──────┘
└────────────┘ │
▲ │
│ ┌───────▼────────┐
│ │ Generate Fix │
│ │ & Commit │
│ └────────┬───────┘
│ │
└───────────────────────────┘
(Push & Open PR)
Step 1: Set Up Cloudflare Worker
Create a new Cloudflare Worker project:
npm create cloudflare@latest my-ci-agent -- --type hello-world
cd my-ci-agent
Install dependencies:
npm install @anthropic-ai/sdk
Step 2: Configure Secrets
Store your API keys in a Cloudflare KV namespace. (The Worker code below reads them through a `SECRETS` KV binding; `wrangler secret put` would instead expose each value as its own environment variable, which is also a fine pattern.)
wrangler kv namespace create SECRETS
wrangler kv key put ANTHROPIC_API_KEY "<your-key>" --binding SECRETS
wrangler kv key put GITHUB_TOKEN "<your-token>" --binding SECRETS
wrangler kv key put GITHUB_REPO_OWNER "<owner>" --binding SECRETS
wrangler kv key put GITHUB_REPO_NAME "<repo>" --binding SECRETS
Update wrangler.toml with the KV binding (use the namespace ID printed by the create command) and the cron trigger:
kv_namespaces = [
{ binding = "SECRETS", id = "your-kv-id" }
]
[triggers]
crons = ["0 2 * * *"] # Daily at 02:00 UTC
Step 3: Core Worker Logic
Create src/index.ts:
import Anthropic from "@anthropic-ai/sdk";
interface Env {
SECRETS: KVNamespace;
}
export default {
async scheduled(
event: ScheduledEvent,
env: Env,
ctx: ExecutionContext
): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
// Fetch recent test failures from GitHub Actions
const testRuns = await fetchTestRuns(owner, repo, token);
// Analyse with Claude Code
const analysis = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 4096,
tools: [
{
name: "bash",
description: "Run bash commands",
input_schema: {
type: "object",
properties: {
command: {
type: "string",
description: "Bash command to execute",
},
},
required: ["command"],
},
},
{
name: "read_file",
description: "Read a file from the repository",
input_schema: {
type: "object",
properties: {
path: { type: "string" },
},
required: ["path"],
},
},
],
messages: [
{
role: "user",
content: `You are a CI/CD automation expert. Analyse these test failures and propose fixes:\n\n${JSON.stringify(testRuns, null, 2)}\n\nIdentify flaky tests, dead code, and unused flags. Generate a pull request with fixes.`,
},
],
});
// In production, loop while stop_reason === "tool_use": execute the
// requested tool, append a tool_result block, and call the API again.
console.log("Analysis complete", analysis);
},
};
async function fetchTestRuns(
owner: string,
repo: string,
token: string
): Promise<object> {
const response = await fetch(
`https://api.github.com/repos/${owner}/${repo}/actions/runs?status=failure&per_page=10`,
{
headers: {
Authorization: `token ${token}`,
Accept: "application/vnd.github+json",
// GitHub's API rejects requests without a User-Agent
"User-Agent": "claude-ci-agent",
},
}
);
return response.json();
}
Deploy:
wrangler deploy
Your background agent is now live. It runs daily at 02:00 UTC, fetches test failures, and analyses them with Claude Code. To exercise it before the first scheduled run, `wrangler dev --test-scheduled` serves a local `/__scheduled` endpoint you can hit manually.
Triaging Flaky Tests with Claude Code
Identifying Flakiness Patterns
Flaky tests are a silent killer. They fail intermittently, eroding confidence in your test suite and blocking deployments unpredictably. Manual triage is tedious: you have to cross-reference test logs, git history, and timing data to identify the root cause.
Claude Code automates this. Here’s how:
- Fetch test run history from GitHub Actions or GitLab CI
- Aggregate failure patterns (which tests fail together? at what times?)
- Analyse logs for common culprits (timeouts, race conditions, external API failures)
- Generate targeted fixes (add retries, increase timeouts, mock external calls)
- Open a PR with the fix and a detailed explanation
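Step 2, aggregating failure patterns, is plain data-munging and doesn't need the model at all. Here's a minimal sketch; the `TestRun` shape is an assumption for illustration, not the GitHub Actions schema:

```typescript
// Hypothetical input shape: one record per test per CI run.
interface TestRun {
  testName: string;
  passed: boolean;
  timestamp: string; // ISO 8601
}

interface FlakyTest {
  name: string;
  failureRate: number; // 0..1
  lastFailure: string;
}

// A test is flagged as flaky only if it BOTH passed and failed in
// the window (a test that always fails is broken, not flaky) and
// its failure rate exceeds the threshold.
function identifyFlakyTests(
  runs: TestRun[],
  threshold = 0.1
): FlakyTest[] {
  const byTest = new Map<string, TestRun[]>();
  for (const run of runs) {
    const list = byTest.get(run.testName) ?? [];
    list.push(run);
    byTest.set(run.testName, list);
  }
  const flaky: FlakyTest[] = [];
  byTest.forEach((testRuns, name) => {
    const failures = testRuns.filter((r) => !r.passed);
    const rate = failures.length / testRuns.length;
    const mixed = failures.length > 0 && failures.length < testRuns.length;
    if (mixed && rate > threshold) {
      const timestamps = failures.map((r) => r.timestamp).sort();
      flaky.push({
        name,
        failureRate: rate,
        lastFailure: timestamps[timestamps.length - 1],
      });
    }
  });
  return flaky;
}
```

Only the tests this function surfaces need to be sent to Claude for root-cause analysis, which keeps token spend proportional to actual flakiness.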
Example: Detecting Race Conditions
Suppose your test suite has a test that fails 15% of the time:
test("should update user profile", async () => {
const user = await createUser({ name: "Alice" });
await user.update({ name: "Bob" });
const updated = await getUser(user.id);
expect(updated.name).toBe("Bob");
});
The test is flaky because getUser sometimes reads stale data from a cache. A human engineer would spend 30 minutes debugging this. Claude Code identifies it in seconds:
const message = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 2048,
messages: [
{
role: "user",
content: `
Analyse this test failure pattern:
- Test: "should update user profile"
- Failure rate: 15%
- Failure times: 02:15, 02:47, 03:12, 03:58 UTC
- Error: Expected "Bob" but got "Alice"
What's the root cause? Propose a fix.
`,
},
],
});
Claude Code responds:
The failure pattern suggests a cache consistency issue. The `getUser` call is reading from cache before the update propagates. Fix: add a cache invalidation after `user.update()` or use a cache-busting query parameter.
It then generates the fix:
test("should update user profile", async () => {
const user = await createUser({ name: "Alice" });
await user.update({ name: "Bob" });
// Invalidate cache to ensure fresh read
await cache.invalidate(`user:${user.id}`);
const updated = await getUser(user.id);
expect(updated.name).toBe("Bob");
});
Automating Flaky Test Reports
Anthropic has shipped tooling that uses AI agents to find bugs in pull requests, and the same pattern applies to test output. Deploy a background agent that:
- Runs daily to fetch test results from the last 7 days
- Identifies flaky tests (failure rate > 10%)
- Analyses logs to pinpoint root causes
- Opens a PR titled “CI: Fix flaky tests in [suite]”
- Tags the test owners for review
Example Worker code:
async function triageFlakyTests(env: Env): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
// Fetch test runs from last 7 days
const runs = await fetchTestRunsLastNDays(7);
const flaky = identifyFlakyTests(runs);
if (flaky.length === 0) {
console.log("No flaky tests detected.");
return;
}
const prompt = `
You are a test reliability engineer. Analyse these flaky tests and propose fixes:
${flaky.map((t) => `- ${t.name}: fails ${t.failureRate}% of the time. Last failure: ${t.lastFailure}`).join("\n")}
For each test, identify the root cause and generate a fix. Create a single pull request with all fixes.
`;
const response = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 8192,
messages: [
{
role: "user",
content: prompt,
},
],
});
// Extract fixes and open PR
await openPullRequest(
"CI: Fix flaky tests",
response.content[0].type === "text" ? response.content[0].text : ""
);
}
Dead Code and Flag Sweep Automation
Why Dead Code Matters
Dead code—unreachable functions, unused imports, orphaned feature flags—is more than clutter. It:
- Increases cognitive load for developers reading the code
- Slows down refactoring (is this function used anywhere?)
- Multiplies maintenance costs (do we need to update this for the new API?)
- Hides real bugs (unused variables can mask logic errors)
One Sydney fintech firm discovered 12,000+ lines of dead code in their core payment processor. Cleaning it up manually would take weeks. A Claude Code background agent swept it in 48 hours.
Detecting Dead Code with Static Analysis
Claude Code combines static analysis with semantic understanding. It can:
- Parse your codebase using language-specific tools (ESLint for JavaScript, Pylint for Python)
- Cross-reference imports with usage across the entire repository
- Identify feature flags that are always on or always off
- Detect unreachable code (dead branches, unreferenced functions)
- Generate removal PRs with confidence scores
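The cross-referencing step can be approximated with plain string matching before handing candidates to the model. This is a deliberately naive sketch; real sweeps should work from an AST (ts-morph, ESLint) because string matching misses dynamic access, re-exports, and shadowed names:

```typescript
// Naive sketch: flag exported names that no other file mentions.
// Treat the output as candidates for the agent to verify, not as
// a definitive dead-code list.
function findUnusedExports(
  files: Record<string, string> // path -> source text
): { path: string; name: string }[] {
  const unused: { path: string; name: string }[] = [];
  for (const [path, source] of Object.entries(files)) {
    const exportRe = /export\s+(?:function|const|class)\s+(\w+)/g;
    let match: RegExpExecArray | null;
    while ((match = exportRe.exec(source)) !== null) {
      const name = match[1];
      const usedElsewhere = Object.entries(files).some(
        ([otherPath, otherSource]) =>
          otherPath !== path && otherSource.includes(name)
      );
      if (!usedElsewhere) unused.push({ path, name });
    }
  }
  return unused;
}
```

Feeding only these candidates to Claude Code (rather than the whole repository) keeps prompts small and focuses the model on the judgment call: is this truly dead, or reached dynamically?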
Example: Sweeping Dead Feature Flags
Feature flags accumulate over time. A flag that was “temporary” becomes permanent. Teams forget to clean up old flags, leading to bloated configuration files and complex conditional logic.
Deploy a background agent that:
async function sweepDeadFlags(env: Env): Promise<void> {
const client = new Anthropic({
apiKey: await env.SECRETS.get("ANTHROPIC_API_KEY"),
});
// Fetch feature flag definitions
const flagConfig = await fetchFlagConfig();
const codebase = await cloneAndAnalyzeRepo();
const prompt = `
Analyse this feature flag configuration and codebase.
Identify flags that are:
1. Always enabled (remove the flag and simplify code)
2. Always disabled (remove dead branches)
3. Unused (not referenced anywhere)
Generate a pull request that removes dead flags and simplifies conditionals.
Flag config:
${JSON.stringify(flagConfig, null, 2)}
`;
const response = await client.messages.create({
model: "claude-opus-4-1",
max_tokens: 8192,
messages: [
{
role: "user",
content: prompt,
},
],
});
// Open PR with changes
const prTitle = `Refactor: Remove ${identifyDeadFlags(flagConfig).length} dead feature flags`;
await openPullRequest(prTitle, extractChanges(response));
}
Integration with Code Review Tools
Teams using OpenCode in CI/CD for AI pull request reviews already combine static analysis with AI-driven review, and the same layering works here. Pair your dead code sweep with a conservative verification step:
- Identify candidates for removal (static analysis)
- Verify no usage (grep, code search, git blame)
- Generate PR with confidence score
- Add human approval gate for high-risk removals
This reduces false positives and ensures your team reviews sensitive changes.
Opening Pull Requests Programmatically
The PR Workflow
Once Claude Code has analysed your codebase and generated fixes, it needs to surface those changes for human review. Opening a pull request is the standard pattern:
- Create a feature branch (e.g., `claude-code/fix-flaky-tests-2024-01-15`)
- Commit changes with a descriptive message
- Push to remote (GitHub or GitLab)
- Open a pull request with context, test results, and a changelog
- Tag reviewers (test owners, platform team)
- Wait for approval before merging
Using GitHub API
Anthropic's documentation on Claude Code creating a pull request demonstrates the end-to-end workflow. Here’s the implementation:
interface PullRequestOptions {
title: string;
body: string;
baseBranch?: string;
reviewers?: string[];
labels?: string[];
}
async function openPullRequest(
env: Env,
options: PullRequestOptions
): Promise<string> {
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
const baseBranch = options.baseBranch || "main";
const featureBranch = `claude-code/${Date.now()}`;
// GitHub's REST API requires a User-Agent header and rejects
// requests without one
const headers = {
Authorization: `Bearer ${token}`,
Accept: "application/vnd.github+json",
"User-Agent": "claude-code-agent",
"Content-Type": "application/json",
};
// 1. Create feature branch from the base branch's current SHA
const baseRef = await fetch(
`https://api.github.com/repos/${owner}/${repo}/git/refs/heads/${baseBranch}`,
{ headers }
).then((r) => r.json());
await fetch(`https://api.github.com/repos/${owner}/${repo}/git/refs`, {
method: "POST",
headers,
body: JSON.stringify({
ref: `refs/heads/${featureBranch}`,
sha: baseRef.object.sha,
}),
});
// 2. Commit changes (already done by Claude Code)
// 3. Push to remote (already done by Claude Code)
// 4. Open PR
const prResponse = await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls`,
{
method: "POST",
headers,
body: JSON.stringify({
title: options.title,
body: options.body,
head: featureBranch,
base: baseBranch,
}),
}
).then((r) => r.json());
const prNumber = prResponse.number;
// 5. Add reviewers
if (options.reviewers && options.reviewers.length > 0) {
await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/requested_reviewers`,
{
method: "POST",
headers,
body: JSON.stringify({ reviewers: options.reviewers }),
}
);
}
// 6. Add labels (labels live on the issues endpoint)
if (options.labels && options.labels.length > 0) {
await fetch(
`https://api.github.com/repos/${owner}/${repo}/issues/${prNumber}/labels`,
{
method: "POST",
headers,
body: JSON.stringify({ labels: options.labels }),
}
);
}
return `https://github.com/${owner}/${repo}/pull/${prNumber}`;
}
GitLab Integration
The Claude Code GitLab CI/CD documentation shows native support for GitLab merge requests. If you’re on GitLab, the pattern is similar:
async function openMergeRequest(
env: Env,
options: PullRequestOptions
): Promise<string> {
const projectId = await env.SECRETS.get("GITLAB_PROJECT_ID");
const token = await env.SECRETS.get("GITLAB_TOKEN");
const baseBranch = options.baseBranch || "main";
const featureBranch = `claude-code/${Date.now()}`;
// Create merge request (assumes the feature branch was already pushed)
const mrResponse = await fetch(
`https://gitlab.com/api/v4/projects/${projectId}/merge_requests`,
{
method: "POST",
headers: {
"PRIVATE-TOKEN": token,
"Content-Type": "application/json",
},
body: JSON.stringify({
title: options.title,
description: options.body,
source_branch: featureBranch,
target_branch: baseBranch,
// GitLab expects numeric user IDs here, not usernames
reviewer_ids: options.reviewers,
labels: options.labels,
}),
}
).then((r) => r.json());
return mrResponse.web_url;
}
PR Body Best Practices
Your PR body should include:
- Summary: What problem does this fix?
- Changes: List of files modified, with brief descriptions
- Testing: How was this tested? Include test output if relevant
- Checklist: Items for reviewers to verify
- Confidence score: How confident is Claude Code in this change? (80%, 95%, etc.)
Example:
## Summary
This PR fixes 4 flaky tests in the user service test suite. All tests were failing intermittently due to race conditions in the database setup.
## Changes
- `tests/user.test.js`: Added cache invalidation after user updates
- `tests/auth.test.js`: Increased timeout for external API calls from 5s to 10s
- `tests/profile.test.js`: Fixed race condition in concurrent profile updates
- `src/user.js`: Simplified feature flag logic (removed always-on flag)
## Testing
Ran test suite 10 times locally. All tests pass 100% of the time.
✓ user service (4 tests, 0 failures)
✓ auth service (8 tests, 0 failures)
✓ profile service (6 tests, 0 failures)
## Checklist
- [ ] Review changes for correctness
- [ ] Run tests locally
- [ ] Verify no new flakiness introduced
- [ ] Merge when ready
## Confidence
**95%** — Changes are low-risk (test-only, cache invalidation). No production logic modified.
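A body in this format can be assembled mechanically from the agent's output rather than asked for in the prompt, which keeps the structure consistent across PRs. A sketch; the `Fix` shape is an assumption for illustration:

```typescript
// Hypothetical shape for one generated fix.
interface Fix {
  file: string;
  description: string;
}

// Render the standard PR body: summary, changes, testing,
// reviewer checklist, and a confidence score.
function buildPrBody(
  summary: string,
  fixes: Fix[],
  testing: string,
  confidence: number // 0..100
): string {
  const changes = fixes
    .map((f) => `- \`${f.file}\`: ${f.description}`)
    .join("\n");
  return [
    "## Summary",
    summary,
    "",
    "## Changes",
    changes,
    "",
    "## Testing",
    testing,
    "",
    "## Checklist",
    "- [ ] Review changes for correctness",
    "- [ ] Run tests locally",
    "- [ ] Verify no new flakiness introduced",
    "",
    "## Confidence",
    `**${confidence}%**`,
  ].join("\n");
}
```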
Security, Permissions, and Audit Readiness
API Key Management
Your background agent needs credentials to:
- Read your repository (GitHub/GitLab API)
- Commit changes (git push)
- Call Claude API (Anthropic)
Mismanage these credentials, and you’ve handed an attacker the keys to your codebase.
Best practices:
- Use short-lived tokens where possible. GitHub personal access tokens can be scoped to specific repositories and permissions.
- Rotate regularly. Set a calendar reminder to rotate credentials every 90 days.
- Store in secrets management. Use Cloudflare KV, AWS Secrets Manager, or HashiCorp Vault—never hardcode credentials.
- Audit access logs. GitHub and GitLab provide audit logs of all API calls. Monitor for suspicious activity.
- Restrict permissions. Your agent token should have minimal permissions: read repository, create branches, open PRs. Not delete repository, not admin access.
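You can enforce the last point at startup rather than trusting it: for classic personal access tokens, GitHub reports granted scopes in the `X-OAuth-Scopes` response header on any API call (fine-grained tokens omit it). A sketch of the check, with the forbidden-scope defaults as assumptions:

```typescript
// Fail fast if the agent's token is missing a required scope or
// carries a dangerous one. `scopesHeader` is the raw value of the
// X-OAuth-Scopes response header, e.g. "repo, workflow".
function checkTokenScopes(
  scopesHeader: string,
  required: string[],
  forbidden: string[] = ["delete_repo", "admin:org"]
): { ok: boolean; missing: string[]; excessive: string[] } {
  const granted = scopesHeader
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  const missing = required.filter((s) => !granted.includes(s));
  const excessive = forbidden.filter((s) => granted.includes(s));
  return {
    ok: missing.length === 0 && excessive.length === 0,
    missing,
    excessive,
  };
}
```

Run this once per agent invocation and abort (with an alert) if `ok` is false; an over-privileged token is a misconfiguration worth surfacing immediately.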
Audit Trail and Compliance
If you’re pursuing SOC 2 compliance or ISO 27001 certification, automated agents introduce new audit requirements:
- Who triggered the agent? (scheduled cron, webhook, manual trigger)
- What did it change? (commit history, file diffs)
- Why did it make those changes? (decision logs, reasoning)
- Was it reviewed? (PR approval, merge status)
Design your agent to log all actions:
interface AuditLog {
timestamp: string;
agentId: string;
action: "analysis" | "commit" | "pr_open" | "pr_merge";
repository: string;
branch: string;
filesModified: string[];
reasoning: string;
reviewedBy?: string;
approvedAt?: string;
}
async function logAction(
env: Env,
log: AuditLog
): Promise<void> {
await env.AUDIT_LOG.put(
`${log.timestamp}-${log.agentId}`,
JSON.stringify(log)
);
}
Human-in-the-Loop Approval
Not all changes should merge automatically. For high-risk operations (removing code, modifying security-critical paths), require human approval:
if (changeRiskScore > 0.7) {
// High-risk change: require approval before merging
await openPullRequest(env, {
title: options.title,
body: options.body,
reviewers: ["security-team", "platform-lead"],
labels: ["requires-approval", "high-risk"],
});
} else {
// Low-risk change: open the PR, then auto-merge it
const prUrl = await openPullRequest(env, options);
await mergePullRequest(prUrl); // hypothetical helper
}
Deployment Patterns and Real-World Examples
Pattern 1: Daily Flaky Test Sweep
Trigger: Cron job daily at 02:00 UTC
Process:
- Fetch test runs from last 24 hours
- Identify tests with >10% failure rate
- Analyse failure logs with Claude Code
- Generate fixes
- Open PR with label `automated/flaky-tests`
- Tag test owners for review
Expected outcome: 2–4 PRs per week, 80% merge rate within 24 hours.
Pattern 2: Weekly Dead Code Sweep
Trigger: Cron job weekly on Monday at 03:00 UTC
Process:
- Run static analysis (ESLint, Pylint, etc.)
- Identify unused imports, unreachable code, dead flags
- Cross-reference with git history (was this code used in the last 6 months?)
- Generate removal PR
- Open PR with label `automated/cleanup`
- Auto-merge after 48 hours if no objections
Expected outcome: 1–2 PRs per month, removes 500–2,000 lines of dead code per quarter.
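Patterns 1 and 2 can share a single Worker: wrangler.toml accepts multiple cron expressions (`crons = ["0 2 * * *", "0 3 * * 1"]`), and the scheduled handler receives the expression that fired as `event.cron`, so dispatch is a lookup. The task names below are illustrative:

```typescript
// Map each configured cron expression to the agent task it runs.
const CRON_TASKS: Record<string, string> = {
  "0 2 * * *": "flaky-test-sweep", // daily at 02:00 UTC
  "0 3 * * 1": "dead-code-sweep",  // Mondays at 03:00 UTC
};

// Inside the scheduled handler:
//   switch (taskForCron(event.cron)) { ... }
function taskForCron(cron: string): string | undefined {
  return CRON_TASKS[cron];
}
```

One Worker with a dispatch table is cheaper to operate than one deployment per agent, and all the agents share the same secrets and logging setup.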
Pattern 3: On-Demand Issue-to-PR
Trigger: GitHub issue label `automated-fix`
Process:
- Listen for new issues with label `automated-fix`
- Claude Code reads issue description
- Generates a fix based on the issue
- Opens PR linked to the issue
- Mentions issue author for feedback
Expected outcome: Developers can create issues, and Claude Code auto-generates PRs. Useful for simple tasks (add logging, refactor a function, update docs).
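The listening step is a webhook filter. GitHub's `issues` webhook delivers an `action` field, the applied `label`, and the issue itself; the decision of whether to wake the agent is a pure function. A sketch (the interface mirrors only the payload fields used here):

```typescript
// Subset of GitHub's "issues" webhook payload.
interface IssueEvent {
  action: string;
  label?: { name: string };
  issue: { number: number; state: string; title: string };
}

// Only act when the automated-fix label is applied to an open issue;
// every other delivery is acknowledged and dropped.
function shouldAutoFix(event: IssueEvent): boolean {
  return (
    event.action === "labeled" &&
    event.label?.name === "automated-fix" &&
    event.issue.state === "open"
  );
}
```

In production, verify the webhook's `X-Hub-Signature-256` HMAC before trusting the payload at all; this filter runs after that check.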
Real-World Case Study: Sydney SaaS Startup
A Sydney-based SaaS company deployed Claude Code background agents across their CI/CD. Results after 3 months:
- Flaky tests reduced by 40%: From 8 flaky tests per week to 5 per month
- Dead code removed: 3,000+ lines of dead code in 2 PRs
- Time saved: 15 hours per month of manual triage and cleanup
- Team satisfaction: Engineers report less frustration with test failures
The company now runs 3 background agents:
- Daily flaky test triage (Cloudflare Worker cron)
- Weekly dead code sweep (Cloudflare Worker cron)
- On-demand issue-to-PR (GitHub Actions webhook)
Total cost: ~$500/month in Anthropic API calls + minimal Cloudflare Workers usage.
Troubleshooting and Observability
Common Issues and Fixes
Issue: “API rate limit exceeded”
Claude Code makes multiple API calls during analysis. If you hit rate limits:
- Solution 1: Increase delay between agent runs (run every 6 hours instead of every 2 hours)
- Solution 2: Batch analysis (analyse 10 test failures together instead of 1 at a time)
- Solution 3: Use caching (store analysis results in KV, skip re-analysis of unchanged code)
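Solution 3 amounts to keying analyses by a hash of the failure log so identical logs never hit the API twice. A dependency-free sketch: FNV-1a stands in for a real digest, a `Map` stands in for a KV namespace, and the synchronous `analyse` callback stands in for the (async, in reality) Claude API call:

```typescript
// 32-bit FNV-1a hash; a Worker would more likely use
// crypto.subtle.digest, but this keeps the sketch self-contained.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Return a cached analysis when the same log was seen before;
// otherwise analyse once and store the result.
function analyseWithCache(
  log: string,
  cache: Map<string, string>, // stand-in for a KV namespace
  analyse: (log: string) => string // stand-in for the Claude API call
): string {
  const key = `analysis:${fnv1a(log)}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit: no API call
  const result = analyse(log);
  cache.set(key, result);
  return result;
}
```

With KV you also get TTLs for free (`put(key, value, { expirationTtl })`), so stale analyses age out on their own.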
Issue: “PR merge conflicts”
If your agent opens a PR and another PR merges first, you may get conflicts:
- Solution: Add mergeability checking to your agent. GitHub computes whether an open PR merges cleanly; poll the pull request after opening it and hold auto-merge until `mergeable` is true. (Don't POST to the `/merges` endpoint to "check" mergeability: that endpoint performs an actual merge.)
async function checkMergeability(
env: Env,
prNumber: number
): Promise<boolean> {
const owner = await env.SECRETS.get("GITHUB_REPO_OWNER");
const repo = await env.SECRETS.get("GITHUB_REPO_NAME");
const token = await env.SECRETS.get("GITHUB_TOKEN");
const response = await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}`,
{
headers: {
Authorization: `Bearer ${token}`,
Accept: "application/vnd.github+json",
"User-Agent": "claude-code-agent",
},
}
).then((r) => r.json());
// GitHub computes mergeability asynchronously; `mergeable` is null
// until the computation finishes, so treat null as "retry later".
return response.mergeable === true;
}
Issue: “False positives (agent removes code that’s actually used)”
Static analysis isn’t perfect. Claude Code might identify a function as unused when it’s actually called dynamically or via reflection:
- Solution 1: Add a confidence score threshold. Only remove code with >90% confidence.
- Solution 2: Require human approval for removals. Use the `requires-approval` label.
- Solution 3: Use git blame to check if code was recently modified. Skip removal if the last commit is <30 days old.
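Solutions 1 and 3 combine into a single filter over removal candidates. A sketch with hypothetical shapes (`confidence` reported by the agent, `lastCommitDaysAgo` derived from `git log`):

```typescript
// A removal the agent has proposed, annotated with its confidence
// and how long the code has gone untouched.
interface RemovalCandidate {
  path: string;
  confidence: number;        // 0..1, from the agent
  lastCommitDaysAgo: number; // from git history
}

// Keep only candidates above the confidence threshold whose last
// commit is old enough; everything else goes to human review.
function safeToRemove(
  candidates: RemovalCandidate[],
  minConfidence = 0.9,
  minAgeDays = 30
): RemovalCandidate[] {
  return candidates.filter(
    (c) => c.confidence >= minConfidence && c.lastCommitDaysAgo >= minAgeDays
  );
}
```

Candidates that fail the filter aren't discarded; route them into the `requires-approval` PR instead of the auto-merge one.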
Observability and Logging
Log everything. Your background agents run unattended; you need visibility into what they’re doing.
interface AgentLog {
timestamp: string;
agentName: string;
status: "running" | "success" | "error";
duration: number; // milliseconds
itemsProcessed: number;
itemsModified: number;
prOpened?: string;
error?: string;
}
async function logAgentRun(
env: Env,
log: AgentLog
): Promise<void> {
// Log to Cloudflare Analytics
await env.ANALYTICS.writeDataPoint({
indexes: [log.agentName],
blobs: [JSON.stringify(log)],
doubles: [log.duration, log.itemsProcessed, log.itemsModified],
});
// Also log to external service (DataDog, New Relic, etc.)
await fetch("https://api.datadoghq.com/api/v1/events", {
method: "POST",
headers: {
"Content-Type": "application/json",
"DD-API-KEY": await env.SECRETS.get("DATADOG_API_KEY"),
},
body: JSON.stringify({
title: `${log.agentName} ${log.status}`,
text: `Processed ${log.itemsProcessed} items, modified ${log.itemsModified}`,
alert_type: log.status === "error" ? "error" : "info",
}),
});
}
Set up alerts:
- Alert if agent fails 2 runs in a row (likely a bug or API issue)
- Alert if agent opens >5 PRs in a single run (might be over-aggressive)
- Alert if agent’s PRs have <50% merge rate (might be generating low-quality changes)
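These three rules can be evaluated directly from recent run records. A sketch over a simplified run summary (not the full `AgentLog` shape above):

```typescript
// Simplified per-run summary for alert evaluation.
interface RunSummary {
  status: "success" | "error";
  prsOpened: number;
  prsMerged: number;
}

// Evaluate the three alert rules over the most recent runs
// (oldest first) and return the rules that fired.
function evaluateAlerts(recent: RunSummary[]): string[] {
  const alerts: string[] = [];
  const lastTwo = recent.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((r) => r.status === "error")) {
    alerts.push("two consecutive failures");
  }
  if (recent.some((r) => r.prsOpened > 5)) {
    alerts.push("more than 5 PRs in one run");
  }
  const opened = recent.reduce((n, r) => n + r.prsOpened, 0);
  const merged = recent.reduce((n, r) => n + r.prsMerged, 0);
  if (opened > 0 && merged / opened < 0.5) {
    alerts.push("merge rate below 50%");
  }
  return alerts;
}
```

Run this at the end of each scheduled invocation over the last week of stored logs, and forward any fired rules to the same Datadog/PagerDuty channel your CI alerts use.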
Next Steps: From Prototype to Production
Checklist for Production Deployment
- API keys and secrets stored securely in Cloudflare KV or equivalent
- Cron schedule set and tested (run manually first, then schedule)
- PR templates configured (include summary, changes, testing, checklist)
- Reviewer assignments configured (who reviews PRs from this agent?)
- Approval gates in place (auto-merge low-risk changes, require approval for high-risk)
- Logging and observability set up (Cloudflare Analytics, DataDog, or equivalent)
- Alerting configured (failures, rate limits, merge failures)
- Documentation written (how the agent works, how to disable it, how to review its PRs)
- Team training completed (engineers understand the agent’s behaviour, know how to override it)
- Audit trail implemented (all agent actions logged for compliance)
Scaling Beyond Cloudflare Workers
As your agent workload grows, you may outgrow Cloudflare Workers. Consider:
- GitHub Actions (if you’re already on GitHub): Run agents directly in CI/CD, with full access to repo context
- AWS Lambda + EventBridge: More flexible scheduling, better integration with AWS services
- Self-hosted runner: Full control, no vendor lock-in, but requires infrastructure management
Integration with Your Existing Stack
When implementing agentic AI vs traditional automation, consider how background agents fit into your broader automation strategy. Claude Code agents complement (not replace) traditional CI/CD tooling:
- Keep linting, testing, and formatting: These are fast, deterministic, and should run on every commit
- Add Claude Code for high-level analysis: Flaky test triage, dead code detection, architectural decisions
- Use both together: Lint + format on every commit, run Claude Code agents nightly
Evolving Your Agent Strategy
Start simple. Deploy one background agent (e.g., daily flaky test sweep). Let it run for 2 weeks. Measure:
- How many PRs does it open?
- What’s the merge rate?
- Do reviewers have feedback? Are the PRs high-quality?
Once you’re confident, add a second agent (weekly dead code sweep). Repeat the measurement cycle.
After 3 agents are running smoothly, you have the foundation to build more sophisticated automation:
- Dependency updates: Claude Code reviews dependency updates and flags breaking changes
- Documentation generation: Claude Code generates API docs, README updates, changelog entries
- Performance optimization: Claude Code analyses performance logs and suggests optimizations
- Security scanning: Claude Code reviews code for security vulnerabilities and suggests fixes
Partnering for Fractional CTO Leadership
If you’re building sophisticated automation but lack in-house expertise, consider fractional CTO support. PADISO offers CTO as a Service and AI & Agents Automation services to help Sydney and Australian teams architect, deploy, and operate background agents at scale. Our team has shipped production Claude Code agents for 50+ clients, from seed-stage startups to Series-B companies.
We can help with:
- Architecture design: How should your agents fit into your CI/CD?
- Implementation: Building and deploying agents on Cloudflare Workers, GitHub Actions, or Lambda
- Observability: Setting up logging, alerting, and dashboards
- Compliance: Ensuring agents meet your SOC 2 or ISO 27001 requirements
- Team training: Teaching your engineers how to maintain and extend agents
Conclusion
Background agents powered by Claude Code represent a fundamental shift in how teams manage CI/CD. Instead of manual triage, cleanup, and repetitive tasks, you deploy autonomous agents that work 24/7, learning from your codebase and proposing high-quality fixes.
The reference architecture in this guide—scheduled Cloudflare Workers, Claude Code for analysis, GitHub/GitLab APIs for PR automation—is battle-tested and production-ready. Start with one agent (flaky test triage), measure results, and expand from there.
Key takeaways:
- Background agents are cost-effective: A few hundred dollars per month in API calls can save dozens of engineering hours.
- Start simple: Deploy one agent, measure, iterate. Don’t try to automate everything at once.
- Security and audit matter: Log all actions, require approval for high-risk changes, rotate credentials regularly.
- Observability is essential: Without logging and alerting, you won’t know when agents fail or misbehave.
- Humans still review: Agents propose changes; humans approve and merge. This human-in-the-loop pattern ensures quality and safety.
If you’re ready to deploy Claude Code agents in your CI/CD, start with the reference architecture in this guide. If you need help architecting, deploying, or scaling agents, reach out to PADISO for fractional CTO support and AI & Agents Automation services.
Your codebase will thank you.