
Build an Autonomous Bug Fixer Agent with Claude API

TopicTrick

Debugging is expensive. A developer encounters a bug report, finds the failing test, reads the code, forms a hypothesis, makes a change, re-runs the tests, finds it did not work, and tries again. This loop can take 30 minutes for a simple bug and hours for a subtle one.

What Does an Autonomous Bug Fixer Agent Do?

A bug fixer agent accepts a bug report or failing test command, runs the tests to reproduce the failure, reads the relevant source files, identifies the root cause, applies a minimal targeted fix, re-runs the full test suite to verify no regressions, and outputs a diff — all autonomously. Built with Claude's tool use API, it compresses a debugging cycle that might cost a developer 30 minutes into a handful of autonomous iterations.

A bug fixer agent compresses that loop. Given a failing test (or an error description), the agent reads the relevant code, reasons about the root cause, applies a fix, runs the tests, and keeps iterating until the tests pass. For well-defined, reproducible bugs — the kind that come with a failing test — this process can be fully automated.

In this project you will build an autonomous bug fixer agent that accepts a bug report (described in natural language, or as a failing test command), explores the codebase, fixes the bug, verifies the fix with tests, and produces a clean diff of its changes.

This project extends the agent loop architecture from Build Your First AI Coding Agent. If you have not built that agent yet, read that post first — this project reuses the same ToolExecutor and TOOLS definitions.


Prerequisites

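The setup below is a sketch: it assumes Python 3.10+, installs the Anthropic SDK and pytest (version pins omitted), and expects your API key in the `ANTHROPIC_API_KEY` environment variable.

```shell
# Install the SDK and the test runner used throughout this project
pip install anthropic pytest

# The agent reads the key from the environment
export ANTHROPIC_API_KEY="your-api-key"
```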

Reuse the agent/tools.py and agent/executor.py from the previous project. This post adds the bug fixer's specialised loop and prompt on top of that foundation.


What Makes a Bug Fixer Different from a General Agent

A general coding agent handles open-ended tasks. A bug fixer has a specific, measurable success condition: all targeted tests pass. This tighter loop allows for a more focused architecture:

  1. Reproduce: run the failing test(s) to confirm the failure and capture the error
  2. Explore: read the relevant source files to understand the code involved
  3. Hypothesise: reason about what is causing the failure
  4. Fix: make a targeted change to address the root cause
  5. Verify: run the tests again to confirm the fix works
  6. Check regressions: run the full test suite to confirm nothing else broke
  7. Report: produce a diff and explanation

The agent iterates steps 3–5 until either the tests pass or it exhausts its retry limit.


Step 1: Bug Fixer System Prompt

The system prompt is more constrained than a general agent's — it focuses the model on root cause analysis and minimal, targeted fixes.

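A sketch of such a prompt, using the BUG_FIXED:/BUG_UNFIXED: termination convention this series relies on; tune the wording to your codebase.

```python
# System prompt for the bug fixer. The BUG_FIXED:/BUG_UNFIXED: markers are
# the termination signals the agent loop parses to decide the outcome.
BUG_FIXER_SYSTEM_PROMPT = """You are an autonomous bug fixing agent.

Workflow:
1. Reproduce: run the failing test(s) FIRST and capture the exact error.
2. Explore: read only the source files implicated by the failure.
3. Hypothesise the root cause before editing anything.
4. Fix: make the smallest change that addresses the root cause.
   Never rewrite a whole function when a one-line change suffices.
5. Verify: re-run the failing test(s), then the full suite to catch regressions.

Rules:
- Do not modify test files unless the test itself is provably wrong.
- Do not add features, refactor, or "improve" unrelated code.
- If you cannot reproduce the failure, say so instead of guessing.

When every test passes, end your final message with:
BUG_FIXED: <one-line summary of the root cause and the change>
If you cannot fix the bug, end with:
BUG_UNFIXED: <what you tried and why it failed>
"""
```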

Step 2: The Bug Fixer Agent

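A minimal sketch of the loop, not a canonical implementation. It assumes the ToolExecutor/TOOLS interface from the previous post (here, `executor.execute(name, input)` returning a string result) and a system prompt like Step 1's; the function and constant names are this sketch's own.

```python
MAX_ITERATIONS = 15  # retry budget before the agent gives up

def fix_bug(bug_report, repo_path, executor=None, tools=None, client=None,
            model="claude-sonnet-4-20250514",
            system_prompt="You are a bug fixing agent. "
                          "End with BUG_FIXED: or BUG_UNFIXED:."):
    """Iterate reproduce -> hypothesise -> fix -> verify until a signal appears."""
    if client is None:
        import anthropic  # deferred so a stub client can be injected in tests
        client = anthropic.Anthropic()
    messages = [{"role": "user",
                 "content": f"Bug report:\n{bug_report}\n\nRepository root: {repo_path}"}]
    for _ in range(MAX_ITERATIONS):
        response = client.messages.create(
            model=model, max_tokens=4096, system=system_prompt,
            tools=tools or [], messages=messages)
        # Check any text in the reply for a termination signal
        text = "".join(b.text for b in response.content if b.type == "text")
        for signal, status in (("BUG_FIXED:", "fixed"), ("BUG_UNFIXED:", "unfixed")):
            if signal in text:
                return {"status": status,
                        "summary": text.split(signal, 1)[1].strip()}
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "tool_use":
            # Execute each requested tool and return the results to the model
            results = [{"type": "tool_result", "tool_use_id": b.id,
                        "content": executor.execute(b.name, b.input)}
                       for b in response.content if b.type == "tool_use"]
            messages.append({"role": "user", "content": results})
        else:
            messages.append({"role": "user",
                             "content": "Continue, or end with BUG_FIXED:/BUG_UNFIXED:."})
    return {"status": "unfixed", "summary": "iteration limit reached"}
```

Deferring the `anthropic` import and accepting `client` as a parameter keeps the loop testable with a stubbed client, and lets an orchestrator reuse one client across many bugs.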

Step 3: Create a Test Codebase with Bugs

Let's set up a realistic project with multiple bugs to fix:

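One way to seed such a project — the file contents are illustrative; any pair of a typo bug and a filtering bug works, as long as the test names match the failures described below.

```python
import os
from textwrap import dedent

# Source with two seeded bugs: a typo in deactivate_user and
# missing filtering in get_active_users.
BUGGY_SOURCE = dedent('''\
    class UserManager:
        def __init__(self):
            self.users = {}

        def add_user(self, user_id, name):
            self.users[user_id] = {"name": name, "is_active": True}

        def deactivate_user(self, user_id):
            # BUG 1: typo — writes a new "is_actve" key; is_active stays True
            self.users[user_id]["is_actve"] = False

        def get_active_users(self):
            # BUG 2: no filtering — inactive users are returned too
            return list(self.users.values())
    ''')

BUGGY_TESTS = dedent('''\
    from user_manager import UserManager

    def _manager():
        m = UserManager()
        m.add_user(1, "Ada")
        m.add_user(2, "Grace")
        return m

    def test_deactivate_user():
        m = _manager()
        m.deactivate_user(1)
        assert m.users[1]["is_active"] is False

    def test_get_active_users():
        m = _manager()
        m.users[2]["is_active"] = False   # bypass the broken deactivate
        assert [u["name"] for u in m.get_active_users()] == ["Ada"]
    ''')

def create_project(root="buggy_project"):
    """Write the buggy module and its failing tests under root."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "user_manager.py"), "w") as f:
        f.write(BUGGY_SOURCE)
    with open(os.path.join(root, "test_user_manager.py"), "w") as f:
        f.write(BUGGY_TESTS)
    return root

if __name__ == "__main__":
    print("created", create_project())
```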

Run this to create the project, then see the failures:

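Assuming you saved the Step 3 script as `create_buggy_project.py` (the filename is a placeholder):

```shell
python create_buggy_project.py
cd buggy_project
python -m pytest test_user_manager.py -v
```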

You should see failures on test_deactivate_user (typo bug) and test_get_active_users (filtering bug).


Step 4: Run the Bug Fixer

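A driver sketch that wires the pieces together. The module paths `agent.executor` and `agent.tools` follow the previous post's layout, and `bug_fixer` is an assumed module holding Step 2's `fix_bug` — adjust to wherever you put them.

```python
def main():
    # Deferred imports: these module paths are assumptions about your layout
    from agent.executor import ToolExecutor
    from agent.tools import TOOLS
    from bug_fixer import fix_bug

    result = fix_bug(
        bug_report=(
            "pytest reports two failures in buggy_project: "
            "test_deactivate_user (the user stays active) and "
            "test_get_active_users (inactive users are returned)."
        ),
        repo_path="buggy_project",
        executor=ToolExecutor(),
        tools=TOOLS,
    )
    print(f"[{result['status']}] {result['summary']}")

if __name__ == "__main__":
    main()
```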

Expected agent behaviour: the agent first runs pytest and captures the two failures, reads user_manager.py, fixes the typo in deactivate_user and the filtering logic in get_active_users, re-runs the suite until it is green, checks the full suite for regressions, and ends with a BUG_FIXED: summary plus a diff of the changed lines.

Step 5: Handling Edge Cases

Bug Cannot Be Reproduced

Sometimes a bug report is vague. The agent handles this gracefully because it runs the tests first:

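For example, a report with no failing test attached (the wording, module paths, and `bug_fixer` module name are illustrative):

```python
# A vague report: no failing test, no file names, no stack trace
VAGUE_REPORT = "Deactivating a user doesn't seem to stick, but I can't pin down where."

if __name__ == "__main__":
    from bug_fixer import fix_bug           # assumed module from Step 2
    from agent.executor import ToolExecutor
    from agent.tools import TOOLS
    result = fix_bug(bug_report=VAGUE_REPORT, repo_path="buggy_project",
                     executor=ToolExecutor(), tools=TOOLS)
    print(result["status"])
```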

If the test passes, Claude will report it cannot reproduce the bug and describe what it checked.

Multiple Related Bugs in Different Files

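A report of this shape might look as follows — the file and test names here are hypothetical, standing in for any pair of modules where one mutates state and the other reads it:

```python
# A report implicating two files at once (names are made up for illustration)
MULTI_FILE_REPORT = (
    "Orders cancelled in order_state.py still appear in the summary "
    "produced by order_report.py. Both test_cancel_order and "
    "test_order_summary fail."
)
```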

The agent will explore both files, identify related issues, and fix them in a single session.

No Test — Error Description Only

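Such a report might carry only a production traceback — the file, function, and line number below are made up for illustration:

```python
# An error-description-only report: no test to run, just a traceback
ERROR_ONLY_REPORT = """No failing test, just this production traceback:

  File "app.py", line 42, in get_user_email
    return user["email"]
TypeError: 'NoneType' object is not subscriptable

Happens when the user id is unknown."""
```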

The agent will read app.py, find the relevant code, add a None check, and verify the fix.


Integrating with GitHub Issues

To automatically fix bugs reported as GitHub issues:


Key Takeaways

  • A bug fixer agent differs from a general agent in having a measurable success condition — tests passing
  • Always reproduce first: running the failing test before any edits anchors every subsequent decision in evidence
  • Minimal changes are the key constraint — agents that over-fix introduce regressions. The system prompt must enforce this explicitly
  • Termination signals (BUG_FIXED: / BUG_UNFIXED:) give the orchestrator a reliable way to determine the outcome without parsing free-form text
  • Regression testing after the targeted fix is non-optional — agents can fix one thing and break another
  • Diff generation lets you review exactly what the agent changed before merging to production

What's Next in the AI Coding Agents Series

  1. What Are AI Coding Agents?
  2. AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
  3. Build Your First AI Coding Agent with the Claude API
  4. Build an Automated GitHub PR Review Agent
  5. Build an Autonomous Bug Fixer Agent ← you are here
  6. AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production

This post is part of the AI Coding Agents Series. Previous post: Build an Automated GitHub PR Review Agent.

To integrate this bug fixer agent into your CI/CD pipeline, see AI Coding Agents in CI/CD: Automate Reviews and Bug Fixes. For the agentic loop fundamentals, see Claude Agentic Loop Explained and Claude Tool Use Explained.

External Resources

  • pytest documentation — the test framework used throughout this project for reproducing and verifying bug fixes.
  • Python subprocess module — official docs for the subprocess calls used in the tool executor's run_command method.