Build an Autonomous Bug Fixer Agent with Claude API

Debugging is expensive. A developer encounters a bug report, finds the failing test, reads the code, forms a hypothesis, makes a change, re-runs the tests, finds it did not work, and tries again. This loop can take 30 minutes for a simple bug and hours for a subtle one.
What Does an Autonomous Bug Fixer Agent Do?
A bug fixer agent accepts a bug report or failing test command, runs the tests to reproduce the failure, reads the relevant source files, identifies the root cause, applies a minimal targeted fix, re-runs the full test suite to verify no regressions, and outputs a diff — all autonomously. Built with Claude's tool use API, it compresses a 30-minute manual debugging cycle into a handful of automated iterations.
In practice, the loop looks like this: given a failing test (or an error description), the agent reads the relevant code, reasons about the root cause, applies a fix, runs the tests, and keeps iterating until they pass. For well-defined, reproducible bugs, the kind that come with a failing test, this process can be fully automated.
In this project you will build an autonomous bug fixer agent that accepts a bug report (described in natural language, or as a failing test command), explores the codebase, fixes the bug, verifies the fix with tests, and produces a clean diff of its changes.
This project extends the agent loop architecture from Build Your First AI Coding Agent. If you have not built that agent yet, read that post first — this project reuses the same ToolExecutor and TOOLS definitions.
Prerequisites
Reuse the agent/tools.py and agent/executor.py from the previous project. This post adds the bug fixer's specialised loop and prompt on top of that foundation.
What Makes a Bug Fixer Different from a General Agent
A general coding agent handles open-ended tasks. A bug fixer has a specific, measurable success condition: all targeted tests pass. This tighter loop allows for a more focused architecture:
1. Reproduce: run the failing test(s) to confirm the failure and capture the error
2. Explore: read the relevant source files to understand the code involved
3. Hypothesise: reason about what is causing the failure
4. Fix: make a targeted change to address the root cause
5. Verify: run the tests again to confirm the fix works
6. Check regressions: run the full test suite to confirm nothing else broke
7. Report: produce a diff and explanation
The agent iterates steps 3–5 until either the tests pass or it exhausts its retry limit.
Step 1: Bug Fixer System Prompt
The system prompt is more constrained than a general agent — it focuses the model on root cause analysis and minimal, targeted fixes.
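A minimal version of that prompt might look like the following. The wording is illustrative (tune it for your codebase), and the module path `agent/bug_fixer_prompt.py` is an assumed name, not something defined in the earlier post:

```python
# agent/bug_fixer_prompt.py: illustrative wording; adjust for your project.
BUG_FIXER_SYSTEM_PROMPT = """\
You are an autonomous bug fixer working in a local repository.

Follow this process strictly:
1. Reproduce: run the failing test command FIRST, before reading or editing anything.
2. Explore: read only the files implicated by the failure output.
3. Hypothesise: state the root cause before making any change.
4. Fix: make the smallest change that addresses the root cause. Never refactor,
   reformat, or "improve" unrelated code.
5. Verify: re-run the failing tests, then the FULL test suite to catch regressions.

If you cannot reproduce the failure, say so and stop.

End your final message with exactly one of these lines:
BUG_FIXED: <one-line summary of the change>
BUG_UNFIXED: <why the bug could not be fixed>
"""
```

The explicit "smallest change" rule and the two termination markers are what distinguish this prompt from a general agent's: they enforce minimal fixes and give the orchestrator a machine-checkable outcome.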
Step 2: The Bug Fixer Agent
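Here is a sketch of the loop. The client, executor, and tools are injected as parameters so the loop is easy to test; in practice you would pass an `anthropic.Anthropic()` client plus the `ToolExecutor` and `TOOLS` from the earlier post. The `executor.execute(name, input)` interface and the default model id are assumptions about that code:

```python
# agent/bug_fixer.py: sketch of the fixer loop. Reuses TOOLS and ToolExecutor
# from "Build Your First AI Coding Agent"; the executor.execute(name, input)
# call and the model id are assumptions, not confirmed details of that post.

def fix_bug(client, executor, tools, system_prompt, bug_report,
            max_iterations=25, model="claude-sonnet-4-20250514"):
    """Run the reproduce/fix/verify loop until Claude stops calling tools."""
    messages = [{"role": "user", "content": bug_report}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            # Final answer: per the system prompt it ends with BUG_FIXED:/BUG_UNFIXED:
            return "".join(b.text for b in response.content if b.type == "text")

        # Execute every requested tool call and feed the results back.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": executor.execute(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})

    return "BUG_UNFIXED: iteration limit reached without a passing test run"
```

Injecting the client and executor is a deliberate design choice: the loop can be exercised with a fake client in unit tests before it ever spends API tokens.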
Step 3: Create a Test Codebase with Bugs
Let's set up a realistic project with multiple bugs to fix:
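A setup script along these lines plants the two bugs described below; the exact code and file names are illustrative:

```python
# create_buggy_project.py: writes a small project with two planted bugs
# (an attribute typo and a filtering bug). Layout and code are illustrative.
import os

MODELS = '''\
class User:
    def __init__(self, name):
        self.name = name
        self.active = True


def deactivate_user(user):
    user.actve = False  # BUG: typo, never touches user.active
    return user


def get_active_users(users):
    return [u for u in users if not u.active]  # BUG: condition is inverted
'''

TESTS = '''\
from models import User, deactivate_user, get_active_users


def test_deactivate_user():
    user = deactivate_user(User("alice"))
    assert user.active is False


def test_get_active_users():
    alice, bob = User("alice"), User("bob")
    bob.active = False
    assert get_active_users([alice, bob]) == [alice]
'''

os.makedirs("buggy_project", exist_ok=True)
with open("buggy_project/models.py", "w") as f:
    f.write(MODELS)
with open("buggy_project/test_models.py", "w") as f:
    f.write(TESTS)
print("created buggy_project/; reproduce with: python -m pytest buggy_project -q")
```

The test file sits next to `models.py` so pytest can import it without any path configuration.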
Run this to create the project, then see the failures:
You should see failures on test_deactivate_user (typo bug) and test_get_active_users (filtering bug).
Step 4: Run the Bug Fixer
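A small driver wires the pieces together. Every module path, constructor argument, and function signature below follows the sketches in this post and is an assumption; adjust them to your actual layout:

```python
# run_fixer.py: hypothetical driver. Module paths, the fix_bug signature, and
# ToolExecutor's constructor are assumptions carried over from this post.
def main() -> None:
    # Imports are deferred so defining main() does not require the
    # dependencies to be installed.
    import anthropic
    from agent.tools import TOOLS
    from agent.executor import ToolExecutor
    from agent.bug_fixer import fix_bug
    from agent.bug_fixer_prompt import BUG_FIXER_SYSTEM_PROMPT

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    executor = ToolExecutor("buggy_project")
    report = (
        "`python -m pytest` in buggy_project fails on test_deactivate_user "
        "and test_get_active_users. Find and fix both bugs."
    )
    print(fix_bug(client, executor, TOOLS, BUG_FIXER_SYSTEM_PROMPT, report))


# To run: call main() (requires the agent package and an API key).
```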
Expected agent behaviour: it runs the test suite first to reproduce both failures, reads the source file under test, fixes the typo and the filtering bug in the two failing functions, re-runs the full suite to check for regressions, and ends with a BUG_FIXED: line plus a diff of its changes.
Step 5: Handling Edge Cases
Bug Cannot Be Reproduced
Sometimes a bug report is vague. The agent handles this gracefully because it runs the tests first: if the targeted test passes, Claude reports that it cannot reproduce the bug and describes what it checked.
Multiple Related Bugs in Different Files
When the failure spans more than one file, the agent explores each implicated file, identifies the related issues, and fixes them in a single session.
No Test — Error Description Only
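An error-only report might look like this; the traceback details and line number are invented for illustration:

```python
# An illustrative error-only bug report: no failing test, just a traceback.
# The file path and line number here are made up for this example.
ERROR_ONLY_REPORT = """\
Users report a crash when opening the profile page. No failing test exists.

Traceback (most recent call last):
  File "app.py", line 42, in get_profile
    return user.profile.display_name
AttributeError: 'NoneType' object has no attribute 'display_name'
"""
```

This string is passed as the bug report in place of a failing test command.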
The agent will read app.py, find the relevant code, add a None check, and verify the fix.
Integrating with GitHub Issues
To automatically fix bugs reported as GitHub issues:
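One lightweight approach is to pull the issue from the GitHub REST API and turn it into the natural-language bug report the agent already accepts. The endpoint below is real; the `fix_bug` wiring in the trailing comment is illustrative:

```python
# github_issue_fixer.py: sketch of turning a GitHub issue into a bug report.
import json
import urllib.request


def fetch_issue(owner: str, repo: str, number: int) -> dict:
    """Fetch a public issue via the GitHub REST API (no token needed for public repos)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{number}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def issue_to_bug_report(issue: dict) -> str:
    """Compose the natural-language bug report the agent accepts."""
    return (
        f"GitHub issue #{issue['number']}: {issue['title']}\n\n"
        f"{issue.get('body') or '(no description)'}\n\n"
        "Reproduce the bug, apply a minimal fix, and verify with the full test suite."
    )


# Usage sketch (names follow this post's examples):
#   issue = fetch_issue("your-org", "your-repo", 123)
#   result = fix_bug(client, executor, TOOLS, BUG_FIXER_SYSTEM_PROMPT,
#                    issue_to_bug_report(issue))
```

Unauthenticated requests are rate-limited; for private repositories or CI use, send an `Authorization: Bearer <token>` header with the request.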
Key Takeaways
- A bug fixer agent differs from a general agent in having a measurable success condition — tests passing
- Always reproduce first: running the failing test before any edits anchors every subsequent decision in evidence
- Minimal changes are the key constraint — agents that over-fix introduce regressions. The system prompt must enforce this explicitly
- Termination signals (BUG_FIXED:/BUG_UNFIXED:) give the orchestrator a reliable way to parse the outcome without LLM parsing of free-form text
- Regression testing after the targeted fix is non-optional — agents can fix one thing and break another
- Diff generation lets you review exactly what the agent changed before merging to production
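The termination-signal takeaway is cheap to implement on the orchestrator side. A sketch, assuming the final message contains one of the two markers on its own line:

```python
# Sketch: extract the outcome from the agent's final message.
import re


def parse_outcome(final_message: str) -> tuple[bool, str]:
    """Return (fixed, summary) based on the BUG_FIXED:/BUG_UNFIXED: marker."""
    match = re.search(r"^BUG_(FIXED|UNFIXED):\s*(.*)$", final_message, re.MULTILINE)
    if not match:
        return (False, "no termination signal found")
    return (match.group(1) == "FIXED", match.group(2).strip())
```

A missing marker is treated as a failure, which keeps the orchestrator conservative when the model ignores the prompt's output format.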
What's Next in the AI Coding Agents Series
- What Are AI Coding Agents?
- AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
- Build Your First AI Coding Agent with the Claude API
- Build an Automated GitHub PR Review Agent
- Build an Autonomous Bug Fixer Agent ← you are here
- AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production
This post is part of the AI Coding Agents Series. Previous post: Build an Automated GitHub PR Review Agent.
To integrate this bug fixer agent into your CI/CD pipeline, see AI Coding Agents in CI/CD: Automate Reviews and Bug Fixes. For the agentic loop fundamentals, see Claude Agentic Loop Explained and Claude Tool Use Explained.
External Resources
- pytest documentation — the test framework used throughout this project for reproducing and verifying bug fixes.
- Python subprocess module — official docs for the subprocess calls used in the tool executor's run_command method.
