Build a Data Analyst Agent with Claude API

What Does This Data Analyst Agent Do?
This project builds an agent that accepts a CSV file and a plain-English question, writes pandas/Python code to analyse the data, executes that code in a sandboxed subprocess, and returns a clear natural language answer with the numbers. Multi-turn conversation lets users ask follow-up questions that reference previous results. No SQL or pandas knowledge required from the end user.
Most data insight questions are simple. "What was our best sales month?" "Which product category has the highest return rate?" "Show me the top ten customers by revenue." But answering them means opening a spreadsheet, writing a formula or a query, and waiting.
A data analyst agent backed by Claude changes that equation. Upload a CSV, ask in plain English, and the agent writes and executes the analysis code, then explains the result in plain language — no SQL, no pandas knowledge required. The Anthropic API getting started guide has everything you need to configure your API key and make your first call before building this agent.
This project builds that agent: Claude as the reasoning brain, Python execution as the tool, and an agentic loop that iterates until the question is fully answered.
Architecture
The agent has three components:
- Data loader: Read CSV into a pandas DataFrame and generate a schema summary Claude can reference
- Code executor tool: A sandboxed Python execution environment Claude calls to run analysis code
- Agentic loop: Claude interprets the question, writes code, observes results, and may run additional code until satisfied
The security-critical element is the code executor. Claude generates Python code; you must never eval() it in your production environment without restrictions. This implementation uses a subprocess sandbox with a strict allowlist. The Anthropic tool use documentation covers how Claude decides which tools to call and how to structure tool results safely.
Complete Implementation
Code Execution Security
Never run Claude-generated code directly in your main process without sandboxing. This implementation uses subprocess isolation with a 30-second timeout, a static allowlist check, and a restricted environment with no access to network imports. For production deployments, consider container-based sandboxing (like Docker gVisor) or a dedicated code execution service for stronger isolation.
Example Session
Sample output for a sales dataset:
Extending to Charts
Claude can generate matplotlib chart code too. Add a matplotlib chart generation tool that saves a PNG to a temp path and returns the path, then display or serve it. For a web interface, pair this agent with a FastAPI backend and a React frontend that renders the chart images alongside Claude's analysis text.
Summary
This data analyst agent turns any CSV into an interactive analytics tool — no SQL, no pandas expertise required from the end user.
- The schema summary given to Claude at startup prevents hallucinated column names and ensures accurate queries
- Subprocess sandboxing is non-negotiable: Claude-generated code must never execute with elevated privileges
- The agentic loop with max_iterations prevents infinite loops while allowing multi-step analyses
- Conversation history enables follow-up questions that reference previous results naturally
Next IT pro project: Project: Deploy Claude on AWS Bedrock — A Production Setup Guide.
For the underlying agent patterns, see Claude Agentic Loop Explained and Claude Tool Use Explained. To add semantic search over data documentation, see Build Semantic Search from Scratch.
External Resources
- pandas documentation — the data analysis library the agent generates code for; essential reading for extending this project.
- Python subprocess module — used for sandboxed code execution; understand the security model before deploying.
This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build an AI-Powered IT Incident Report Generator.
