How do I build a CV analyser with the Claude API?

Send the CV text to Claude with a structured prompt that requests specific output fields: skills extracted, years of experience, education level, role match score, and improvement suggestions. Define those fields as a tool schema, as this project does, or ask Claude to return valid JSON, so your application can parse the results programmatically.
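
If you ask for JSON in the message text, the reply can arrive wrapped in a sentence or a code fence. The following is a minimal sketch of defensive parsing, assuming the reply contains a single top-level JSON object; parse_json_reply is a hypothetical helper name, not part of the Anthropic SDK.

```python
import json


def parse_json_reply(reply_text: str) -> dict:
    """Parse a JSON object from a model reply that may include extra text."""
    try:
        parsed = json.loads(reply_text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. when the JSON is
        # wrapped in a sentence or a Markdown code fence.
        start = reply_text.find("{")
        end = reply_text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("No JSON object found in reply")
        parsed = json.loads(reply_text[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed
```

The fallback is deliberately crude: it will fail on replies that contain several separate JSON objects, which is one reason the project below uses tool use instead.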

How does Claude handle PDF CVs in the API?

The Claude API accepts base64-encoded PDFs as document blocks in recent model versions, or you can extract text from the PDF with a library like pdfplumber and pass the plain text. Sending the extracted text is simpler and avoids document processing costs for straightforward CV analysis.
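
For the first option, a sketch of how the message might be assembled. The block layout (a "document" block with a base64 source and the application/pdf media type) follows Anthropic's PDF support documentation as I understand it, so confirm the field names against the current docs; build_pdf_message is a hypothetical helper.

```python
import base64


def build_pdf_message(pdf_bytes: bytes, instruction: str) -> dict:
    """Build a user message carrying a PDF as a base64 document block."""
    encoded = base64.standard_b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": encoded,
                },
            },
            # The instruction follows the document so Claude reads the CV first.
            {"type": "text", "text": instruction},
        ],
    }
```

The returned dict would be passed as one element of the messages list in client.messages.create. No API call is made in the sketch itself.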

What privacy considerations apply when sending CVs to the Claude API?

CVs contain personal data under GDPR and similar regulations. Use Anthropic's data processing agreement, ensure CVs are not retained for model training (check your API plan settings), anonymise or pseudonymise data where possible, and disclose to candidates that AI processing is involved.
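
As an illustration of pseudonymisation, the sketch below swaps email addresses and phone numbers for placeholder tokens before the text leaves your system, and keeps a local mapping so they can be restored afterwards. The regular expressions are rough assumptions for demonstration only: they will miss some formats, and they do nothing about names or postal addresses.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def pseudonymise(cv_text: str) -> tuple[str, dict[str, str]]:
    """Replace emails and phone numbers with tokens; return text and mapping."""
    mapping: dict[str, str] = {}

    def swap(kind: str, min_digits: int = 0):
        def repl(match: re.Match) -> str:
            value = match.group(0)
            # Skip short digit runs such as "2015-2020" so date ranges survive.
            if sum(ch.isdigit() for ch in value) < min_digits:
                return value
            token = f"[{kind}_{len(mapping) + 1}]"
            mapping[token] = value
            return token
        return repl

    redacted = EMAIL_RE.sub(swap("EMAIL"), cv_text)
    redacted = PHONE_RE.sub(swap("PHONE", min_digits=9), redacted)
    return redacted, mapping
```

Keep the mapping on your side only; the redacted text is what gets sent to the API.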

Build a Smart CV / Resume Analyser with Claude API


What Does This CV Analyser Do?

This project builds a CV analyser that accepts PDF or plain-text CVs, extracts structured candidate data using Claude's tool use API (name, skills, years of experience, education), scores the candidate against a provided job description across four dimensions, and outputs a plain-English hiring recommendation with identified strengths and gaps, all in a single Python script.

Hiring teams review dozens to hundreds of CVs for every open role. Reading each one thoroughly takes time, and the most qualified candidates can be buried under a stack of applications. An AI-powered CV analyser automates the extraction and scoring phase - freeing recruiters to focus their energy on the candidates that actually match.

The sections below walk through the full implementation: one Python script that takes a PDF or plain-text CV as input, prints a readable report, and saves the analysis as JSON. Treat it as a working starting point. Production use would still need input validation, error handling and retries around the API calls, and the privacy controls that apply to personal data.

What We Are Building

The CV analyser does three things:

Extracts structured data from the CV: name, email, years of experience, current role, skills, education, and employment history
Scores the candidate against a job description on four dimensions (technical skills match, experience level, domain relevance, and education), plus an overall score
Generates a hiring recommendation: a plain-English paragraph explaining whether to interview the candidate and why

Prerequisites

Python 3.9 or later
Anthropic Python SDK: pip install anthropic
For PDF support: pip install pymupdf (the implementation below imports it as fitz)
An Anthropic API key set as ANTHROPIC_API_KEY

Project Architecture

The system has three components:

CV ingestion: Reads the CV file (PDF or text) and prepares the content for Claude
Structured extraction: Uses Claude with tool use to extract typed fields from the CV content
Scoring and recommendation: Uses Claude to score and explain the match against the job description
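
One way to picture how the three components fit together is a small driver that takes each stage as a function. This is a sketch rather than the tutorial's own code: run_pipeline and its parameter names are assumptions, and the stand-in stages at the bottom exist only so the example runs without an API key.

```python
from typing import Callable


def run_pipeline(
    cv_path: str,
    job_description: str,
    ingest: Callable[[str], str],
    extract: Callable[[str], dict],
    score: Callable[[dict, str], dict],
) -> dict:
    """Chain the three components; each stage sees only the previous output."""
    cv_text = ingest(cv_path)
    if not cv_text.strip():
        # Scanned PDFs often yield no text; fail early instead of
        # paying for an API call on an empty prompt.
        raise ValueError(f"No text extracted from {cv_path}")
    candidate = extract(cv_text)
    assessment = score(candidate, job_description)
    return {"candidate": candidate, "assessment": assessment}


# Offline usage with stand-in stages (no API calls are made here).
result = run_pipeline(
    "cv.txt",
    "Senior DevOps Engineer",
    ingest=lambda path: "Jane Doe, DevOps Engineer, Kubernetes, Terraform",
    extract=lambda text: {"full_name": text.split(",")[0], "skills": ["Kubernetes"]},
    score=lambda candidate, jd: {"recommendation": "maybe"},
)
print(result["candidate"]["full_name"])  # prints: Jane Doe
```

Keeping the stages swappable makes it possible to unit-test the flow without network access and to replace one stage, such as the PDF reader, without touching the others.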

Complete Implementation

python

import anthropic
import json
from pathlib import Path

client = anthropic.Anthropic()


# --- Step 1: CV Ingestion ----------------------------------------------------

def read_cv(file_path: str) -> str:
    """Read CV content from a text or PDF file."""
    path = Path(file_path)
    
    if path.suffix.lower() == ".pdf":
        try:
            import fitz  # PyMuPDF
        except ImportError as exc:
            raise RuntimeError("Install PyMuPDF: pip install pymupdf") from exc
        # The context manager closes the file handle once the text is read.
        with fitz.open(file_path) as doc:
            return "".join(page.get_text() for page in doc)
    
    elif path.suffix.lower() in [".txt", ".md"]:
        return path.read_text(encoding="utf-8")
    
    else:
        raise ValueError(f"Unsupported file type: {path.suffix}")


# --- Step 2: Structured Extraction ------------------------------------------

EXTRACTION_TOOL = {
    "name": "extract_candidate_data",
    "description": "Extract structured information from a candidate CV or resume",
    "input_schema": {
        "type": "object",
        "properties": {
            "full_name": {"type": "string", "description": "Candidate's full name"},
            "email": {"type": "string", "description": "Email address; omit if not found"},
            "phone": {"type": "string", "description": "Phone number; omit if not found"},
            "current_title": {"type": "string", "description": "Most recent job title"},
            "years_experience": {
                "type": "number",
                "description": "Estimated total years of professional experience"
            },
            "skills": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Technical and professional skills mentioned"
            },
            "education": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "degree": {"type": "string"},
                        "field": {"type": "string"},
                        "institution": {"type": "string"},
                        "year": {"type": "string"}
                    }
                },
                "description": "Educational qualifications"
            },
            "work_history": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "company": {"type": "string"},
                        "duration": {"type": "string"},
                        "key_achievements": {"type": "array", "items": {"type": "string"}}
                    }
                },
                "description": "Employment history, most recent first"
            }
        },
        "required": ["full_name", "current_title", "years_experience", "skills", "work_history"]
    }
}


def extract_candidate_data(cv_text: str) -> dict:
    """Use Claude to extract structured data from CV text."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=[EXTRACTION_TOOL],
        tool_choice={"type": "tool", "name": "extract_candidate_data"},
        messages=[
            {
                "role": "user",
                "content": f"Extract all candidate information from this CV:\n\n{cv_text}"
            }
        ]
    )
    
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    
    raise RuntimeError("Extraction failed - no tool use block in response")


# --- Step 3: Scoring and Recommendation -------------------------------------

SCORING_TOOL = {
    "name": "score_candidate",
    "description": "Score a candidate against a job description",
    "input_schema": {
        "type": "object",
        "properties": {
            "scores": {
                "type": "object",
                "properties": {
                    "technical_skills": {
                        "type": "integer",
                        "description": "Score 1-10: how well candidate skills match job requirements"
                    },
                    "experience_level": {
                        "type": "integer",
                        "description": "Score 1-10: match between candidate experience and required level"
                    },
                    "domain_relevance": {
                        "type": "integer",
                        "description": "Score 1-10: relevance of candidate's domain experience to this role"
                    },
                    "education": {
                        "type": "integer",
                        "description": "Score 1-10: education qualification match"
                    },
                    "overall": {
                        "type": "integer",
                        "description": "Overall score 1-10"
                    }
                },
                "required": ["technical_skills", "experience_level", "domain_relevance", "education", "overall"]
            },
            "strengths": {
                "type": "array",
                "items": {"type": "string"},
                "description": "2-3 specific strengths relative to this role"
            },
            "gaps": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Key gaps or concerns relative to this role"
            },
            "recommendation": {
                "type": "string",
                "enum": ["strong_yes", "yes", "maybe", "no"],
                "description": "Interview recommendation"
            },
            "recommendation_reasoning": {
                "type": "string",
                "description": "2-3 sentence explanation of the recommendation"
            }
        },
        "required": ["scores", "strengths", "gaps", "recommendation", "recommendation_reasoning"]
    }
}


def score_candidate(candidate_data: dict, job_description: str) -> dict:
    """Score the candidate against the job description."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        tools=[SCORING_TOOL],
        tool_choice={"type": "tool", "name": "score_candidate"},
        messages=[
            {
                "role": "user",
                "content": f"""Score this candidate against the job description.

JOB DESCRIPTION:
{job_description}

CANDIDATE PROFILE:
{json.dumps(candidate_data, indent=2)}

Provide an honest assessment. Score 1-10 on each dimension.
Include specific strengths and concrete gaps."""
            }
        ]
    )
    
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    
    raise RuntimeError("Scoring failed - no tool use block in response")


# --- Main Analyser ------------------------------------------------------------

def analyse_cv(cv_file_path: str, job_description: str) -> dict:
    """Run the full CV analysis pipeline."""
    print(f"Reading CV from {cv_file_path}...")
    cv_text = read_cv(cv_file_path)
    
    print("Extracting candidate data...")
    candidate_data = extract_candidate_data(cv_text)
    
    print("Scoring candidate...")
    scoring = score_candidate(candidate_data, job_description)
    
    return {
        "candidate": candidate_data,
        "assessment": scoring
    }


def print_report(analysis: dict) -> None:
    """Print a human-readable analysis report."""
    c = analysis["candidate"]
    a = analysis["assessment"]
    
    print("\n" + "="*60)
    print(f"CANDIDATE: {c['full_name']}")
    print(f"Current Role: {c['current_title']}")
    print(f"Experience: {c['years_experience']} years")
    print(f"Key Skills: {', '.join(c['skills'][:8])}")
    print("="*60)
    
    scores = a["scores"]
    print("\nSCORES:")
    print(f"  Technical Skills:    {scores['technical_skills']}/10")
    print(f"  Experience Level:    {scores['experience_level']}/10")
    print(f"  Domain Relevance:    {scores['domain_relevance']}/10")
    print(f"  Education:           {scores['education']}/10")
    print(f"  -----------------------------")
    print(f"  Overall:             {scores['overall']}/10")
    
    print(f"\nRECOMMENDATION: {a['recommendation'].upper().replace('_', ' ')}")
    print(f"\n{a['recommendation_reasoning']}")
    
    print("\nSTRENGTHS:")
    for s in a["strengths"]:
        print(f"  ✓ {s}")
    
    if a["gaps"]:
        print("\nGAPS:")
        for g in a["gaps"]:
            print(f"  ✗ {g}")
    print("="*60)


# --- Example Usage -----------------------------------------------------------

if __name__ == "__main__":
    job_description = """
    Senior DevOps Engineer - 5+ years experience
    Requirements:
    - Strong Python and Bash scripting
    - Kubernetes and Docker container orchestration
    - AWS or Azure cloud infrastructure experience
    - CI/CD pipeline design (GitHub Actions, Jenkins, or similar)
    - Infrastructure as Code (Terraform or Pulumi)
    - Experience with monitoring stacks (Prometheus, Grafana)
    Preferred: Experience with GitOps, ArgoCD, service mesh (Istio)
    """
    
    # Analyse a CV
    analysis = analyse_cv("candidate_cv.pdf", job_description)
    print_report(analysis)
    
    # Also save JSON for downstream use
    with open("analysis_result.json", "w", encoding="utf-8") as f:
        json.dump(analysis, f, indent=2)

Extending the Project

Batch processing: Add a loop to process an entire folder of CVs and produce a ranked shortlist. For large volumes, submit the requests through the Message Batches API, which processes them asynchronously at a 50% discount
Web interface: Wrap the analyser in a FastAPI or Flask endpoint that accepts file uploads and returns JSON analysis results
Database integration: Store extracted candidate data and scores in PostgreSQL for filtering, searching, and tracking candidates across multiple roles
Files API: For high-volume environments, upload CV PDFs via the Files API to avoid re-uploading the same document when analysing against multiple job descriptions
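
As a starting point for the batch-processing extension, here is a sketch of the folder loop and ranking step. It runs sequentially and does not use the Message Batches API. rank_folder is a hypothetical helper; the analyse argument is expected to behave like analyse_cv from the implementation above, taking a file path and a job description and returning a dict with assessment.scores.overall.

```python
from pathlib import Path
from typing import Callable


def rank_folder(
    folder: str,
    job_description: str,
    analyse: Callable[[str, str], dict],
    top_n: int = 5,
) -> list[tuple[str, int]]:
    """Analyse every .pdf/.txt CV in a folder and return the top scorers."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".pdf", ".txt"}:
            continue
        try:
            analysis = analyse(str(path), job_description)
        except Exception as exc:  # one bad file should not stop the run
            print(f"Skipping {path.name}: {exc}")
            continue
        results.append((path.name, analysis["assessment"]["scores"]["overall"]))
    # Highest overall score first; ties keep alphabetical order (stable sort).
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_n]
```

A call such as rank_folder("cvs/", job_description, analyse_cv) would return a list of (filename, overall score) pairs. Because overall is a 1-10 integer, ties are likely, so a real shortlist may want a secondary sort key such as the technical skills score.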

Use tool_choice: tool for Reliable Extraction

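The extraction and scoring steps both use tool_choice: {type: 'tool', name: ...} to force Claude to call the named tool on every request, so the result always arrives as a structured tool input rather than free text. This is more reliable than asking Claude to return JSON in the message text, because the output is generated against the tool's input schema. Forcing the tool does not by itself guarantee that every field conforms (a response cut off by max_tokens can be incomplete, for instance), so validate the result after extraction, for example with Python's jsonschema library. Together, forced tool use and post-extraction validation produce highly consistent structured output across diverse CV formats.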
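
The sketch below shows the idea of post-extraction validation for the scoring result. It is a dependency-free stand-in, not the jsonschema library named above, and check_assessment is a hypothetical helper. It also checks the 1-10 range, which the SCORING_TOOL schema states only in its descriptions.

```python
SCORE_KEYS = (
    "technical_skills",
    "experience_level",
    "domain_relevance",
    "education",
    "overall",
)
RECOMMENDATIONS = {"strong_yes", "yes", "maybe", "no"}


def check_assessment(assessment: dict) -> list[str]:
    """Return a list of problems found in a score_candidate tool result."""
    problems = []
    scores = assessment.get("scores")
    if not isinstance(scores, dict):
        return ["scores is missing or not an object"]
    for key in SCORE_KEYS:
        value = scores.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"scores.{key} is missing or not an integer")
        elif not 1 <= value <= 10:
            problems.append(f"scores.{key}={value} is outside 1-10")
    if assessment.get("recommendation") not in RECOMMENDATIONS:
        problems.append("recommendation is not one of the allowed values")
    return problems
```

An empty list means the result passed these checks. A non-empty list is a reasonable trigger for retrying the request or flagging the CV for manual review.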

Summary

This project demonstrates the three-step pattern that underlies most document-processing AI applications: ingest -> extract -> analyse. The same approach applies to processing contracts, financial statements, technical specifications, and any other structured document type.

Extraction: Use tool_choice to force a structured tool call for any document, then validate the result before using it
Scoring: Let Claude apply business logic - matching against requirements, identifying gaps - that would be complex to code manually
Recommendation: Let Claude generate the natural language reasoning that explains its structured scores

Next project: Build a Customer Support Chatbot with Claude API.

For the structured output concepts used in this project, see Claude Structured Outputs and JSON and Claude Tool Use Explained. For processing PDF CVs via the Files API, see Claude Files API Tutorial and Claude Vision, Images, and PDF Analysis.

External Resources

Anthropic Tool Use documentation - the official reference for the structured extraction schema used in this project.
PyMuPDF documentation - the recommended library for extracting text from PDF CVs in Python.

This post is part of the Anthropic AI Tutorial Series. Previous post: AI Agents Refresher: Key Concepts, Patterns, and Pitfalls.

Part of the Claude AI Masterclass.