Panda Coding School

How I Built an AI Resume Matcher Using RAG and LLM Scoring

A build-in-public breakdown of creating an AI-powered resume matching system using RAG, vector search, and LLM scoring. What worked, what failed, and the full cost breakdown.

Panda Coding School · May 2, 2026 · 3 min read

I built an AI resume matcher that uses RAG and LLM scoring to measure how well a candidate's resume fits a job description. Here's the full breakdown: what worked, what failed, and what I'd do differently next time.

The Problem

Recruiters spend hours scanning resumes. Most applicant tracking systems (ATS) rely on keyword matching, which means they miss qualified candidates who describe their skills a bit differently.

I wanted to build something smarter: a system that understands the meaning behind the words, not just the words themselves.

Architecture Overview

The system has three core components:

  1. Document Processor - Parses resumes and job descriptions into structured data
  2. Vector Store - Embeds and stores documents for semantic search
  3. Scoring Engine - Uses an LLM to generate match scores with explanations
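
As a rough illustration of the third component, here is a minimal sketch of an LLM scoring step that returns a score plus an explanation. The prompt wording, the 0-100 scale, the JSON reply format, and the names MatchResult, score_match, and fake_llm are all assumptions made for this example, not the project's actual code. The model call is passed in as a plain function so the sketch runs without an API key.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class MatchResult:
    score: int          # 0-100, higher means a better fit (assumed scale)
    explanation: str    # short justification returned by the model


# Hypothetical prompt; doubled braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "Compare the resume to the job description.\n"
    'Reply with JSON only: {{"score": <integer 0-100>, '
    '"explanation": "<one or two sentences>"}}\n\n'
    "Job description:\n{job}\n\nResume:\n{resume}\n"
)


def score_match(job: str, resume: str, llm: Callable[[str], str]) -> MatchResult:
    """Build the prompt, call the model, and parse its JSON reply."""
    prompt = PROMPT_TEMPLATE.format(job=job, resume=resume)
    data = json.loads(llm(prompt))
    # Clamp the score in case the model strays outside the requested range.
    score = max(0, min(100, int(data["score"])))
    return MatchResult(score=score, explanation=str(data["explanation"]))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same reply."""
    return '{"score": 82, "explanation": "Strong Python and FastAPI overlap."}'


result = score_match("Backend engineer, Python", "5 years of FastAPI work", fake_llm)
print(result.score, result.explanation)
```

In a real deployment the llm argument would wrap the chat model call, and the JSON parsing would need error handling for malformed replies.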

Tech Stack

  • Backend: Python + FastAPI
  • Vector DB: Pinecone
  • Embeddings: OpenAI text-embedding-3-small
  • LLM: GPT-4o for scoring
  • Frontend: Next.js + Tailwind
  • Queue: Redis + Celery for async processing

The Document Processor

The hardest part was handling the range of resume formats out there: PDF, DOCX, images, and even LinkedIn profile URLs.

I ended up using:

  • pymupdf for PDF extraction
  • python-docx for Word documents
  • A vision model for image-based resumes

Key lesson: Spend 50% of your time on data extraction. Garbage in, garbage out.
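
One way the format handling could be organized is a small dispatcher keyed on file extension. This is a sketch under assumptions: the function names, the extension table, and the decision to leave image resumes unimplemented are illustrative, and the pymupdf and python-docx imports are deferred so the module loads even when those packages are not installed.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_pdf(path: Path) -> str:
    """Extract plain text from a PDF, page by page."""
    import pymupdf  # older PyMuPDF releases expose this module as `fitz`

    doc = pymupdf.open(str(path))
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def extract_docx(path: Path) -> str:
    """Extract paragraph text from a Word document."""
    import docx  # provided by the python-docx package

    document = docx.Document(str(path))
    return "\n".join(p.text for p in document.paragraphs)


# Hypothetical extension table; extend it as new formats show up.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pick_extractor(path: str) -> Callable[[Path], str]:
    """Choose an extractor from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix in EXTRACTORS:
        return EXTRACTORS[suffix]
    if suffix in IMAGE_SUFFIXES:
        # Image resumes would go to a vision model; not sketched here.
        raise NotImplementedError("image resumes need a vision model")
    raise ValueError(f"unsupported resume format: {suffix or '(none)'}")


def extract_text(path: str) -> str:
    """Run the matching extractor on the file."""
    return pick_extractor(path)(Path(path))
```

Note that paragraph-only extraction skips text inside DOCX tables, which is one of the layout quirks that makes resume parsing harder than it looks.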

Vector Search Approach

Each resume gets embedded into a vector space. When a job description comes in, I:

  1. Embed the job description
  2. Find the top-N most similar resumes via cosine similarity
  3. Pass the matches to the LLM for detailed scoring

The two-stage approach keeps costs manageable. Vector search is cheap, and the LLM only touches the top candidates.
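
The retrieval stage can be sketched in a few lines of plain Python. In the real system the vector database does this ranking, so the code below is only an in-memory illustration of cosine similarity and top-N selection. The three-dimensional vectors and resume IDs are made up for the example; real embeddings have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_resumes(
    job_vec: Sequence[float],
    resume_vecs: Dict[str, Sequence[float]],
    n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank resumes by similarity to the job vector and keep the best n."""
    scored = [(rid, cosine_similarity(job_vec, vec)) for rid, vec in resume_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy vectors standing in for real embeddings.
resumes = {
    "backend_dev": [0.9, 0.1, 0.0],
    "data_analyst": [0.2, 0.9, 0.1],
    "designer": [0.0, 0.1, 0.9],
}
job = [1.0, 0.2, 0.0]

shortlist = top_n_resumes(job, resumes, n=2)
for resume_id, similarity in shortlist:
    print(resume_id, round(similarity, 3))
```

Only the shortlist would then be sent to the LLM, which is where the cost saving comes from.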

What Failed

  1. Pure embedding similarity wasn't enough. Two resumes with similar embedding scores could have very different actual relevance. The LLM scoring step turned out to be essential, not optional.

  2. Resume parsing is a nightmare. Every format, every layout, every encoding quirk. I spent 3x more time on parsing than I budgeted.

  3. Initial latency was way too high. Processing a single resume took 8 seconds. I had to add async processing and a proper queue system.
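
To illustrate the queue idea behind the latency fix, here is a minimal sketch using asyncio from the standard library. It is a stand-in, not the Redis + Celery setup named in the tech stack: the worker pattern is the same in spirit, but everything runs in one process. The names process_resume and run_batch and the sleep that simulates slow work are assumptions for the example.

```python
import asyncio
from typing import Dict, List


async def process_resume(resume_id: str) -> dict:
    """Placeholder for the slow parse, embed, and score work."""
    await asyncio.sleep(0.01)  # simulated I/O wait
    return {"resume_id": resume_id, "status": "done"}


async def worker(queue: asyncio.Queue, results: Dict[str, dict]) -> None:
    """Pull resume IDs off the queue until cancelled."""
    while True:
        resume_id = await queue.get()
        try:
            results[resume_id] = await process_resume(resume_id)
        finally:
            queue.task_done()


async def run_batch(resume_ids: List[str], concurrency: int = 4) -> Dict[str, dict]:
    """Process a batch with a fixed number of concurrent workers."""
    queue: asyncio.Queue = asyncio.Queue()
    results: Dict[str, dict] = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    for resume_id in resume_ids:
        queue.put_nowait(resume_id)
    await queue.join()  # wait until every queued item is processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


batch_results = asyncio.run(run_batch(["r1", "r2", "r3"], concurrency=2))
print(sorted(batch_results))
```

With a real broker, the API endpoint would enqueue the job and return immediately, and separate worker processes would pick it up.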

What I'd Do Differently

  • Start with structured input instead of trying to parse free-form resumes
  • Use a smaller model for initial filtering and save GPT-4o for final scoring
  • Build the queue system from day one rather than bolting it on later
  • Add evaluation metrics early because without ground truth data, you can't tell if things are actually getting better
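
A simple starting metric is the share of resumes where the system's decision matches a recruiter's. The sketch below assumes a binary recruiter label and a score threshold of 70, both chosen only for illustration; the sample scores and labels are made up, and this is not necessarily how agreement was measured in this project.

```python
from typing import Sequence


def agreement_rate(
    llm_scores: Sequence[float],
    recruiter_labels: Sequence[bool],
    threshold: float = 70.0,
) -> float:
    """Fraction of resumes where (score >= threshold) matches the recruiter's call."""
    if len(llm_scores) != len(recruiter_labels):
        raise ValueError("scores and labels must have the same length")
    if not llm_scores:
        raise ValueError("need at least one labeled example")
    matches = sum(
        (score >= threshold) == label
        for score, label in zip(llm_scores, recruiter_labels)
    )
    return matches / len(llm_scores)


# Made-up ground truth: True means the recruiter would advance the candidate.
scores = [91, 45, 72, 30, 68]
labels = [True, False, True, False, True]

rate = agreement_rate(scores, labels)
print(f"agreement: {rate:.0%}")
```

Tracking a number like this on a fixed labeled set makes it possible to tell whether a prompt or parsing change actually helped.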

Results

After 3 weeks of building, the system could:

  • Process a resume in under 2 seconds (async)
  • Score matches with 85% agreement with human recruiters
  • Handle 500+ resumes per hour

Cost Breakdown

Component            Monthly Cost
OpenAI API           ~$50
Pinecone             $70
Hosting (Railway)    $20
Total                ~$140/month

Not bad for a tool that saves recruiters hours every day.

Key Takeaways

  1. Data quality beats model quality. Better parsing beats a fancier LLM every time.
  2. Two-stage retrieval works well. Cheap vector search plus targeted LLM scoring is the right pattern.
  3. Async processing is not optional. Don't make users wait on LLM calls.
  4. Build evaluation early. You need metrics to know if you're actually improving.

Building in public means sharing the ugly parts too. This project taught me more about production AI systems than any tutorial ever has.
