How I Built an AI Resume Matcher
A build-in-public breakdown of creating an AI-powered resume matching system. Covers the architecture, vector search, LLM scoring, and lessons learned.
I built an AI resume matcher that scores how well a candidate's resume matches a job description. Here's the full breakdown — what worked, what failed, and what I'd do differently.
The Problem
Recruiters spend hours scanning resumes. Most applicant tracking systems rely on keyword matching, which misses qualified candidates who describe their skills differently.
I wanted to build something smarter — a system that understands semantic meaning, not just keywords.
Architecture Overview
The system has three core components:
- Document Processor — Parses resumes and job descriptions into structured data
- Vector Store — Embeds and stores documents for semantic search
- Scoring Engine — Uses an LLM to generate match scores with explanations
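To make the hand-off between components concrete, here's a rough sketch of the flow. The type names and function signatures are illustrative, not the ones from my codebase.

```python
from dataclasses import dataclass

# Illustrative types only; the real project uses its own schemas.

@dataclass
class Candidate:
    resume_id: str
    similarity: float     # cosine similarity from the vector store

@dataclass
class MatchResult:
    resume_id: str
    score: int            # 0-100 match score from the LLM
    explanation: str

def match_resumes(job_description: str, processor, vector_store, scorer,
                  top_n: int = 10) -> list[MatchResult]:
    """End-to-end flow: parse, retrieve, then score only the top candidates."""
    parsed_job = processor.parse(job_description)              # 1. structure the input
    candidates = vector_store.search(parsed_job, top_n)        # 2. cheap semantic retrieval
    return [scorer.score(parsed_job, c) for c in candidates]   # 3. LLM scoring with explanations
```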
Tech Stack
- Backend: Python + FastAPI
- Vector DB: Pinecone
- Embeddings: OpenAI text-embedding-3-small
- LLM: GPT-4o for scoring
- Frontend: Next.js + Tailwind
- Queue: Redis + Celery for async processing
The Document Processor
The hardest part was handling the variety of resume formats — PDF, DOCX, images, and even LinkedIn profile URLs.
I used a combination of:
- pymupdf for PDF extraction
- python-docx for Word documents
- A vision model for image-based resumes
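A minimal sketch of that dispatch, assuming simple file-extension routing; the vision-model path and LinkedIn handling are left out:

```python
from pathlib import Path

import fitz                 # PyMuPDF
from docx import Document   # python-docx

def extract_text(path: str) -> str:
    """Route each file to the right extractor based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(path).paragraphs)
    # Image-based resumes go to a vision model instead; omitted here.
    raise ValueError(f"Unsupported format: {suffix}")
```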
Key lesson: Spend 50% of your time on data extraction. Garbage in, garbage out.
Vector Search Approach
Each resume gets embedded into a vector space. When a job description comes in, I:
- Embed the job description
- Find the top-N most similar resumes via cosine similarity
- Pass the matches to the LLM for detailed scoring
This two-stage approach keeps costs manageable — the vector search is cheap, and the LLM only processes the top candidates.
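Here's a stripped-down sketch of those two stages using the OpenAI and Pinecone clients. The index name, the metadata field holding the resume text, and the scoring prompt are placeholders, and error handling is omitted.

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()                    # reads OPENAI_API_KEY from the environment
index = Pinecone().Index("resumes")  # reads PINECONE_API_KEY; index name is a placeholder

def match(job_description: str, top_n: int = 10) -> list[dict]:
    # Stage 1: embed the job description and retrieve similar resumes (cheap)
    job_vec = client.embeddings.create(
        model="text-embedding-3-small",
        input=job_description,
    ).data[0].embedding
    hits = index.query(vector=job_vec, top_k=top_n, include_metadata=True)

    # Stage 2: ask the LLM to score only the retrieved candidates (expensive)
    results = []
    for hit in hits.matches:
        prompt = (
            "Score how well this resume matches the job description on a 0-100 scale "
            "and explain the score.\n\n"
            f"Job description:\n{job_description}\n\n"
            f"Resume:\n{hit.metadata['text']}"  # assumes resume text was stored as metadata
        )
        completion = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        results.append({"resume_id": hit.id, "assessment": completion.choices[0].message.content})
    return results
```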
What Failed
- Pure embedding similarity wasn't enough. Two resumes with similar embedding scores could have very different actual relevance. The LLM scoring stage was essential.
- Resume parsing is a nightmare. Every format, every layout, every encoding issue. I spent 3x more time on parsing than I budgeted.
- Initial latency was too high. Processing a single resume took 8 seconds. I had to add async processing and a queue system (see the sketch after this list).
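For anyone wiring up the same thing, the Celery setup looks roughly like this. The broker and backend URLs and the task body are placeholders, not my production config:

```python
from celery import Celery

# Placeholder Redis URLs; point these at your own instance.
app = Celery("matcher", broker="redis://localhost:6379/0", backend="redis://localhost:6379/1")

@app.task
def score_resume(resume_id: str, job_description: str) -> dict:
    """Slow work (parsing, embedding, LLM scoring) runs here, off the request path."""
    # ... call the parsing + matching pipeline; stubbed for brevity
    return {"resume_id": resume_id, "status": "scored"}

# In the FastAPI handler the call becomes non-blocking:
#   task = score_resume.delay(resume_id, job_description)
#   return {"task_id": task.id}
```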
What I'd Do Differently
- Start with structured input instead of parsing free-form resumes
- Use a smaller model for initial filtering and reserve GPT-4o for final scoring
- Build the queue system from day one — don't bolt it on later
- Add evaluation metrics early — without ground truth data, it's hard to know if your system is improving (see the sketch after this list)
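Even a crude ground-truth loop helps: collect recruiter yes/no calls on a sample of matches and track agreement over time. A minimal sketch with hypothetical binary labels:

```python
def agreement_rate(model_calls: dict[str, bool], recruiter_calls: dict[str, bool]) -> float:
    """Fraction of resumes where the model's match/no-match call agrees with a recruiter's."""
    shared = model_calls.keys() & recruiter_calls.keys()
    if not shared:
        return 0.0
    return sum(model_calls[r] == recruiter_calls[r] for r in shared) / len(shared)

# Hypothetical data: agreement_rate({"r1": True, "r2": False}, {"r1": True, "r2": True}) -> 0.5
```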
Results
After 3 weeks of building, the system could:
- Process a resume in under 2 seconds (async)
- Reach 85% agreement with human recruiters on match scores
- Handle 500+ resumes per hour
Cost Breakdown
| Component | Monthly Cost |
|---|---|
| OpenAI API | ~$50 |
| Pinecone | $70 |
| Hosting (Railway) | $20 |
| Total | ~$140/month |
Not bad for a tool that saves recruiters hours per day.
Key Takeaways
- Data quality trumps model quality — Better parsing beats a better LLM
- Two-stage retrieval works — Cheap vector search + expensive LLM scoring
- Async processing is essential — Don't make users wait for LLM calls
- Build evaluation early — You need metrics to improve
Building in public means sharing the ugly parts too. This project taught me more about production AI systems than any tutorial ever could.