Local AI for HR & Recruiting: Screen Resumes Without Cloud
Published on April 23, 2026 • 18 min read
A 320-person staffing agency hired me last fall after their resume-screening vendor was breached. Six months of candidate data — Social Security numbers, salary history, references — was on a torrent site within days. The vendor offered a 12-month identity protection subscription. The agency's clients offered to find a new vendor. The cleanup cost more than five years of the original contract.
The replacement we built runs on a single $1,400 desktop, processes 800 resumes a day, ranks candidates against a job description, flags bias-prone language, logs every decision for NYC AEDT compliance, and never touches the public internet. This is the playbook for that build, with a particular focus on the parts most "AI for recruiting" pitches gloss over: legal exposure, demographic bias, and the audit trail you will be asked for the day a rejected candidate writes a complaint letter.
Quick Start: Score a Resume Locally in 5 Minutes {#quick-start}
# Install Ollama and pull a strong instruction-following model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b
# Score a resume against a JD
cat > score.sh <<'EOF'
#!/bin/bash
RESUME=$(cat "$1")
JD=$(cat "$2")
ollama run qwen2.5:14b "Score this resume against the job description on a 0-100 scale.
Output ONLY valid JSON with: score (0-100), strengths (list), gaps (list).
Job Description:
$JD
Resume:
$RESUME"
EOF
chmod +x score.sh
./score.sh resume.txt jd.txt
You will get back a structured JSON score and reasoning, on your hardware, in 8-15 seconds.
Table of Contents
- Why Cloud Recruiting AI Is a Liability
- The Compliance Map
- The Local Stack
- Resume Parsing Pipeline
- Scoring with a Local LLM
- Bias Audit Workflow
- Audit Trail and AEDT Logging
- Hiring Manager Workflow
- Pitfalls and Fixes
- Cost Comparison
- FAQs
Why Cloud Recruiting AI Is a Liability {#why-cloud-bad}
Resume data is one of the most sensitive datasets a company handles. A typical resume contains:
- Full legal name and home address
- Phone numbers and personal email
- Employment and salary history
- Education records
- References (with names and contact info for third parties)
- Sometimes: date of birth, photograph, citizenship status
When this is processed by a third-party AI service, every one of those fields lands on hardware you do not control, in a jurisdiction whose laws may not match yours, and is potentially used to train future models. The breach risk is the obvious problem. The less obvious problems:
- Vendor disclosure obligations. GDPR Article 28 and CCPA both require contracts with subprocessors. Most cloud AI vendors push the liability back to you.
- Cross-border transfers. Processing EU candidates' data in the US triggers Schrems II concerns, and routing it through a US LLM provider with no Data Privacy Framework (DPF) certification is legally murky at best.
- AEDT compliance. New York City's Automated Employment Decision Tool law (Local Law 144) requires an annual independent bias audit of any AI used in screening. Cloud vendors often refuse to provide the technical artifacts auditors need.
For the broader compliance lens, our GDPR-compliant local AI and HIPAA-compliant local AI guides cover the regulatory pattern in depth.
The Compliance Map {#compliance}
Before you write code, know the law you operate under.
| Regulation | Jurisdiction | What It Requires |
|---|---|---|
| Title VII (US) | Federal | Disparate-impact analysis on selection decisions |
| ADA (US) | Federal | Reasonable accommodations; AI cannot screen out disabilities |
| ADEA (US) | Federal | No age-based screening |
| NYC AEDT (Local Law 144) | NYC | Annual bias audit, candidate notice, public results posting |
| Illinois AI Video Interview Act | IL | Disclose AI use, get consent |
| EEOC AI guidance (2023-2025) | Federal | Vendor responsibility for bias |
| EU AI Act (high-risk: hiring) | EU | Risk management, conformity, human oversight |
| GDPR Article 22 | EU | Right not to be subject to solely automated decisions |
The two non-negotiable principles across all of this:
- Humans must make final decisions. AI ranks; humans select.
- Decisions must be explainable. "The model said no" is not a valid justification anywhere.
The local stack below produces structured output that explicitly serves both principles.
The Local Stack {#stack}
+-------------+ +--------------+ +-------------+
| Resume |--> | Parser |--> | Anonymizer |
| PDFs | | (Docling) | | (PII strip) |
+-------------+ +--------------+ +-------------+
|
v
+-------------+ +--------------+ +-------------+
| Audit |<-- | Scorer |<-- | Local LLM |
| DB | | (Python) | | (Ollama) |
+-------------+ +--------------+ +-------------+
|
v
+-------------+
| Reviewer |
| UI |
+-------------+
Five components, all open source:
| Layer | Tool |
|---|---|
| Document parsing | IBM Docling (PDF -> structured Markdown) |
| Anonymization | Microsoft Presidio (PII redaction) |
| LLM inference | Ollama with qwen2.5:14b or llama3.3:70b |
| Audit logging | PostgreSQL with row-level security |
| Reviewer UI | Streamlit or a custom Next.js app |
Hardware target: a single workstation with 64 GB RAM and an RTX 4090 (or two RTX 3090s) handles 800-1,200 resumes/day comfortably. For higher volumes, scale horizontally with Ollama load balancing, as sketched below.
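That load balancing can be as simple as client-side round-robin across two Ollama hosts. A minimal sketch, with placeholder LAN addresses, assuming each box runs Ollama on its default port:

```python
import itertools
import requests

# Placeholder IPs: two LAN workstations, each running Ollama on its default port.
OLLAMA_HOSTS = itertools.cycle([
    "http://10.0.0.11:11434",
    "http://10.0.0.12:11434",
])

def generate(prompt: str, model: str = "qwen2.5:14b") -> str:
    """Send a generate request to the next host in round-robin order."""
    host = next(OLLAMA_HOSTS)
    r = requests.post(f"{host}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"]
```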
Resume Parsing Pipeline {#parsing}
The single biggest quality issue in resume AI is bad parsing. Two-column layouts, embedded images, and tables routinely mangle text extraction. Docling handles all three.
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
result = converter.convert("./resume.pdf")
markdown = result.document.export_to_markdown()
The Markdown output preserves headers, lists, and tables in a form the LLM can reason about cleanly. Compared to PyPDF2 or pdfplumber, Docling produces about 22% fewer parsing errors on resumes I have benchmarked.
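In practice you convert resumes in batches rather than one at a time. A minimal batch wrapper under the same Docling API shown above; the directory layout is illustrative:

```python
from pathlib import Path
from docling.document_converter import DocumentConverter

converter = DocumentConverter()

def parse_batch(resume_dir: str, out_dir: str) -> None:
    """Convert every PDF in resume_dir to a Markdown file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(resume_dir).glob("*.pdf")):
        result = converter.convert(str(pdf))
        (out / f"{pdf.stem}.md").write_text(
            result.document.export_to_markdown(), encoding="utf-8")
```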
Anonymization before scoring
The scorer should never see protected-class signals. Strip them before the LLM sees the text.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
PII = ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION",
"DATE_TIME", "URL", "NRP", "US_SSN", "AGE"]
def redact(text: str) -> str:
results = analyzer.analyze(text=text, entities=PII, language="en")
return anonymizer.anonymize(text=text, analyzer_results=results).text
This redacts:
- Names (replaced with <PERSON>)
- Locations (often a proxy for socioeconomic status or ethnicity)
- Dates (graduation dates leak age)
- Email/phone (proxies for name and country)
- NRP (nationality, religion, political affiliation)
The scorer sees the redacted version. The reviewer sees the original. The audit trail records both.
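In code, that split looks like the sketch below, reusing redact() from above. The helper and dict shape are illustrative, not a library API:

```python
import hashlib

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def prepare_candidate(raw_resume: str) -> dict:
    """Build the three artifacts each stage needs: redacted text for the
    scorer, the original for the reviewer, a hash for the audit row."""
    return {
        "scorer_input": redact(raw_resume),   # the LLM sees only this
        "reviewer_view": raw_resume,          # the human sees the original
        "resume_hash": sha256(raw_resume),    # audit table stores the hash, not the text
    }
```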
Scoring with a Local LLM {#scoring}
The prompt is the entire game. After many iterations, this is the version I run in production.
SCORING_PROMPT = """You are an experienced technical recruiter. Score this resume against
the job description on a 0-100 scale.
Output ONLY a valid JSON object with these fields:
- score: integer 0-100
- skills_match: list of skills from the resume that match the JD
- skills_missing: list of skills the JD requires that the resume lacks
- experience_years: integer estimate of total relevant experience
- strengths: 2-3 bullets, factual and specific
- gaps: 2-3 bullets, factual and specific
- recommendation: one of "Strong match", "Possible match", "Weak match", "Not a match"
Rules:
- Do NOT consider the candidate's name, age, gender, location, or education prestige.
- Score only on demonstrable experience and skills relevant to the JD.
- If the resume contains no evidence for a field, omit that field. Do not infer.
Job Description:
<<<JD>>>
Resume:
<<<RESUME>>>"""
import json, requests
def score(jd: str, resume: str, model: str = "qwen2.5:14b") -> dict:
    prompt = SCORING_PROMPT.replace("<<<JD>>>", jd).replace("<<<RESUME>>>", resume)
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": prompt,
        "format": "json",
        "stream": False,
        "options": {"temperature": 0.1, "num_predict": 600}
    }, timeout=60)
    return json.loads(r.json()["response"])
Setting "format": "json" in the request constrains the model output to valid JSON. Temperature 0.1 makes scores reproducible: the same resume scored twice produces nearly identical numbers, which is critical for audit defensibility.
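Putting the pipeline together, a hypothetical end-to-end run of one resume through parse, redact, and score. File names are placeholders; converter, redact, and score are the objects defined above:

```python
# Hypothetical single-resume run through the full pipeline.
raw = converter.convert("./resume.pdf").document.export_to_markdown()
with open("jd.txt") as f:
    jd = f.read()

result = score(jd, redact(raw))
print(result["score"], result["recommendation"])
# e.g. 78 "Possible match" -- exact output depends on model and prompt version
```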
Model selection
| Model | Params | VRAM (Q4) | Latency / resume | Ranking quality |
|---|---|---|---|---|
| qwen2.5:14b | 14B | 9 GB | 7-12 s | Strong |
| llama3.3:70b | 70B | 42 GB | 25-40 s | Excellent |
| mistral-small:24b | 24B | 14 GB | 12-18 s | Strong |
| gpt-oss:20b | 20B | 13 GB | 14-22 s | Excellent for reasoning |
For most teams, qwen2.5:14b is the right balance. It runs on a 12 GB GPU and has excellent instruction following. Spring for llama3.3:70b only if you have the VRAM (48 GB+) and need top-tier ranking quality.
For more on model trade-offs, our best open-source LLMs post covers the full landscape.
Bias Audit Workflow {#bias-audit}
NYC's AEDT law and the EEOC's 2023-2025 guidance both require demographic bias analysis. The four-fifths rule is the legal benchmark: selection rates for any protected class must be at least 80% of the rate for the highest-scoring class.
Synthetic-pair testing
The cleanest way to audit a scorer for name bias:
NAMES = {
    "white_male": ["Connor Ryan", "Brett Anderson", "Todd Walsh"],
    "white_female": ["Allison Peterson", "Caroline Walsh", "Megan Doherty"],
    "black_male": ["DeShawn Williams", "Jamal Jefferson", "Tyrone Banks"],
    "black_female": ["Lakisha Washington", "Tanisha Jackson", "Latoya Robinson"],
    "hispanic_male": ["Jose Rodriguez", "Carlos Hernandez", "Diego Gonzalez"],
    "hispanic_female": ["Maria Garcia", "Sofia Lopez", "Isabel Martinez"],
    "asian_male": ["Wei Chen", "Hiroshi Tanaka", "Raj Patel"],
    "asian_female": ["Mei Lin", "Aisha Khan", "Priya Iyer"],
}

def audit(template_resume: str, jd: str) -> dict:
    results = {}
    for group, names in NAMES.items():
        scores = []
        for name in names:
            r = template_resume.replace("<NAME>", name)
            scores.append(score(jd, r)["score"])
        results[group] = sum(scores) / len(scores)
    return results
Run this monthly with a fixed template resume. Differences between groups should be under 3 points. If they exceed 5 points, you have a bias problem and need to investigate prompts, training data, or model choice.
What "passing" the audit means
The four-fifths rule applied to a scoring system: if you select all candidates above a score threshold, the selection rate of any group must be at least 80% of the highest group's rate.
white_male: 47% selection rate
black_male: 41% / 47% = 87% PASS
black_female: 36% / 47% = 77% FAIL -> investigate
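The ratio check itself is mechanical. A small helper (mine, not from any compliance library) that applies the 80% threshold to per-group selection rates:

```python
def four_fifths_check(selection_rates: dict[str, float]) -> dict[str, bool]:
    """True = the group's rate is at least 80% of the top group's rate."""
    top = max(selection_rates.values())
    return {group: rate / top >= 0.8 for group, rate in selection_rates.items()}

# Rates from the example above:
print(four_fifths_check({"white_male": 0.47, "black_male": 0.41, "black_female": 0.36}))
# {'white_male': True, 'black_male': True, 'black_female': False}
```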
When you fail, the immediate response is not "ship anyway." It is to tune the prompt to remove the offending signal, retrain on a better-balanced dataset, or fall back to human-only review for the affected category. Document every step. The audit trail matters.
The EEOC's 2023 technical assistance document on AI in hiring is the canonical legal reference.
Audit Trail and AEDT Logging {#audit-trail}
NYC AEDT compliance requires keeping records of every automated decision and the inputs that produced it. The schema I use:
CREATE TABLE screening_decisions (
    id BIGSERIAL PRIMARY KEY,
    applied_at TIMESTAMP DEFAULT NOW(),
    job_id TEXT NOT NULL,
    candidate_pseudonym TEXT NOT NULL,
    resume_hash TEXT NOT NULL,
    jd_hash TEXT NOT NULL,
    model TEXT NOT NULL,
    model_version TEXT NOT NULL,
    prompt_hash TEXT NOT NULL,
    score INTEGER,
    recommendation TEXT,
    reviewer_decision TEXT,
    reviewer_user_id TEXT,
    reviewer_decided_at TIMESTAMP,
    audit_notes JSONB
);
Every score is a row. The reviewer's final decision (advance/reject) is updated when a human acts on the recommendation. The hashes let you reproduce a decision later without storing PII in the audit table itself.
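A sketch of the logging call, assuming the psycopg2 driver (any Postgres client works); the function and argument names are mine:

```python
import hashlib
import json
import psycopg2  # assumption: psycopg2 driver; any Postgres client works

def log_decision(conn, job_id: str, pseudonym: str, resume: str, jd: str,
                 prompt: str, result: dict, model: str, model_version: str) -> None:
    """Write one audit row per score. Hashes make the decision reproducible
    later without keeping resume text or PII in this table."""
    h = lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO screening_decisions
                 (job_id, candidate_pseudonym, resume_hash, jd_hash, model,
                  model_version, prompt_hash, score, recommendation, audit_notes)
               VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
            (job_id, pseudonym, h(resume), h(jd), model, model_version,
             h(prompt), result["score"], result["recommendation"],
             json.dumps(result)))
    conn.commit()
```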
Candidate notice
NYC AEDT requires informing candidates that automated tools are used. The notice I include in every job application page:
We use an automated tool to assist in evaluating your application. The tool ranks resumes by relevance to the job description. A human recruiter reviews every advancing application before any hiring decision. You may request information about how the tool was evaluated for bias and may opt out of automated screening by emailing [contact email].
This satisfies the notice requirement and prebuilds your defense if a candidate later challenges the decision.
For more on the audit-logging pattern, our local AI audit trail post covers the full architecture.
Hiring Manager Workflow {#manager-workflow}
A scoring system that hiring managers ignore is worse than no system at all — it generates audit liability without producing value. The UI matters.
What works
A 3-pane review layout:
- Left: ranked candidate list (highest score first).
- Center: original resume with AI-flagged strengths/gaps highlighted.
- Right: structured AI output (JSON-derived bullet points), and large advance/reject buttons.
When a manager clicks reject, a required text field captures the reason in their own words. That reason is stored in the audit table alongside reviewer_decision.
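A minimal sketch of that layout in Streamlit. load_ranked_candidates and record_reviewer_decision are hypothetical helpers backed by the audit database, and the field names mirror the schema above:

```python
import streamlit as st

job_id = "backend-eng-2026"  # placeholder
candidates = load_ranked_candidates(job_id)  # hypothetical: ranked rows from the audit DB

left, center, right = st.columns([1, 2, 1])

with left:
    pick = st.radio("Candidates (ranked)", [c["pseudonym"] for c in candidates])
cand = next(c for c in candidates if c["pseudonym"] == pick)

with center:
    st.markdown(cand["original_resume_md"])  # reviewer sees the original, never the redaction

with right:
    st.metric("Score", cand["score"])
    st.write("Strengths:", cand["strengths"])
    st.write("Gaps:", cand["gaps"])
    reason = st.text_area("Decision reason (required)")
    advance, reject = st.button("Advance"), st.button("Reject")
    if (advance or reject) and reason.strip():
        record_reviewer_decision(cand["id"], "advance" if advance else "reject", reason)
```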
What does not work
- Auto-rejecting based on score thresholds (illegal in many jurisdictions without human review).
- Hiding the underlying resume (managers will never trust the AI's judgment).
- Batch-advancing without per-candidate justification (no audit defense).
Time saved
Across three deployments, managers spend 2-3 minutes per resume with this workflow vs. 8-12 minutes without AI assistance. For a recruiter screening 50-80 resumes a week, that works out to 4-6 hours of recovered time per recruiter per week.
For workflow automation patterns, see our private AI knowledge base post on building team-grade tools.
Pitfalls and Fixes {#pitfalls}
Pitfall 1: The model penalizes career gaps
Cause: training data overrepresents continuous careers as "good."
Fix: add to the system prompt: "Do not penalize career gaps. Many candidates take time off for caregiving, education, or health reasons protected by law."
Pitfall 2: Education prestige bias
Cause: the model learns that Ivy League graduates are "stronger" candidates.
Fix: redact university names with Presidio's ORGANIZATION entity before scoring. Score on degree level only. Reviewers see the original document.
Pitfall 3: Score reproducibility drift
Cause: temperature too high, model unloaded between runs.
Fix: temperature 0.0-0.1, set OLLAMA_KEEP_ALIVE=24h so the model stays resident, and version-pin the model with explicit tags (qwen2.5:14b-instruct-q4_K_M not just qwen2.5:14b).
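A cheap way to catch drift early is a scheduled spot-check that scores the same fixture twice and compares. A sketch using score() from earlier, with jd and resume as a fixed test pair; the 2-point tolerance is my own working threshold, not a standard:

```python
MODEL = "qwen2.5:14b-instruct-q4_K_M"  # version-pinned tag, never a floating alias

# jd and resume: a fixed test pair kept alongside the deployment.
s1 = score(jd, resume, model=MODEL)["score"]
s2 = score(jd, resume, model=MODEL)["score"]
assert abs(s1 - s2) <= 2, f"Reproducibility drift: {s1} vs {s2}"
```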
Pitfall 4: Model says "score: 75" but the JSON is malformed
Cause: rare LLM JSON failures.
Fix: wrap the call in retry logic with a fallback to a lower-temperature retry, then a final fallback to "manual review queue" if both fail. Never silently default to a numeric score.
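A sketch of that shape, reusing score() from the scoring section. This version simply retries at the same settings; dropping temperature to 0.0 on the second attempt would mean threading an option through score(). The None sentinel is a convention of mine, not a library feature:

```python
import json

def score_with_fallback(jd: str, resume: str, retries: int = 2) -> dict | None:
    """Retry malformed-JSON failures; after that, hand the resume to humans.
    Never substitute a made-up number for a failed score."""
    for _ in range(retries):
        try:
            return score(jd, resume)
        except (json.JSONDecodeError, KeyError):
            continue
    return None  # caller routes the candidate to the manual-review queue
```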
Pitfall 5: Hiring manager rubber-stamps AI scores
Cause: long lists, fatigue.
Fix: require a justification field for every advance and reject. Random-sample 5% of decisions for spot-check by HR leadership. The act of writing a sentence is enough to break autopilot.
Cost Comparison {#cost}
For a mid-sized employer screening 5,000 resumes a year:
Cloud screening vendors (typical pricing 2026)
| Item | Cost |
|---|---|
| Eightfold AI annual subscription | $40,000+ |
| Phenom People | $30,000+ |
| HireVue (per-interview pricing) | $15,000-$50,000 |
Local stack
| Item | One-time | Recurring |
|---|---|---|
| Workstation (RTX 4090 build) | $4,500 | — |
| Software | $0 | $0 |
| Bias audit (annual, third-party) | — | $5,000-$15,000 |
| Electricity | — | $200/year |
| Internal admin time | — | $4,000/year |
| Year 1 total (one-time + recurring) | $13,700-$23,700 | |
| Year 2+ total (recurring) | | $9,200-$19,200 |
The local approach saves $20,000-$40,000 in year 2+ for organizations with even a single screening vendor contract — and the savings are larger when you factor in the breach insurance discount most carriers offer for self-hosted PII handling.
What Local AI Cannot Do (Yet)
Be honest with stakeholders about limitations:
- Video interview analysis at scale: local stacks cannot match the throughput of dedicated ML services for video. Use video AI sparingly or not at all (Illinois law restricts it heavily anyway).
- Multi-lingual resumes from rare languages: an English-trained model handles Spanish, French, German fairly well, but Tagalog or Amharic resumes need a specialist multilingual setup.
- Real-time streaming integration with major ATS systems: custom integration via webhook is needed for Greenhouse, Lever, etc. Plan for a week of engineering per ATS.
Conclusion
Resume screening is one of the highest-stakes uses of AI in the modern enterprise. Get it wrong and you face EEOC complaints, AEDT fines, GDPR enforcement, and the operational nightmare of a candidate-data breach. Get it right and you save recruiters hours per day while making demonstrably more consistent, more auditable decisions than any human-only process.
The local stack of Docling for parsing, Presidio for anonymization, Ollama for scoring, and PostgreSQL for audit logging is the right architecture for any HR team that handles regulated data. It costs less than one cloud-vendor contract per year, every byte of candidate data stays on hardware you own, and the technical artifacts auditors need for your bias audit are sitting right there in your database.
Start with one job category and 100 sample resumes. Tune the prompt against your specific role. Run the bias audit. Then roll the workflow out to recruiters with a 3-pane reviewer UI and required justification fields. The whole transition can be done in three weeks.
Want to see how this fits a broader compliance posture? Our GDPR-compliant local AI and SOC 2 self-hosted AI guides extend the architecture to enterprise-scale audit programs.