Local AI for HR & Recruiting: Screen Resumes Without Cloud
Published on April 23, 2026 • 18 min read
A 320-person staffing agency hired me last fall after their resume-screening vendor was breached. Six months of candidate data — Social Security numbers, salary history, references — was on a torrent site within days. The vendor offered a 12-month identity protection subscription. The agency's clients offered to find a new vendor. The cleanup cost more than five years of the original contract.
The replacement we built runs on a single $1,400 desktop, processes 800 resumes a day, ranks candidates against a job description, flags bias-prone language, logs every decision for NYC AEDT compliance, and never touches the public internet. This is the playbook for that build, with a particular focus on the parts most "AI for recruiting" pitches gloss over: legal exposure, demographic bias, and the audit trail you will be asked for the day a rejected candidate writes a complaint letter.
Quick Start: Score a Resume Locally in 5 Minutes {#quick-start}
# Install Ollama and pull a strong instruction-following model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b
# Score a resume against a JD
cat > score.sh <<'EOF'
#!/bin/bash
RESUME=$(cat "$1")
JD=$(cat "$2")
ollama run qwen2.5:14b "Score this resume against the job description on a 0-100 scale.
Output ONLY valid JSON with: score (0-100), strengths (list), gaps (list).
Job Description:
$JD
Resume:
$RESUME"
EOF
chmod +x score.sh
./score.sh resume.txt jd.txt
You will get back a structured JSON score and reasoning, on your hardware, in 8-15 seconds.
Table of Contents
- Why Cloud Recruiting AI Is a Liability
- The Compliance Map
- The Local Stack
- Resume Parsing Pipeline
- Scoring with a Local LLM
- Bias Audit Workflow
- Audit Trail and AEDT Logging
- Hiring Manager Workflow
- Pitfalls and Fixes
- Cost Comparison
- FAQs
Why Cloud Recruiting AI Is a Liability {#why-cloud-bad}
Resume data is one of the most sensitive datasets a company handles. A typical resume contains:
- Full legal name and home address
- Phone numbers and personal email
- Employment and salary history
- Education records
- References (with names and contact info for third parties)
- Sometimes: date of birth, photograph, citizenship status
When this is processed by a third-party AI service, every one of those fields lands on hardware you do not control, in a jurisdiction whose laws may not match yours, and is potentially used to train future models. The breach risk is the obvious problem. The less obvious problems:
- Vendor disclosure obligations. GDPR Article 28 and CCPA both require contracts with subprocessors. Most cloud AI vendors push the liability back to you.
- Cross-border transfers. Processing EU candidates' data in the US triggers Schrems II concerns, and routing it through a US LLM provider with no Data Privacy Framework (DPF) certification is legally murky at best.
- AEDT compliance. New York City's Automated Employment Decision Tool law (Local Law 144) requires an annual independent bias audit of any AI used in screening. Cloud vendors often refuse to provide the technical artifacts auditors need.
For the broader compliance lens, our GDPR-compliant local AI and HIPAA-compliant local AI guides cover the regulatory pattern in depth.
The Compliance Map {#compliance}
Before you write code, know the law you operate under.
| Regulation | Jurisdiction | What It Requires |
|---|---|---|
| Title VII (US) | Federal | Disparate-impact analysis on selection decisions |
| ADA (US) | Federal | Reasonable accommodations; AI cannot screen out disabilities |
| ADEA (US) | Federal | No age-based screening |
| NYC AEDT (Local Law 144) | NYC | Annual bias audit, candidate notice, public results posting |
| Illinois AI Video Interview Act | IL | Disclose AI use, get consent |
| EEOC AI guidance (2023-2025) | Federal | Vendor responsibility for bias |
| EU AI Act (high-risk: hiring) | EU | Risk management, conformity, human oversight |
| GDPR Article 22 | EU | Right not to be subject to solely automated decisions |
The two non-negotiable principles across all of this:
- Humans must make final decisions. AI ranks; humans select.
- Decisions must be explainable. "The model said no" is not a valid justification anywhere.
The local stack below produces structured output that explicitly serves both principles.
The Local Stack {#stack}
+-------------+ +--------------+ +-------------+
| Resume |--> | Parser |--> | Anonymizer |
| PDFs | | (Docling) | | (PII strip) |
+-------------+ +--------------+ +-------------+
|
v
+-------------+ +--------------+ +-------------+
| Audit |<-- | Scorer |<-- | Local LLM |
| DB | | (Python) | | (Ollama) |
+-------------+ +--------------+ +-------------+
|
v
+-------------+
| Reviewer |
| UI |
+-------------+
Five components, all open source:
| Layer | Tool |
|---|---|
| Document parsing | IBM Docling (PDF -> structured Markdown) |
| Anonymization | Microsoft Presidio (PII redaction) |
| LLM inference | Ollama with qwen2.5:14b or llama3.3:70b |
| Audit logging | PostgreSQL with row-level security |
| Reviewer UI | Streamlit or a custom Next.js app |
Hardware target: a single workstation with 64 GB RAM and an RTX 4090 (or two RTX 3090s) handles 800-1,200 resumes/day comfortably. For higher volumes, scale horizontally with Ollama load balancing, as sketched below.
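That load balancing can be as simple as client-side round-robin across two Ollama hosts. A minimal sketch, with placeholder LAN addresses, assuming each box runs Ollama on its default port:

```python
import itertools
import requests

# Placeholder IPs: two LAN workstations, each running Ollama on its default port.
OLLAMA_HOSTS = itertools.cycle([
    "http://10.0.0.11:11434",
    "http://10.0.0.12:11434",
])

def generate(prompt: str, model: str = "qwen2.5:14b") -> str:
    """Send a generate request to the next host in round-robin order."""
    host = next(OLLAMA_HOSTS)
    r = requests.post(f"{host}/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False},
                      timeout=120)
    r.raise_for_status()
    return r.json()["response"]
```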
Resume Parsing Pipeline {#parsing}
The single biggest quality issue in resume AI is bad parsing. Two-column layouts, embedded images, and tables routinely mangle text extraction. Docling handles all three.
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
result = converter.convert("./resume.pdf")
markdown = result.document.export_to_markdown()
The Markdown output preserves headers, lists, and tables in a form the LLM can reason about cleanly. Compared to PyPDF2 or pdfplumber, Docling produces about 22% fewer parsing errors on resumes I have benchmarked.
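In practice you convert resumes in batches rather than one at a time. A minimal batch wrapper under the same Docling API shown above; the directory layout is illustrative:

```python
from pathlib import Path
from docling.document_converter import DocumentConverter

converter = DocumentConverter()

def parse_batch(resume_dir: str, out_dir: str) -> None:
    """Convert every PDF in resume_dir to a Markdown file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(resume_dir).glob("*.pdf")):
        result = converter.convert(str(pdf))
        (out / f"{pdf.stem}.md").write_text(
            result.document.export_to_markdown(), encoding="utf-8")
```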
Anonymization before scoring
The scorer should never see protected-class signals. Strip them before the LLM sees the text.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
PII = ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION",
"DATE_TIME", "URL", "NRP", "US_SSN", "AGE"]
def redact(text: str) -> str:
results = analyzer.analyze(text=text, entities=PII, language="en")
return anonymizer.anonymize(text=text, analyzer_results=results).text
This redacts:
- Names (replaced with <PERSON>)
- Locations (often a proxy for socioeconomic status or ethnicity)
- Dates (graduation dates leak age)
- Email/phone (proxies for name and country)
- NRP (nationality, religion, political affiliation)
The scorer sees the redacted version. The reviewer sees the original. The audit trail records both.
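In code, that split looks like the sketch below, reusing redact() from above. The helper and dict shape are illustrative, not a library API:

```python
import hashlib

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def prepare_candidate(raw_resume: str) -> dict:
    """Build the three artifacts each stage needs: redacted text for the
    scorer, the original for the reviewer, a hash for the audit row."""
    return {
        "scorer_input": redact(raw_resume),   # the LLM sees only this
        "reviewer_view": raw_resume,          # the human sees the original
        "resume_hash": sha256(raw_resume),    # audit table stores the hash, not the text
    }
```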
Scoring with a Local LLM {#scoring}
The prompt is the entire game. After many iterations, this is the version I run in production.
SCORING_PROMPT = """You are an experienced technical recruiter. Score this resume against
the job description on a 0-100 scale.
Output ONLY a valid JSON object with these fields:
- score: integer 0-100
- skills_match: list of skills from the resume that match the JD
- skills_missing: list of skills the JD requires that the resume lacks
- experience_years: integer estimate of total relevant experience
- strengths: 2-3 bullets, factual and specific
- gaps: 2-3 bullets, factual and specific
- recommendation: one of "Strong match", "Possible match", "Weak match", "Not a match"
Rules:
- Do NOT consider the candidate's name, age, gender, location, or education prestige.
- Score only on demonstrable experience and skills relevant to the JD.
- If the resume contains no evidence for a field, omit that field. Do not infer.
Job Description:
<<<JD>>>
Resume:
<<<RESUME>>>"""
import json, requests
def score(jd: str, resume: str, model: str = "qwen2.5:14b") -> dict:
    prompt = SCORING_PROMPT.replace("<<<JD>>>", jd).replace("<<<RESUME>>>", resume)
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": prompt,
        "format": "json",
        "stream": False,
        "options": {"temperature": 0.1, "num_predict": 600}
    }, timeout=60)
    return json.loads(r.json()["response"])
Setting "format": "json" in the request constrains the model output to valid JSON. Temperature 0.1 makes scores reproducible: the same resume scored twice produces nearly identical numbers, which is critical for audit defensibility.
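Putting the pipeline together, a hypothetical end-to-end run of one resume through parse, redact, and score. File names are placeholders; converter, redact, and score are the objects defined above:

```python
# Hypothetical single-resume run through the full pipeline.
raw = converter.convert("./resume.pdf").document.export_to_markdown()
with open("jd.txt") as f:
    jd = f.read()

result = score(jd, redact(raw))
print(result["score"], result["recommendation"])
# e.g. 78 "Possible match" -- exact output depends on model and prompt version
```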
Model selection
| Model | Params | VRAM (Q4) | Latency / resume | Ranking quality |
|---|---|---|---|---|
| qwen2.5:14b | 14B | 9 GB | 7-12 s | Strong |
| llama3.3:70b | 70B | 42 GB | 25-40 s | Excellent |
| mistral-small:24b | 24B | 14 GB | 12-18 s | Strong |
| gpt-oss:20b | 20B | 13 GB | 14-22 s | Excellent for reasoning |
For most teams, qwen2.5:14b is the right balance. It runs on a 12 GB GPU and has excellent instruction following. Spring for llama3.3:70b only if you have the VRAM (48 GB+) and need top-tier ranking quality.
For more on model trade-offs, our best open-source LLMs post covers the full landscape.
Bias Audit Workflow {#bias-audit}
NYC's AEDT law and the EEOC's 2023-2025 guidance both require demographic bias analysis. The four-fifths rule is the legal benchmark: selection rates for any protected class must be at least 80% of the rate for the highest-scoring class.
Synthetic-pair testing
The cleanest way to audit a scorer for name bias:
NAMES = {
    "white_male": ["Connor Ryan", "Brett Anderson", "Todd Walsh"],
    "white_female": ["Allison Peterson", "Caroline Walsh", "Megan Doherty"],
    "black_male": ["DeShawn Williams", "Jamal Jefferson", "Tyrone Banks"],
    "black_female": ["Lakisha Washington", "Tanisha Jackson", "Latoya Robinson"],
    "hispanic_male": ["Jose Rodriguez", "Carlos Hernandez", "Diego Gonzalez"],
    "hispanic_female": ["Maria Garcia", "Sofia Lopez", "Isabel Martinez"],
    "asian_male": ["Wei Chen", "Hiroshi Tanaka", "Raj Patel"],
    "asian_female": ["Mei Lin", "Aisha Khan", "Priya Iyer"],
}

def audit(template_resume: str, jd: str) -> dict:
    results = {}
    for group, names in NAMES.items():
        scores = []
        for name in names:
            r = template_resume.replace("<NAME>", name)
            scores.append(score(jd, r)["score"])
        results[group] = sum(scores) / len(scores)
    return results
Run this monthly with a fixed template resume. Differences between groups should be under 3 points. If they exceed 5 points, you have a bias problem and need to investigate prompts, training data, or model choice.
What "passing" the audit means
The four-fifths rule applied to a scoring system: if you select all candidates above a score threshold, the selection rate of any group must be at least 80% of the highest group's rate.
white_male: 47% selection rate
black_male: 41% / 47% = 87% PASS
black_female: 36% / 47% = 77% FAIL -> investigate
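The ratio check itself is mechanical. A small helper (mine, not from any compliance library) that applies the 80% threshold to per-group selection rates:

```python
def four_fifths_check(selection_rates: dict[str, float]) -> dict[str, bool]:
    """True = the group's rate is at least 80% of the top group's rate."""
    top = max(selection_rates.values())
    return {group: rate / top >= 0.8 for group, rate in selection_rates.items()}

# Rates from the example above:
print(four_fifths_check({"white_male": 0.47, "black_male": 0.41, "black_female": 0.36}))
# {'white_male': True, 'black_male': True, 'black_female': False}
```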
When you fail, the immediate response is not "ship anyway." It is to tune the prompt to remove the offending signal, retrain on a better-balanced dataset, or fall back to human-only review for the affected category. Document every step. The audit trail matters.
The EEOC's 2023 technical assistance document on AI in hiring is the canonical legal reference.
Audit Trail and AEDT Logging {#audit-trail}
NYC AEDT compliance requires keeping records of every automated decision and the inputs that produced it. The schema I use:
CREATE TABLE screening_decisions (
    id BIGSERIAL PRIMARY KEY,
    applied_at TIMESTAMP DEFAULT NOW(),
    job_id TEXT NOT NULL,
    candidate_pseudonym TEXT NOT NULL,
    resume_hash TEXT NOT NULL,
    jd_hash TEXT NOT NULL,
    model TEXT NOT NULL,
    model_version TEXT NOT NULL,
    prompt_hash TEXT NOT NULL,
    score INTEGER,
    recommendation TEXT,
    reviewer_decision TEXT,
    reviewer_user_id TEXT,
    reviewer_decided_at TIMESTAMP,
    audit_notes JSONB
);
Every score is a row. The reviewer's final decision (advance/reject) is updated when a human acts on the recommendation. The hashes let you reproduce a decision later without storing PII in the audit table itself.
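A sketch of the logging call, assuming the psycopg2 driver (any Postgres client works); the function and argument names are mine:

```python
import hashlib
import json
import psycopg2  # assumption: psycopg2 driver; any Postgres client works

def log_decision(conn, job_id: str, pseudonym: str, resume: str, jd: str,
                 prompt: str, result: dict, model: str, model_version: str) -> None:
    """Write one audit row per score. Hashes make the decision reproducible
    later without keeping resume text or PII in this table."""
    h = lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
    with conn.cursor() as cur:
        cur.execute(
            """INSERT INTO screening_decisions
                 (job_id, candidate_pseudonym, resume_hash, jd_hash, model,
                  model_version, prompt_hash, score, recommendation, audit_notes)
               VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
            (job_id, pseudonym, h(resume), h(jd), model, model_version,
             h(prompt), result["score"], result["recommendation"],
             json.dumps(result)))
    conn.commit()
```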
Candidate notice
NYC AEDT requires informing candidates that automated tools are used. The notice I include in every job application page:
We use an automated tool to assist in evaluating your application. The tool ranks resumes by relevance to the job description. A human recruiter reviews every advancing application before any hiring decision. You may request information about how the tool was evaluated for bias and may opt out of automated screening by emailing [contact email].
This satisfies the notice requirement and prebuilds your defense if a candidate later challenges the decision.
For more on the audit-logging pattern, our local AI audit trail post covers the full architecture.
Hiring Manager Workflow {#manager-workflow}
A scoring system that hiring managers ignore is worse than no system at all — it generates audit liability without producing value. The UI matters.
What works
A 3-pane review layout:
- Left: ranked candidate list (highest score first).
- Center: original resume with AI-flagged strengths/gaps highlighted.
- Right: structured AI output (JSON-derived bullet points), and large advance/reject buttons.
When a manager clicks reject, a required text field captures the reason in their own words. That reason is stored in the audit table alongside reviewer_decision.
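A minimal sketch of that layout in Streamlit. load_ranked_candidates and record_reviewer_decision are hypothetical helpers backed by the audit database, and the field names mirror the schema above:

```python
import streamlit as st

job_id = "backend-eng-2026"  # placeholder
candidates = load_ranked_candidates(job_id)  # hypothetical: ranked rows from the audit DB

left, center, right = st.columns([1, 2, 1])

with left:
    pick = st.radio("Candidates (ranked)", [c["pseudonym"] for c in candidates])
cand = next(c for c in candidates if c["pseudonym"] == pick)

with center:
    st.markdown(cand["original_resume_md"])  # reviewer sees the original, never the redaction

with right:
    st.metric("Score", cand["score"])
    st.write("Strengths:", cand["strengths"])
    st.write("Gaps:", cand["gaps"])
    reason = st.text_area("Decision reason (required)")
    advance, reject = st.button("Advance"), st.button("Reject")
    if (advance or reject) and reason.strip():
        record_reviewer_decision(cand["id"], "advance" if advance else "reject", reason)
```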
What does not work
- Auto-rejecting based on score thresholds (illegal in many jurisdictions without human review).
- Hiding the underlying resume (managers will never trust the AI's judgment).
- Batch-advancing without per-candidate justification (no audit defense).
Time saved
Across three deployments, managers spend 2-3 minutes per resume with this workflow vs. 8-12 minutes without AI assistance. For a recruiter screening 50-80 resumes a week, that works out to 4-6 hours of recovered time per recruiter per week.
For workflow automation patterns, see our private AI knowledge base post on building team-grade tools.
Pitfalls and Fixes {#pitfalls}
Pitfall 1: The model penalizes career gaps
Cause: training data overrepresents continuous careers as "good."
Fix: add to the system prompt: "Do not penalize career gaps. Many candidates take time off for caregiving, education, or health reasons protected by law."
Pitfall 2: Education prestige bias
Cause: the model learns that Ivy League graduates are "stronger" candidates.
Fix: redact university names with Presidio's ORGANIZATION entity before scoring. Score on degree level only. Reviewers see the original document.
Pitfall 3: Score reproducibility drift
Cause: temperature too high, model unloaded between runs.
Fix: temperature 0.0-0.1, set OLLAMA_KEEP_ALIVE=24h so the model stays resident, and version-pin the model with explicit tags (qwen2.5:14b-instruct-q4_K_M not just qwen2.5:14b).
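A cheap way to catch drift early is a scheduled spot-check that scores the same fixture twice and compares. A sketch using score() from earlier, with jd and resume as a fixed test pair; the 2-point tolerance is my own working threshold, not a standard:

```python
MODEL = "qwen2.5:14b-instruct-q4_K_M"  # version-pinned tag, never a floating alias

# jd and resume: a fixed test pair kept alongside the deployment.
s1 = score(jd, resume, model=MODEL)["score"]
s2 = score(jd, resume, model=MODEL)["score"]
assert abs(s1 - s2) <= 2, f"Reproducibility drift: {s1} vs {s2}"
```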
Pitfall 4: Model says "score: 75" but the JSON is malformed
Cause: rare LLM JSON failures.
Fix: wrap the call in retry logic with a fallback to a lower-temperature retry, then a final fallback to "manual review queue" if both fail. Never silently default to a numeric score.
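A sketch of that shape, reusing score() from the scoring section. This version simply retries at the same settings; dropping temperature to 0.0 on the second attempt would mean threading an option through score(). The None sentinel is a convention of mine, not a library feature:

```python
import json

def score_with_fallback(jd: str, resume: str, retries: int = 2) -> dict | None:
    """Retry malformed-JSON failures; after that, hand the resume to humans.
    Never substitute a made-up number for a failed score."""
    for _ in range(retries):
        try:
            return score(jd, resume)
        except (json.JSONDecodeError, KeyError):
            continue
    return None  # caller routes the candidate to the manual-review queue
```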
Pitfall 5: Hiring manager rubber-stamps AI scores
Cause: long lists, fatigue.
Fix: require a justification field for every advance and reject. Random-sample 5% of decisions for spot-check by HR leadership. The act of writing a sentence is enough to break autopilot.
Cost Comparison {#cost}
For a mid-sized employer screening 5,000 resumes a year:
Cloud screening vendors (typical pricing 2026)
| Item | Cost |
|---|---|
| Eightfold AI annual subscription | $40,000+ |
| Phenom People | $30,000+ |
| HireVue (per-interview pricing) | $15,000-$50,000 |
Local stack
| Item | One-time | Recurring |
|---|---|---|
| Workstation (RTX 4090 build) | $4,500 | — |
| Software | $0 | $0 |
| Bias audit (annual, third-party) | — | $5,000-$15,000 |
| Electricity | — | $200/year |
| Internal admin time | — | $4,000/year |
| Year 1 total (one-time + recurring) | $13,700-$23,700 | |
| Year 2+ total (recurring) | | $9,200-$19,200 |
The local approach saves $20,000-$40,000 in year 2+ for organizations with even a single screening vendor contract — and the savings are larger when you factor in the breach insurance discount most carriers offer for self-hosted PII handling.
What Local AI Cannot Do (Yet)
Be honest with stakeholders about limitations:
- Video interview analysis at scale: local stacks cannot match the throughput of dedicated ML services for video. Use video AI sparingly or not at all (Illinois law restricts it heavily anyway).
- Multi-lingual resumes from rare languages: an English-trained model handles Spanish, French, German fairly well, but Tagalog or Amharic resumes need a specialist multilingual setup.
- Real-time streaming integration with major ATS systems: custom integration via webhook is needed for Greenhouse, Lever, etc. Plan for a week of engineering per ATS.
Conclusion
Resume screening is one of the highest-stakes uses of AI in the modern enterprise. Get it wrong and you face EEOC complaints, AEDT fines, GDPR enforcement, and the operational nightmare of a candidate-data breach. Get it right and you save recruiters hours per day while making demonstrably more consistent, more auditable decisions than any human-only process.
The local stack of Docling for parsing, Presidio for anonymization, Ollama for scoring, and PostgreSQL for audit logging is the right architecture for any HR team that handles regulated data. It costs less than one cloud-vendor contract per year, every byte of candidate data stays on hardware you own, and the technical artifacts auditors need for your bias audit are sitting right there in your database.
Start with one job category and 100 sample resumes. Tune the prompt against your specific role. Run the bias audit. Then roll the workflow out to recruiters with a 3-pane reviewer UI and required justification fields. The whole transition can be done in three weeks.
Want to see how this fits a broader compliance posture? Our GDPR-compliant local AI and SOC 2 self-hosted AI guides extend the architecture to enterprise-scale audit programs.