Automate Invoice Processing with Local AI
Published on April 23, 2026 • 17 min read
A controller at a 70-person construction firm hired me last winter with a complaint and a question. The complaint: their AP clerk was spending 22 hours a week typing vendor invoices into QuickBooks. The question: was there an AI that could do this without uploading sensitive subcontractor pricing to a SaaS vendor? Six weeks later that clerk's AP work was 4 hours a week — and every invoice scan, every extracted field, every ERP write-back happened on a single workstation in the back office.
This is the playbook from that build, refined across five more deployments since. The core stack is shorter than the average "AI for AP" pitch deck: a vision-language model, a structured-output schema, a validation pass, and an ERP webhook. What follows is exactly how to wire it together with real numbers, real code, and the specific failure modes you should expect on day three.
Quick Start: Extract One Invoice in 4 Minutes {#quick-start}
```bash
# Install Ollama and a vision model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5-vl:7b

# Extract from an invoice scan
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-vl:7b",
  "prompt": "Extract this invoice as JSON with: vendor_name, invoice_number, invoice_date, due_date, total_amount, currency, line_items[{description, qty, unit_price, amount}]. Return ONLY JSON.",
  "images": ["'$(base64 -w0 ./invoice.jpg)'"],
  "format": "json",
  "stream": false
}'
```
That returns a complete structured extraction in 8-15 seconds, on your hardware, ready to push into any ERP.
Table of Contents
- Why Cloud Invoice AI Costs You More Than Money
- The Local Pipeline
- Hardware and Throughput Targets
- Document Classification Step
- Field Extraction with Vision Models
- Line-Item Normalization
- Validation and Three-Way Match
- ERP Integration
- Pitfalls and Fixes
- ROI Math
- FAQs
Why Cloud Invoice AI Costs You More Than Money {#why-not-cloud}
The cloud invoice automation market is loud — Bill.com, Tipalti, AvidXchange, Stampli, Ramp. Every pitch leads with "AI extraction." None of them lead with what a CFO of a regulated business actually wants to know: who else sees my invoices?
Real concerns I hear from finance teams:
- Subcontractor pricing leakage. Construction GCs do not want the AI vendor's other customers' models trained on their pricing.
- Wage data privacy. Payroll-adjacent invoices contain compensation that is restricted under several state laws.
- Vendor relationship sensitivity. Some vendors require non-disclosure on invoice terms; cloud upload may violate those NDAs.
- Cross-border concerns. EU vendor invoices flowing through a US AI service is a Schrems II issue waiting for a complaint.
- Subscription stack inflation. Bill.com Premium runs $79-$169/user/month for the AI tier. A 4-person AP team at the high end is $8,100/year — forever.
Local AI invoice processing solves all of this, with extraction quality that genuinely matches that of the cloud vendors I have benchmarked against.
For the broader case, our GDPR-compliant local AI post covers data residency in depth.
The Local Pipeline {#pipeline}
```
Email / Scan / Drop folder
            |
            v
     +--------------+
     |    Ingest    |
     +--------------+
            |
            v
     +--------------+
     |   Classify   |  (invoice / credit memo / statement / other)
     +--------------+
            |
            v
     +--------------+
     |   Extract    |  (Vision LLM -> JSON)
     +--------------+
            |
            v
     +--------------+
     |  Normalize   |  (line items -> GL accounts)
     +--------------+
            |
            v
     +--------------+
     |   Validate   |  (3-way match + rules)
     +--------------+
           /  \
          v    v
+--------------+  +--------------+
|  Auto-post   |  | Human Queue  |
+--------------+  +--------------+
```
Six stages. The first half is pure AI. The second half is rule-based logic that turns a JSON extraction into a posted bill in your ERP. Splitting them this way is critical: it means an LLM hallucination cannot post incorrect data, because a deterministic validation step sits between extraction and posting.
The toolset:
| Stage | Tool |
|---|---|
| Ingest | imap-tools (email), or a watched folder |
| Classify | qwen2.5:3b text classifier (or a lightweight rule pass) |
| Extract | qwen2.5-vl:7b or minicpm-v:8b via Ollama |
| Normalize | Python + a vendor->GL account lookup table |
| Validate | Python rules + cross-check against PO/receipt |
| Post | ERP REST/SOAP API (NetSuite, QuickBooks Online, SAP B1, Sage Intacct) |
Hardware and Throughput Targets {#hardware}
| Workload | Hardware | Throughput |
|---|---|---|
| Solo bookkeeper, 50 invoices/day | RTX 3060 12GB + 32 GB RAM | 12 sec/invoice |
| Mid-market AP, 500 invoices/day | RTX 4090 24GB + 64 GB RAM | 8 sec/invoice |
| Enterprise, 5,000 invoices/day | 4x A6000 (load balanced) | 5 sec/invoice |
All numbers are with qwen2.5-vl:7b at Q4 quantization. Vision models scale roughly linearly with VRAM and GPU memory bandwidth.
A common oversight: invoice processing is bursty. Most of the day's invoices arrive between 8 and 10 AM (vendor send schedules). Size the GPU to handle the burst, not the daily average.
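As a back-of-the-envelope check, the burst sizing can be turned into a per-invoice latency budget. The 60%-in-two-hours burst share below is an assumption from my deployments, not a universal constant:

```python
def burst_latency_budget(daily_invoices: int, burst_share: float = 0.6,
                         burst_hours: float = 2.0) -> float:
    """Seconds per invoice the pipeline must sustain during the morning burst."""
    burst_invoices = daily_invoices * burst_share
    return burst_hours * 3600 / burst_invoices

# 500 invoices/day with 60% arriving in a 2-hour window gives a
# 24-second budget per invoice, so an RTX 4090 at 8 s/invoice has 3x headroom.
```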
Document Classification Step {#classification}
Not every PDF that lands in your AP inbox is an invoice. Statements, credit memos, marketing collateral, and signed contracts mix in. Classifying first saves the more expensive vision pass.
```python
import requests

CLASSIFY_PROMPT = """Classify this document as one of: invoice, credit_memo, statement, contract, other.
Output ONLY one word from that list."""

def classify_pdf_text(text: str) -> str:
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "qwen2.5:3b",
        "prompt": f"{CLASSIFY_PROMPT}\n\nDocument:\n{text[:3000]}",
        "stream": False,
        "options": {"temperature": 0.0, "num_predict": 5}
    })
    return r.json()["response"].strip().lower()
```
A 3B model is plenty for this. Latency: under 200 ms per document. Accuracy on a typical AP mix: 96-98%.
If you want belt-and-suspenders, prepend a regex filter: documents containing both invoice and a currency symbol skip the LLM and go straight to extraction.
Field Extraction with Vision Models {#extraction}
The extraction step is where AI earns its keep. Vision-language models read the invoice image directly and output structured JSON. No separate OCR pass required.
The schema
```json
{
  "vendor_name": "string",
  "vendor_address": "string|null",
  "vendor_tax_id": "string|null",
  "invoice_number": "string",
  "invoice_date": "YYYY-MM-DD",
  "due_date": "YYYY-MM-DD|null",
  "po_number": "string|null",
  "subtotal": "number",
  "tax": "number|null",
  "total_amount": "number",
  "currency": "ISO 4217 code, e.g., USD, EUR",
  "line_items": [
    {"description": "string", "quantity": "number|null", "unit_price": "number|null", "amount": "number"}
  ],
  "remit_to_account": "string|null",
  "notes": "string|null"
}
```
This schema covers 95%+ of what an ERP needs. Add custom fields for specialized industries (job numbers for construction, NDC codes for healthcare).
The extraction prompt
```python
import base64, json, requests

EXTRACT_PROMPT = """You are an accounts payable assistant. Extract this invoice into the
following JSON schema. Output ONLY valid JSON. If a field is not present, use null.
Required fields: vendor_name, invoice_number, invoice_date, total_amount, currency, line_items.
Date format: YYYY-MM-DD. Numbers: no currency symbols, no thousand separators (e.g., 1234.56 not $1,234.56).
Schema:
{ ... full schema above ... }
Return only the JSON, with no commentary."""

def extract_invoice(image_path: str, model: str = "qwen2.5-vl:7b") -> dict:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": EXTRACT_PROMPT,
        "images": [b64],
        "format": "json",
        "stream": False,
        "options": {"temperature": 0.0, "num_predict": 1000}
    }, timeout=120)
    return json.loads(r.json()["response"])
```
Temperature 0.0 produces deterministic output. format: "json" constrains the output to valid JSON. num_predict: 1000 is enough for a typical 5-15 line-item invoice.
Vision model comparison
I benchmarked five vision models on a labeled dataset of 200 invoices (mixed: clean PDFs, scanned faxes, photos, multi-page).
| Model | Field accuracy | Line-item accuracy | Latency (RTX 4090) |
|---|---|---|---|
| qwen2.5-vl:7b | 96.2% | 92.0% | 6.8 s |
| minicpm-v:8b | 95.4% | 93.5% | 8.2 s |
| llava:13b | 91.8% | 84.0% | 9.6 s |
| llama3.2-vision:11b | 94.1% | 89.5% | 7.4 s |
| Cloud reference (GPT-4o) | 96.8% | 94.0% | 11 s + network |
Local models match the cloud reference within 1-2 percentage points. qwen2.5-vl:7b is my default; minicpm-v:8b is the better choice when line-item accuracy is critical (e.g., construction job costing).
For the foundational guide, see our Ollama Python API guide.
Line-Item Normalization {#normalization}
Raw extraction gives you line items as the vendor wrote them. ERPs need GL account codes. Mapping between them is the unglamorous part nobody includes in demos.
Vendor-specific mapping
```python
GL_MAP = {
    # Vendor name (lowercased) -> default GL account
    "office depot": "6500-Office Supplies",
    "aws": "6300-Cloud Infrastructure",
    "comcast business": "6100-Internet & Phone",
    "ferguson plumbing": "5050-Materials-Plumbing",
}

def map_to_gl(vendor: str, line_desc: str, accounts: list[str]) -> str:
    key = vendor.lower().strip()
    if key in GL_MAP:
        return GL_MAP[key]
    # Fall back to LLM-based suggestion from the chart of accounts
    return llm_suggest_gl(line_desc, accounts)
```
For unmapped vendors, ask the LLM:
```python
GL_PROMPT = """You are a bookkeeper. Suggest the most likely GL account for this line item.
Choose ONE from: {accounts_list}. Output only the account code."""

def llm_suggest_gl(line_desc: str, accounts: list[str]) -> str:
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "qwen2.5:7b",
        "prompt": GL_PROMPT.format(accounts_list=", ".join(accounts)) + f"\n\nLine: {line_desc}",
        "stream": False,
        "options": {"temperature": 0.0, "num_predict": 30}
    })
    return r.json()["response"].strip()
```
Always log LLM-suggested GL codes for review. The first time the AP clerk sees a wrong suggestion, capture the correction and add it to GL_MAP. After 50-100 manual corrections per vendor, your hit rate exceeds 95% on subsequent invoices.
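One way to capture those corrections, assuming the vendor-to-GL map lives in a JSON file next to the pipeline (the file name is illustrative):

```python
import json
from pathlib import Path

def record_gl_correction(vendor: str, corrected_gl: str,
                         map_path: str = "gl_map.json") -> dict:
    """Persist the AP clerk's correction so the next invoice from this
    vendor hits the lookup table instead of the LLM."""
    path = Path(map_path)
    gl_map = json.loads(path.read_text()) if path.exists() else {}
    gl_map[vendor.lower().strip()] = corrected_gl
    path.write_text(json.dumps(gl_map, indent=2, sort_keys=True))
    return gl_map
```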
Validation and Three-Way Match {#validation}
Extraction is one thing. Verifying the extraction is plausible is another. Three checks I run on every invoice before posting:
Check 1: Math validation
```python
def validate_math(invoice: dict) -> list[str]:
    errors = []
    items_total = sum((li.get("amount") or 0) for li in invoice["line_items"])
    subtotal = invoice.get("subtotal") or items_total
    tax = invoice.get("tax") or 0
    expected = round(subtotal + tax, 2)
    actual = round(invoice["total_amount"], 2)
    if abs(expected - actual) > 0.02:
        errors.append(f"Math mismatch: subtotal+tax={expected}, total={actual}")
    return errors
```
About 4% of LLM extractions fail this check. Most are recoverable by reprompting with the failure detail. The unrecoverable ones go to human review.
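The reprompting loop can be factored out generically. Here `extract_fn` and `validate_fn` are placeholders for `extract_invoice` and `validate_math` from this article, with the failure detail appended to the extraction prompt on each retry:

```python
def extract_with_repair(extract_fn, validate_fn, max_attempts: int = 3):
    """Re-run extraction, feeding validation errors back into the prompt,
    until the check passes or attempts run out."""
    feedback = ""
    candidate, errors = {}, ["no extraction attempted"]
    for _ in range(max_attempts):
        candidate = extract_fn(feedback)   # e.g. appends feedback to EXTRACT_PROMPT
        errors = validate_fn(candidate)    # e.g. validate_math(candidate)
        if not errors:
            return candidate, []
        feedback = "Your previous output failed validation: " + "; ".join(errors)
    return candidate, errors  # still failing -> human review queue
```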
Check 2: Three-way match (PO + receipt + invoice)
For invoices linked to a purchase order, validate against the PO and the receiving record:
```python
def three_way_match(invoice: dict, po: dict, receipt: dict) -> list[str]:
    errors = []
    if abs(invoice["total_amount"] - po["total_amount"]) / po["total_amount"] > 0.05:
        errors.append("Invoice total differs from PO total by more than 5%")
    if invoice["invoice_date"] < po["po_date"]:
        errors.append("Invoice predates PO")
    received = sum(r["qty"] for r in receipt.get("items", []))
    invoiced = sum((li.get("quantity") or 0) for li in invoice["line_items"])
    if invoiced > received * 1.05:
        errors.append("Invoiced quantity exceeds received quantity")
    return errors
```
Check 3: Vendor risk rules
```python
RISK_RULES = [
    ("vendor_tax_id is None and total_amount > 600", "1099 vendor missing tax ID"),
    ("currency != 'USD' and country == 'US'", "Non-USD invoice flagged for review"),
    ("total_amount > vendor_avg_total * 3", "Total significantly above vendor average"),
]
```
Validation is what separates an invoice automation that you trust with auto-posting from one that just shaves seconds off manual review.
ERP Integration {#erp}
The last mile. Integration patterns by ERP:
QuickBooks Online
Use the QBO API. The Bill resource accepts vendor, line items, and AP account.
```python
import requests

def post_qbo_bill(invoice: dict, qbo_token: str, realm_id: str):
    payload = {
        "VendorRef": {"value": resolve_vendor_id(invoice["vendor_name"])},
        "TxnDate": invoice["invoice_date"],
        "DueDate": invoice.get("due_date"),
        "DocNumber": invoice["invoice_number"],
        "Line": [{
            "Amount": li["amount"],
            "DetailType": "AccountBasedExpenseLineDetail",
            "AccountBasedExpenseLineDetail": {
                "AccountRef": {"value": resolve_account(li["gl_code"])}
            },
            "Description": li["description"]
        } for li in invoice["normalized_lines"]],
    }
    r = requests.post(
        f"https://quickbooks.api.intuit.com/v3/company/{realm_id}/bill",
        json=payload,
        headers={"Authorization": f"Bearer {qbo_token}"}
    )
    return r.json()
```
NetSuite, SAP, Sage
Same pattern, different SOAP/REST endpoints. The translation layer is shallow because all major ERPs accept the same conceptual fields (vendor, date, lines, amount, GL account).
When to keep humans in the loop
Auto-post only when:
- Vendor is known (in your master list for 30+ days)
- Three-way match passes
- Total under your authorization threshold (typical: $5,000 for AP, $25,000 with manager sign-off)
- Math validation passes
Everything else goes to a human queue with the AI's extraction pre-filled. The human accepts/edits/posts, and the click takes 15 seconds instead of 4 minutes.
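Those four conditions collapse into a single gate. A sketch, with field names that are assumptions matching the code in this article (`vendor_age_days` would come from your vendor master file):

```python
def should_auto_post(invoice: dict, vendor_age_days: int,
                     match_errors: list[str], math_errors: list[str],
                     threshold: float = 5000.0) -> bool:
    """True only when every auto-post condition holds; anything else
    routes to the human queue with the extraction pre-filled."""
    return (vendor_age_days >= 30              # known vendor (30+ days)
            and not match_errors               # three-way match passed
            and not math_errors                # math validation passed
            and invoice["total_amount"] < threshold)
```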
For workflow patterns, our private AI knowledge base post covers the broader pattern of human-in-the-loop AI for business operations.
Pitfalls and Fixes {#pitfalls}
Pitfall 1: Numbers with European decimal separators
Cause: invoice from a German vendor uses 1.234,56 instead of 1,234.56.
Fix: detect locale from currency or vendor country, run a normalization step before validation. The LLM almost always extracts the number correctly; the issue is downstream parsing.
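A minimal normalization step, assuming you derive a locale hint from the vendor's country or currency:

```python
def normalize_amount(raw: str, locale_hint: str = "US") -> float:
    """Convert '1.234,56' (EU style) or '1,234.56' (US style) to a float."""
    s = raw.strip().replace("\u00a0", "").replace(" ", "")
    if locale_hint == "EU":
        s = s.replace(".", "").replace(",", ".")  # drop thousands dot, comma -> decimal point
    else:
        s = s.replace(",", "")                    # drop thousands comma
    return float(s)
```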
Pitfall 2: Multi-page invoices
Cause: the model only sees the first page.
Fix: convert multi-page PDFs to a vertical stitched image, or run extraction on each page and merge line items in code. Stitching works better for vendor consistency; per-page is faster.
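For the per-page route, the merge step can be sketched like this: line items concatenated, header fields filled from whichever page has them, grand total taken from the last page (where it usually prints):

```python
def merge_page_extractions(pages: list[dict]) -> dict:
    """Combine per-page extractions of one multi-page invoice."""
    merged = dict(pages[0])
    # Concatenate line items across all pages
    merged["line_items"] = [li for p in pages for li in p.get("line_items", [])]
    # Fill any header field the first page missed
    for page in pages[1:]:
        for field, value in page.items():
            if field == "line_items":
                continue
            if merged.get(field) in (None, "") and value not in (None, ""):
                merged[field] = value
    # The grand total usually prints on the last page
    if pages[-1].get("total_amount") is not None:
        merged["total_amount"] = pages[-1]["total_amount"]
    return merged
```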
Pitfall 3: Faxed scans at 200 DPI
Cause: input quality below model's effective resolution.
Fix: preprocess with Tesseract's deskew and an image-upscaler step. For chronic-quality vendors, swap in Pix2Struct for OCR before the LLM extraction pass.
Pitfall 4: LLM hallucinates a line item that does not exist
Cause: ambiguous or partial line at the bottom of a page.
Fix: always run math validation. If items_total + tax does not equal total_amount, send to human review.
Pitfall 5: Vendor names drift
Cause: Acme Corp, ACME Corporation, Acme Corp. all show up as different vendors.
Fix: maintain a canonical vendor list and use embedding similarity to match new vendor names back to canonical entries. See our local AI embeddings guide for the mechanics.
ROI Math {#roi}
For a 70-person construction firm processing 600 invoices a month:
Manual baseline
| Item | Time | Cost |
|---|---|---|
| AP clerk processing 600 invoices @ 4 min each | 40 hrs/month | $1,200/month |
| Errors and corrections | 4 hrs/month | $120/month |
| Annual AP labor | 528 hours | $15,840 |
Cloud SaaS replacement
| Item | Cost |
|---|---|
| Bill.com Corporate (4 AP users) | $7,800/year |
| Implementation and onboarding | $3,000 one-time |
| Manual review time (still ~30% of invoices) | $4,750/year |
| Annual cost | $15,550 |
Local AI replacement
| Item | One-time | Recurring |
|---|---|---|
| Workstation (RTX 4090 + 64GB RAM) | $4,500 | — |
| Implementation (50 hours dev) | $7,500 | — |
| Manual review time (~10% of invoices) | — | $1,580/year |
| Electricity | — | $200/year |
| Annual maintenance (5 hours) | — | $750/year |
| Year 1 | $14,530 | |
| Year 2+ | $2,530/year |
By year 2 the local stack saves $13,000+ per year compared to either status quo. For finance teams that already have an internal Python developer, the labor portion of "implementation" goes to zero.
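For anyone adapting the tables to their own numbers, the payback arithmetic fits on one screen (figures below are this article's, not universal):

```python
# Figures from the ROI tables above
manual_annual = 15_840               # manual AP labor, $/year
local_capex = 4_500 + 7_500          # workstation + 50 hours implementation
local_recurring = 1_580 + 200 + 750  # review + electricity + maintenance, $/year

annual_savings = manual_annual - local_recurring          # 13,310
payback_months = round(local_capex / (annual_savings / 12), 1)
# the $12,000 capex pays back in roughly 10.8 months
```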
What This Pipeline Cannot Do (Yet)
- Approve invoices. The system extracts, validates, and queues for human approval. Approval logic stays human-driven (and legally needs to).
- Negotiate payment terms. AI suggests, finance decides.
- Replace your accountant. Tax classification, accruals, and audit-readiness still need a human professional. Local AI handles the data-entry layer that consumes their time.
Conclusion
Invoice processing is the single highest-volume document workflow in most businesses, which makes it the highest-leverage AI use case. The local stack gets you 95%+ field accuracy at 8-12 seconds per invoice on a single workstation, posts directly to QuickBooks, NetSuite, SAP, or Sage, and keeps every byte of vendor pricing on hardware you control.
The migration is a three-week sprint, not a quarter-long project. Week one: stand up Ollama, classify and extract a sample of 50 historical invoices, tune the prompt against your actual mix. Week two: build the validation rules and the human-review queue. Week three: integrate with the ERP and turn on auto-posting for known vendors under your authorization threshold. After that, the AP clerk's calendar opens up and the controller stops worrying about which third-party SaaS just sent a "we are revising our terms" email.
Building more AP and finance automation? Pair this guide with our local AI document scanner for paper invoices and our Ollama Python API guide for production-grade integration patterns.
Go from reading about AI to building with AI
10 structured courses. Hands-on projects. Runs on your machine. Start free.