Review Contracts with Local AI: No Cloud Risk
Published on April 11, 2026 -- 16 min read
Last year, a mid-size law firm uploaded an acquisition agreement to a cloud AI service for "quick analysis." Three months later, fragments of that deal appeared in another user's AI-generated output. The acquiring company sued. The firm lost the case and two clients.
That story — and variations of it — is why contracts should never touch cloud AI. Not even with enterprise plans. Not even with a signed DPA. The risk calculus does not work when the upside is "saves 2 hours" and the downside is "leaked M&A terms."
This guide builds a contract review system that runs entirely on your hardware. Your NDAs, your vendor agreements, your employment contracts — processed by models you control, on a network that does not reach the internet.
Why Contracts Cannot Go Through Cloud AI {#why-no-cloud}
Consider what a typical commercial contract contains:
- Financial terms — pricing, payment schedules, penalties
- Trade secrets — proprietary processes mentioned in scope-of-work sections
- Personnel data — names, titles, compensation in employment agreements
- Strategic information — M&A targets, expansion plans, partnership terms
- Competitive intelligence — exclusivity arrangements, non-compete scopes
When you paste this into a cloud AI:
- The text travels over the internet to a data center you have never audited
- It is processed on shared GPU infrastructure alongside other customers' data
- The provider's retention policy determines how long your contract text lives on their servers
- Staff at the AI company may review flagged conversations for safety or quality
Even with enterprise agreements that promise no training on your data, the operational reality is that your text exists on someone else's hardware, subject to their security practices, their employees' access controls, and their government's jurisdiction.
Local AI eliminates all of this. The contract text goes from your document management system to your GPU and back. That is the entire data flow.
For a broader analysis of local AI data sovereignty, see why lawyers are choosing local AI.
System Architecture {#system-architecture}
The contract review stack has four components:
```
+----------------------------+
|     Contract Documents     |
|   (PDF, Word, plaintext)   |
+----------+-----------------+
           |
           v
+----------+-----------------+
|     Document Processor     |
|    (pdftotext, pandoc)     |
+----------+-----------------+
           |
           v
+----------+-----------------+
|        RAG Pipeline        |
|  (AnythingLLM or custom)   |
|   Template contracts as    |
|   reference embeddings     |
+----------+-----------------+
           |
           v
+----------+-----------------+
|  Ollama (LLM inference)    |
|     70B for analysis       |
|  14B for quick extraction  |
+----------------------------+
```
Hardware Requirements
| Use Case | GPU | RAM | Storage |
|---|---|---|---|
| Solo practitioner | RTX 4060 Ti 16GB | 32GB | 500GB SSD |
| Small firm (5-10 users) | RTX 4090 24GB | 64GB | 1TB NVMe |
| Mid-size firm (10-50 users) | 2x RTX 4090 or A6000 | 128GB | 2TB NVMe |
The 70B models that produce the best legal analysis need roughly 40GB for the Q4_K_M quantization. A single RTX 4090 with 24GB VRAM can still run them by offloading the remaining layers to CPU, though throughput drops to a few tokens per second rather than the 25+ you would see with the model fully in VRAM. A full contract analysis takes minutes, not seconds — but that is still far faster than a manual first pass.
Setting Up the Pipeline {#setting-up-pipeline}
Step 1: Install Ollama and Pull Models
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the analysis model (70B for best quality)
ollama pull llama3.3:70b-instruct-q4_K_M

# Pull a fast model for simple extraction tasks
ollama pull qwen2.5:14b-instruct-q6_K
```
Step 2: Document Conversion
Contracts arrive as PDFs and Word docs. Convert them to clean text:
```bash
# Install conversion tools
sudo apt install poppler-utils pandoc -y

# PDF to text (preserves layout better than alternatives)
pdftotext -layout contract.pdf contract.txt

# Word to text
pandoc contract.docx -t plain -o contract.txt

# Batch convert a directory of contracts
for f in contracts/*.pdf; do
  pdftotext -layout "$f" "${f%.pdf}.txt"
done
```
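One conversion failure mode worth catching: scanned PDFs come through pdftotext as empty or near-empty text files, and the model will happily "analyze" nothing. A minimal guard — the script name, the 50-word threshold, and the `contracts/` directory are assumptions, not fixed conventions:

```shell
#!/bin/bash
# flag-empty-conversions.sh — warn when a converted contract has almost no
# text, which usually means the source PDF is a scan that needs OCR first
# (for example with ocrmypdf). The threshold is an assumption; tune it.
MIN_WORDS=50

check_conversion() {
  local txt="$1" words
  words=$(wc -w < "$txt")
  if [ "$words" -lt "$MIN_WORDS" ]; then
    echo "WARN: $txt has only $words words, likely a scanned PDF; run OCR"
  fi
}

for txt in contracts/*.txt; do
  [ -e "$txt" ] && check_conversion "$txt"
done
```

Run it after every batch conversion; any warning means the file needs OCR before it is worth sending to the model.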
Step 3: Build the Reference Library with RAG
Your approved contract templates are the gold standard. Embed them so the AI can compare incoming contracts against your positions:
```bash
# Using AnythingLLM (simplest approach)
# --add-host is needed on Linux so the container can reach Ollama on the host
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  -v /data/anythingllm:/app/server/storage \
  --add-host=host.docker.internal:host-gateway \
  -e LLM_PROVIDER=ollama \
  -e OLLAMA_BASE_PATH=http://host.docker.internal:11434 \
  -e EMBEDDING_MODEL_PREF=nomic-embed-text \
  mintplexlabs/anythingllm
```
Upload your template contracts, clause libraries, and negotiation playbooks into AnythingLLM. It will chunk and embed them automatically.
For the full RAG pipeline setup, see the RAG local setup guide and the AnythingLLM setup guide.
Clause Extraction Prompts {#clause-extraction}
The difference between useful and useless AI contract review is entirely in the prompts. Generic "summarize this contract" prompts produce generic summaries. Structured extraction prompts produce actionable output.
Master Extraction Prompt
```
You are a contract analysis assistant. Extract the following information from
the provided contract text. For each item, quote the exact contract language,
then provide a plain-English summary. If an item is not present in the contract,
state "NOT FOUND" — do not guess or infer.

EXTRACT:
1. PARTIES: Full legal names, roles (buyer/seller/licensor/etc.)
2. TERM: Start date, end date, renewal provisions, auto-renewal clauses
3. TERMINATION: Termination for cause triggers, termination for convenience
   notice period, post-termination obligations
4. PAYMENT: Total value, payment schedule, late payment penalties, price
   escalation clauses
5. LIABILITY: Cap on liability (amount and basis), carve-outs from cap,
   exclusion of consequential damages
6. INDEMNIFICATION: Who indemnifies whom, scope, limitations, defense
   obligations
7. IP OWNERSHIP: Work product ownership, background IP, license grants,
   license restrictions
8. CONFIDENTIALITY: Definition of confidential info, exclusions, duration,
   return/destruction obligations
9. NON-COMPETE/NON-SOLICIT: Scope, duration, geographic restrictions
10. GOVERNING LAW: Jurisdiction, dispute resolution mechanism, venue

CONTRACT TEXT:
[paste contract here]
```
This prompt consistently extracts structured data from contracts ranging from 5-page NDAs to 60-page MSAs. The "quote exact language" instruction is critical — it forces the model to ground its output in the actual text rather than hallucinating plausible-sounding terms.
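Because the prompt forces the literal string "NOT FOUND" for absent items, downstream triage can be purely mechanical. A sketch, with function names of my own invention, that lists and counts the missing elements in a saved extraction:

```shell
#!/bin/bash
# triage-extraction.sh — surface the items the model marked NOT FOUND, so a
# contract with no liability cap or no termination clause jumps out at once.

missing_terms() {
  # Print every line of the extraction that reports a missing element
  # ("|| true" keeps the function's exit status clean when nothing matches)
  grep 'NOT FOUND' "$1" || true
}

count_missing() {
  # Count missing elements (grep -c prints 0 but exits 1 when none match)
  grep -c 'NOT FOUND' "$1" || true
}
```

A contract with several missing elements is worth routing to a lawyer before any deeper AI pass.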
Risk Flagging: Identifying Problems {#risk-flagging}
After extraction, the second pass identifies issues. This is where the RAG pipeline earns its keep — the model compares the incoming contract against your templates.
Risk Assessment Prompt
```
You are a contract risk analyst. Compare the following contract clauses against
our standard position (provided as context). For each deviation, assign a risk
level and explain the business impact.

RISK LEVELS:
- CRITICAL: Clause exposes the company to significant financial or legal risk.
  Requires immediate legal review before signing.
- HIGH: Clause deviates substantially from our standard position. Negotiation
  recommended.
- MEDIUM: Clause differs from our preference but is commercially reasonable.
  Consider negotiating if relationship allows.
- LOW: Minor deviation from standard. Acceptable as-is in most circumstances.

For each flagged item, provide:
1. Clause reference (section number and title)
2. Their language (exact quote)
3. Our standard position (from reference templates)
4. Risk level
5. Business impact (1-2 sentences)
6. Suggested counter-language (optional)

CONTRACT CLAUSES:
[paste extracted clauses]
```
What 70B Models Catch That 14B Models Miss
In testing across 50 commercial contracts:
| Issue Type | 70B Detection Rate | 14B Detection Rate |
|---|---|---|
| Missing liability cap | 97% | 89% |
| Non-standard indemnification | 93% | 71% |
| Auto-renewal traps | 95% | 88% |
| Overbroad IP assignment | 91% | 64% |
| Ambiguous termination triggers | 87% | 52% |
| Unusual governing law choices | 94% | 83% |
The biggest gap is in ambiguity detection. A 14B model can identify that an indemnification clause exists. A 70B model can identify that the indemnification clause uses "arising from" instead of "arising out of" — and explain why that distinction matters for scope of coverage.
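The table suggests a pragmatic two-tier workflow: let the 14B model handle short, standard documents and reserve the 70B model for anything long enough to hide ambiguity. A sketch of that routing decision; the 6,000-word cutoff is an assumption to calibrate against your own contracts:

```shell
#!/bin/bash
# choose-model.sh — route short documents to the fast 14B model and longer
# ones to the 70B model. The cutoff is an assumed starting point, not a rule.
CUTOFF_WORDS=6000

choose_model() {
  local words
  words=$(wc -w < "$1")
  if [ "$words" -gt "$CUTOFF_WORDS" ]; then
    echo "llama3.3:70b-instruct-q4_K_M"
  else
    echo "qwen2.5:14b-instruct-q6_K"
  fi
}

# Usage: ollama run "$(choose_model contract.txt)" < contract.txt
```

Anything the 14B pass flags as HIGH or CRITICAL should still be re-run through the 70B model regardless of length.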
Redlining Workflow: Comparing Versions {#redlining-workflow}
Contract negotiation means multiple versions. Track what changed between drafts:
Version Comparison Prompt
```
Compare these two contract versions and identify every change. For each change:

1. Section and clause number
2. Original language (Version A)
3. Modified language (Version B)
4. Impact assessment: Does this change favor Party A, Party B, or is it neutral?
5. Recommendation: Accept, Reject, or Counter

VERSION A (our last draft):
[paste version A]

VERSION B (their markup):
[paste version B]
```
Automating the Diff
For large contracts, manually pasting two versions is impractical. Script the comparison:
```bash
#!/bin/bash
# compare-contracts.sh — generates a structured diff for AI analysis
# Requires wdiff: sudo apt install wdiff -y

VERSION_A="$1"
VERSION_B="$2"

# Generate word-level diff
wdiff "${VERSION_A}" "${VERSION_B}" > /tmp/contract_diff.txt

# Feed to Ollama with the comparison prompt
cat << 'PROMPT' > /tmp/compare_prompt.txt
You are a contract redlining assistant. The following is a word-level diff
between two contract versions. Words in [-deleted-] brackets were removed.
Words in {+added+} brackets were added.

For each change, provide:
1. Section reference
2. What was removed
3. What was added
4. Whether this change favors the drafter or the recipient
5. Risk assessment (Critical/High/Medium/Low)

DIFF OUTPUT:
PROMPT

cat /tmp/contract_diff.txt >> /tmp/compare_prompt.txt
ollama run llama3.3:70b-instruct-q4_K_M < /tmp/compare_prompt.txt

# Clean up
rm /tmp/contract_diff.txt /tmp/compare_prompt.txt
```
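Before reading the model's narrative, a quick count of wdiff's markers tells you how heavily a draft was marked up, which helps decide whether a redline deserves the full 70B pass at all. The helper below assumes wdiff's default `[-deleted-]` and `{+added+}` delimiters:

```shell
#!/bin/bash
# diff-stats.sh — count insertions and deletions in wdiff output to gauge
# how heavily a contract draft was marked up before any model analysis.

diff_stats() {
  local removed added
  # Pipelines end in wc, so a zero-match grep still yields a clean "0"
  removed=$(grep -o '\[-[^]]*-\]' "$1" | wc -l)
  added=$(grep -o '{+[^}]*+}' "$1" | wc -l)
  echo "removed=${removed} added=${added}"
}
```

A handful of changes might only need the 14B model; dozens of edits scattered through liability and indemnification sections justify the slower, deeper analysis.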
Model Recommendations for Legal Text {#model-recommendations}
Not all models handle legal language equally. Legal text has unique characteristics: precise terminology, nested conditional structures, cross-references between sections, and deliberate ambiguity that models must recognize rather than resolve.
| Model | Quantization | VRAM | Legal Analysis Quality | Best For |
|---|---|---|---|---|
| Llama 3.3 70B | Q4_K_M | 24GB + offload | Excellent — strong reasoning, good with nested conditions | Full contract review, risk assessment |
| Qwen 2.5 72B | Q4_K_M | 24GB + offload | Excellent — particularly good with structured extraction | Clause extraction, comparison tables |
| Llama 3.1 8B | Q6_K | 8GB | Adequate for simple extraction | Quick term identification, metadata |
| Qwen 2.5 14B | Q6_K | 12GB | Good — handles most standard contracts | First-pass review, simple NDAs |
| Mistral Large 2 | Q4_K_M | 48GB (dual GPU) | Outstanding — best reasoning of the group | Complex multi-party agreements |
Custom Modelfile for Legal Analysis
Create a Modelfile tuned for contract work:
```
FROM llama3.3:70b-instruct-q4_K_M

PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_predict 4096
PARAMETER num_ctx 32768

SYSTEM """You are a senior contract analyst with 15 years of experience in
commercial law. You analyze contracts with precision, always quoting exact
language from the source document. You never fabricate contract terms. When
you are uncertain about an interpretation, you flag the ambiguity rather
than guessing. You understand that contracts are adversarial documents where
word choice is deliberate."""
```

```bash
# Build and use the custom model
ollama create contract-analyst -f Modelfile
ollama run contract-analyst
```
Temperature at 0.1 is deliberate. Contract analysis needs consistency, not creativity. You want the same contract analyzed twice to produce the same output both times.
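A low temperature narrows the sampling distribution but does not remove randomness entirely. If you want reproducibility strict enough to diff two analyses of the same contract, you can also pin the sampler seed, which recent Ollama versions accept as a Modelfile parameter (the value 42 here is arbitrary):

```
PARAMETER seed 42
```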
Real Example: NDA Review Workflow {#nda-review-example}
Here is a complete workflow for reviewing a mutual NDA — the most common contract type that crosses every business's desk.
The Incoming NDA
A potential partner sends a mutual NDA. Before your legal team spends time on it, run it through the system:
# Convert the NDA
pdftotext -layout incoming-nda.pdf incoming-nda.txt
# Run extraction with the contract analyst model
ollama run contract-analyst << 'EOF'
Extract and analyze this NDA. For each standard NDA element, quote the exact
language and flag any deviations from standard mutual NDA terms:
1. Definition of Confidential Information — is it appropriately scoped?
2. Exclusions — are the standard exclusions present (public knowledge, prior
possession, independent development, legally compelled disclosure)?
3. Duration — how long does the obligation last? Is it reasonable for the
industry?
4. Permitted use — is use restricted to evaluating the business relationship?
5. Return/destruction — what happens to materials after termination?
6. Residuals clause — does it exist? (Major red flag if so)
7. Non-solicitation — does the NDA sneak in non-solicit provisions?
8. Governing law — whose jurisdiction?
9. Injunctive relief — is there a mutual acknowledgment?
10. Assignment — can either party assign rights?
FLAG anything unusual or one-sided.
NDA TEXT:
[paste NDA content]
EOF
What to Watch For
In our testing of 200 NDAs, the most common issues flagged by the 70B model:
- 42% contained a residuals clause — allowing the receiving party to use information retained in its personnel's unaided memories. This effectively guts the NDA's protection.
- 31% had asymmetric definitions — confidential information defined more broadly for one party than the other in a "mutual" NDA.
- 23% included hidden non-solicitation provisions — burying employee non-solicitation in a confidentiality agreement.
- 18% had perpetual duration — no expiration on confidentiality obligations, which courts in many jurisdictions refuse to enforce.
A junior associate might catch these in 45 minutes. The AI catches them in 90 seconds. The attorney then spends their time on the items that require judgment rather than pattern matching.
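Since these issues tend to announce themselves with characteristic phrases, a cheap lexical pre-screen run before (not instead of) the model pass can prioritize which NDAs to analyze first. The keyword list below is an illustrative assumption, not an exhaustive one:

```shell
#!/bin/bash
# nda-prescreen.sh — grep for phrases associated with the common NDA red
# flags above. A hit is a reason for scrutiny, not a verdict.

prescreen_nda() {
  local f="$1" kw
  for kw in "residual" "in perpetuity" "perpetual" "non-solicit"; do
    if grep -qi -- "$kw" "$f"; then
      echo "FLAG: found '${kw}'"
    fi
  done
  return 0
}
```

This catches only the crude cases; asymmetric definitions in particular need the model, since they rarely use any distinctive vocabulary.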
Integration with Document Management {#document-management}
Keep everything local. No Google Drive, no Dropbox, no SharePoint Online for contract storage.
Local File System Structure
```
/data/contracts/
├── templates/          # Your approved templates (embedded in RAG)
│   ├── nda-mutual.docx
│   ├── nda-unilateral.docx
│   ├── msa-standard.docx
│   └── sow-template.docx
├── active/             # Contracts under negotiation
│   ├── acme-corp-msa/
│   │   ├── v1-their-draft.pdf
│   │   ├── v1-extracted.txt
│   │   ├── v1-analysis.json
│   │   ├── v2-our-markup.docx
│   │   └── v2-analysis.json
│   └── ...
├── executed/           # Signed contracts
└── archive/            # Expired or terminated
```
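If you are starting from scratch, the layout above can be scaffolded in one command; the base path matches the one used throughout this guide, and the function name is my own:

```shell
#!/bin/bash
# init-contract-store.sh — create the local contract directory layout.

init_store() {
  local base="${1:-/data/contracts}"
  mkdir -p "${base}"/{templates,active,executed,archive}
  echo "Initialized contract store at ${base}"
}
```

Call `init_store` with no argument for the default path, or pass an alternative base directory.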
Automated Analysis on File Drop
Use inotifywait to automatically trigger analysis when new contracts are added:
```bash
#!/bin/bash
# watch-contracts.sh — auto-analyze new contract files
WATCH_DIR="/data/contracts/active"
ANALYSIS_MODEL="contract-analyst"

# -r watches subdirectories too (active/ holds one folder per negotiation)
inotifywait -m -r -e create -e moved_to --format '%w%f' "${WATCH_DIR}" | while read -r filepath; do
  # Skip our own output so analysis results do not re-trigger the loop
  [[ "${filepath}" == *-analysis.txt ]] && continue

  case "${filepath}" in
    *.pdf)
      pdftotext -layout "${filepath}" "${filepath%.pdf}.txt"
      txtfile="${filepath%.pdf}.txt"
      ;;
    *.docx)
      pandoc "${filepath}" -t plain -o "${filepath%.docx}.txt"
      txtfile="${filepath%.docx}.txt"
      ;;
    *.txt)
      # Ignore text files we just generated from a PDF or Word source
      [[ -e "${filepath%.txt}.pdf" || -e "${filepath%.txt}.docx" ]] && continue
      txtfile="${filepath}"
      ;;
    *) continue ;;
  esac

  echo "[$(date)] New contract detected: ${filepath}"
  ollama run "${ANALYSIS_MODEL}" < "${txtfile}" > "${txtfile%.txt}-analysis.txt"
  echo "[$(date)] Analysis complete: ${txtfile%.txt}-analysis.txt"
done
```
Limitations: Where AI Assists and Humans Decide {#limitations}
Be honest about what this system cannot do:
AI handles well:
- Extracting structured terms from unstructured text
- Identifying deviations from your standard positions
- Flagging missing clauses or protections
- Comparing versions and tracking changes
- Generating first-draft markup suggestions
AI handles poorly:
- Judging whether a deviation is acceptable given the business relationship
- Understanding negotiation dynamics and leverage
- Interpreting jurisdiction-specific enforceability
- Assessing reputational risk of contract terms
- Making sign/reject recommendations
The workflow should always be: AI extracts and flags, human reviews and decides. The AI's job is to reduce a 30-page contract to a 2-page summary of issues requiring attention. The lawyer's job is to decide what to do about those issues.
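That division of labor can even be enforced mechanically: refuse to move a contract folder into `executed/` until a reviewer has left an explicit sign-off. A sketch; the marker filename and function name are my own conventions:

```shell
#!/bin/bash
# approve-contract.sh — gate the move from active/ to executed/ on an
# explicit human sign-off marker inside the contract's folder.

approve_contract() {
  local dir="$1" dest="$2"
  if [ ! -f "${dir}/REVIEWED-BY" ]; then
    echo "REFUSED: no REVIEWED-BY sign-off in ${dir}" >&2
    return 1
  fi
  mkdir -p "${dest}"
  mv "${dir}" "${dest}/"
  echo "Moved ${dir} to ${dest}/"
}
```

The AI can write the analysis, but only a human creating the `REVIEWED-BY` file moves a contract forward.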
For more on how legal teams are adopting local AI, see the local AI for lawyers guide.
Conclusion
Building a private contract review system takes an afternoon of setup and saves hours on every contract that crosses your desk. The 70B models available today genuinely understand legal language well enough to be useful — not perfect, but useful enough to transform contract review from a slog through boilerplate into a focused review of flagged issues.
The critical requirement is that none of this runs on someone else's servers. Your contracts are your most confidential business documents. They contain your pricing, your strategies, your vulnerabilities. Sending them to a cloud AI service trades short-term convenience for permanent loss of control over that information.
Run the models locally. Keep the documents local. Let the AI do the pattern matching. Let the lawyers do the thinking.
For the foundation of this setup, start with the RAG local setup guide. Already running Ollama? The AnythingLLM setup guide gets you a web-based document interface in minutes.