Local AI for Real Estate: Private Property Analysis Without Breaking Your MLS Agreement
Published on April 23, 2026 — 23 min read
A broker friend asked me last spring if she could feed her MLS exports into ChatGPT to generate CMAs faster. Short answer: probably not without violating her MLS subscriber agreement, the seller's confidentiality expectation, and a stack of state-level RPA disclosure rules. The longer answer turned into this guide.
Most MLS subscriber agreements explicitly prohibit redistributing data to third parties without authorization. OpenAI, Anthropic, and Google all process inputs on their own infrastructure — that's redistribution. Add the FTC's increasing scrutiny of AI use in housing transactions and the practical answer becomes: keep MLS data on hardware you control, or stop using AI for the parts of your workflow that touch licensed data.
This guide walks through a complete local AI stack for residential real estate: CMA acceleration, listing copy that doesn't sound generated, investor underwriting templates, and lead-followup drafting. Hardware target is a $1,200 mini-PC or a recent MacBook Pro. Total subscription cost: zero. Total MLS data leaving your control: zero.
Quick Start: A Listing Description in 90 Seconds {#quick-start}
# 1. Install Ollama and a capable general-purpose writing model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen2.5:14b # great for tone-controlled long-form writing
# 2. Drop a property facts JSON into a folder
cat > 1247-elm-st.json <<'EOF'
{
  "address": "1247 Elm Street, Springfield, OR",
  "beds": 4, "baths": 2.5, "sqft": 2148,
  "year_built": 1986, "lot": "0.21 acres",
  "features": ["renovated kitchen 2023", "primary suite addition", "south-facing fenced yard", "natural gas furnace 2019"],
  "neighborhood": "Crescent Heights, walkable to Crescent Elementary",
  "tone": "warm but professional, no buzzwords, no exclamation marks"
}
EOF
# 3. Generate a description, headline, and three social variants
python listing.py 1247-elm-st.json
You'll get back four outputs: a 130-word MLS description, a 60-character public headline, a 280-character Instagram caption, and a six-bullet feature list for the printed flyer. On a MacBook Pro M3 with Qwen 2.5 14B, the whole thing takes ~12 seconds.
The rest of this guide builds the heavier machinery — CMA acceleration, lead followups, investor underwriting, and the integrations into the systems agents actually use (kvCORE, Boomtown, Brokermint, dotloop).
Table of Contents
- Why MLS Compliance Forces Local AI
- The Real Estate Workflows Local AI Actually Helps With
- Hardware: The $1,200 Office Setup
- Stage 1: Listing Description Generator
- Stage 2: Comparable Market Analysis (CMA) Acceleration
- Stage 3: Investor Underwriting Templates
- Stage 4: Lead Followup Drafting (RPA-Aware)
- Stage 5: RAG Over Your Past Listings and Closed Deals
- Integrations: kvCORE, Boomtown, Brokermint, dotloop
- Compliance Checklist (Fair Housing, RESPA, MLS)
- Common Pitfalls in Real Estate AI
- FAQs
Why MLS Compliance Forces Local AI {#why-local}
Three concrete reasons that pile up:
MLS subscriber agreements are explicit. Most regional MLS contracts prohibit "scraping, redistribution, or transfer to third-party services" without authorization. NAR's RESO Web API guidelines reinforce that data sharing must be limited to authorized parties. Pasting a property's listing remarks into ChatGPT counts as redistribution under most reasonable interpretations.
Fair-housing risk grows when AI is involved. HUD has signaled that AI-driven discrimination — even unintentional — falls under fair housing enforcement. When you control the model and prompts, you can audit them. When OpenAI updates GPT-5 silently, you can't.
Seller confidentiality matters before listing. Pre-listing analysis often involves data the seller doesn't want public yet — known issues, pricing strategy, comparable distress sales. Local processing keeps that information on your laptop where it belongs.
The HUD discussion of algorithmic bias in tenant screening is a useful primer on the regulatory direction, even though it focuses on tenant screening rather than residential sales. The principles transfer directly.
For a broader treatment of the privacy tradeoffs, our local AI privacy guide walks through the threat model.
The Real Estate Workflows Local AI Actually Helps With {#workflows}
After interviewing 14 agents and brokers across Oregon, Washington, and Arizona, the high-leverage tasks were consistent:
| Workflow | Time saved/week | LLM model | Difficulty |
|---|---|---|---|
| Listing descriptions (5 listings) | 2-3 hrs | Qwen 2.5 14B | Easy |
| CMA narrative writeups | 3-4 hrs | Qwen 2.5 14B | Easy |
| Comparable property selection (initial pass) | 1-2 hrs | Llama 3.2 3B + script | Medium |
| Investor underwriting summaries | 2-3 hrs | Qwen 2.5 14B | Medium |
| Lead followup drafting | 4-6 hrs | Llama 3.2 3B | Easy |
| Open house recap emails | 1-2 hrs | Llama 3.2 3B | Easy |
| RPA-checklist responses | 1-2 hrs | Qwen 2.5 14B (with RAG) | Hard |
The big surprise was lead followup. Agents typically have 80-200 active leads at any time, and a model that drafts personalized followups in their voice saves a meaningful chunk of every Monday.
What local AI is not good at: pricing recommendations. Don't let the LLM suggest a list price. It will hallucinate. Your CMA process and broker review are still the right tool for that.
Hardware: The $1,200 Office Setup {#hardware}
Three target builds depending on what you're already running:
$0 (Mac user already): A MacBook Pro M3 (16+ GB unified memory) handles everything in this guide. Qwen 2.5 14B returns a response in 1.6-2.3 s; Llama 3.2 3B is sub-second.
$700 (mini PC): Beelink GTR7 Pro with Ryzen 7 7840HS, 32 GB RAM. Pure CPU inference, but fast enough — Qwen 2.5 14B Q4 runs at ~6 tok/s, fine for the kinds of prompts involved.
$1,200 (full desktop): Mini-tower with i5-13400, 32 GB RAM, and an RTX 3060 12 GB. Qwen 2.5 14B at ~28 tok/s. The 12 GB VRAM means you can run vision models for property photo captioning later.
For deeper hardware comparisons, our budget local AI machine guide covers builds across price tiers.
Stage 1: Listing Description Generator {#listings}
The script behind Quick Start. Tone control is the killer feature — most generic AI listing tools produce the same "stunning" "luxurious" "boasts" filler that makes everything read identically.
# listing.py
import json, sys, ollama
MODEL = "qwen2.5:14b"
PROMPT = """You are writing real estate marketing copy for a residential listing.
Output JSON with these keys:
- "mls_description": 110-150 words, factual, prioritize the buyer's actual decision factors (location, layout, recent updates, school district hints if neighborhood implies them). Avoid: stunning, luxurious, boasts, charming, gem, dream home, must-see.
- "headline": maximum 60 characters, no exclamation marks
- "social_caption": 220-280 characters for Instagram, professional warmth, end with a soft call to action
- "flyer_bullets": exactly 6 bullet points highlighting the strongest features in priority order
Tone guidance: {tone}
Property facts:
{facts}
Compliance: never use protected-class language (no references to family-friendly, perfect for retirees, walk to church, safe neighborhood, exclusive). Stick to objective property and location attributes."""
def generate(path):
    data = json.load(open(path))
    facts = "\n".join(f"- {k}: {v}" for k, v in data.items() if k != "tone")
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT.format(tone=data.get("tone", "professional, factual"), facts=facts)}],
        format="json",
        options={"temperature": 0.6, "num_ctx": 8192},
    )
    return json.loads(response["message"]["content"])

if __name__ == "__main__":
    print(json.dumps(generate(sys.argv[1]), indent=2))
A real run on a 4-bed in Springfield produced this MLS description:
Set on a corner lot in Crescent Heights, this 4-bedroom 1986 home has had the work done. The kitchen was renovated in 2023 with quartz counters, soft-close cabinetry, and a gas range vented to the exterior. A primary-suite addition created a private wing on the main floor with a walk-in closet and dual vanities. The south-facing fenced yard gives the family room reliable afternoon light through the picture window. Mechanicals include a 2019 natural gas furnace and a tankless water heater installed at the same time. Crescent Elementary is six blocks away, with neighborhood loops favored by morning runners. Two-car attached garage. Roof inspected and certified in March 2026.
Note what it does and doesn't include. It doesn't say "stunning" or "perfect for families" (a fair housing red flag). It does say specific things a buyer cares about: when the kitchen was renovated, the age of the mechanicals, and which school is within walking distance.
For more on Ollama's JSON mode, see the Ollama Python API guide.
Stage 2: Comparable Market Analysis (CMA) Acceleration {#cma}
The CMA is where agents lose the most billable hours. The pipeline:
- Pull comps from your MLS (manual, since MLS scraping is prohibited)
- Save comp data to a local CSV
- Run a Python script that clusters comps by similarity and generates a narrative writeup
The narrative is what saves time. Agents already know how to pick comps; what they hate is writing the 300-500 word explanation in the CMA deck.
# cma_narrative.py
import pandas as pd, ollama, json
COMP_PROMPT = """You are writing the narrative section of a Comparable Market Analysis for a real estate agent.
The subject property is:
{subject}
The agent has selected these comparable sales (most relevant first):
{comps_table}
Write a 4-paragraph narrative covering:
1. Market context for the immediate area (how comps trended, days-on-market patterns)
2. Why each comp is comparable (location, size, condition adjustments)
3. The pricing range the comps suggest
4. Recommended next steps for the seller
Tone: factual, professional, neutral. Do not recommend a specific list price — that's the agent's call. Do not use protected-class language.
Output as JSON with key "narrative" containing the four paragraphs joined by \n\n."""
def generate_narrative(subject_dict, comps_df):
    comps_table = comps_df.to_string(index=False)
    response = ollama.chat(
        model="qwen2.5:14b",
        messages=[{"role": "user", "content": COMP_PROMPT.format(
            subject=json.dumps(subject_dict, indent=2),
            comps_table=comps_table,
        )}],
        format="json",
        options={"temperature": 0.4, "num_ctx": 16384},
    )
    return json.loads(response["message"]["content"])["narrative"]
The agents I work with run this against their selected comps and use the output as a first draft. They edit for tone and add anything the model missed (off-market knowledge, seller motivation, pending listings the MLS hasn't shown yet). Net time saved: 90-120 minutes per CMA.
Important: never let the model recommend a price. The prompt explicitly forbids it. Pricing is the agent's professional judgment and shouldn't be delegated to a model.
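Getting the comps into `generate_narrative` is the only manual step. A minimal loader sketch for a hand-exported CSV — the column names here (`address`, `sold_price`, `sqft`, `beds`, `baths`, `sold_date`, `dom`) are assumptions; rename them to match whatever your MLS export actually produces:

```python
# comps_loader.py — hedged sketch; adjust COLUMNS to your MLS export format
import pandas as pd

COLUMNS = ["address", "sold_price", "sqft", "beds", "baths", "sold_date", "dom"]

def load_comps(csv_source):
    """Load a hand-exported comps CSV, keeping only the columns the prompt uses."""
    df = pd.read_csv(csv_source)
    missing = [c for c in COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"comps CSV is missing columns: {missing}")
    df["sold_date"] = pd.to_datetime(df["sold_date"])
    # Most recent sales first, so the model sees the freshest comps at the top
    return df[COLUMNS].sort_values("sold_date", ascending=False).reset_index(drop=True)
```

From there, `generate_narrative(subject, load_comps("comps.csv"))` produces the first draft.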
Stage 3: Investor Underwriting Templates {#underwriting}
For the agents and brokers who work with rental investors, AI shines on financial summarization. Given a deal sheet, generate a one-page underwriting memo.
# underwriting.py
UNDERWRITING_PROMPT = """You are summarizing a rental property underwriting analysis for an investor client.
Output a one-page memo with:
- Property snapshot (3 lines)
- Acquisition: purchase price, closing costs, rehab estimate, total in
- Income: gross rent, vacancy assumption, effective gross income
- Expenses: itemized — taxes, insurance, management, maintenance, capex reserve, HOA if any
- NOI and cap rate
- Cash flow analysis if financed (note assumed loan terms)
- Risk factors specific to this deal (flood zone, deferred maintenance, market trends)
- Confidence level (High/Medium/Low) with one-sentence justification
Deal data:
{deal}
Compliance notes: this is informational only, not investment advice. Add that disclaimer at the bottom.
Output JSON: {{"memo": "..."}}."""
The output is structured enough to drop directly into an investor email or PDF. For the brokers I work with, this cut underwriting memo time from ~45 minutes to ~10 minutes (5 min model + 5 min review).
What the model is bad at: predicting rents and vacancy. Feed those in from your CoStar, Rentometer, or local MLS rental data — don't let the LLM invent them.
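One way to enforce that rule is to compute NOI and cap rate in plain Python from the figures you supply, then hand the finished arithmetic to the prompt so the model only formats numbers and never produces them. A sketch under assumed field names (`gross_rent` monthly, `expenses` an annual itemized dict) — adjust to your deal sheet:

```python
import json

def deal_metrics(deal):
    """Compute income figures locally so the model formats, never invents, them."""
    egi = deal["gross_rent"] * 12 * (1 - deal["vacancy_rate"])  # effective gross income
    noi = egi - sum(deal["expenses"].values())                  # net operating income
    return {
        "effective_gross_income": round(egi, 2),
        "noi": round(noi, 2),
        "cap_rate": round(noi / deal["purchase_price"], 4),
    }

def underwrite(deal):
    """Merge the computed metrics into the deal sheet and ask the model for the memo.
    UNDERWRITING_PROMPT is the template defined in underwriting.py above."""
    import ollama  # lazy import so deal_metrics stays usable without a running server
    enriched = {**deal, "computed": deal_metrics(deal)}
    response = ollama.chat(
        model="qwen2.5:14b",
        messages=[{"role": "user", "content": UNDERWRITING_PROMPT.format(deal=json.dumps(enriched, indent=2))}],
        format="json",
        options={"temperature": 0.3},
    )
    return json.loads(response["message"]["content"])["memo"]
```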
Stage 4: Lead Followup Drafting (RPA-Aware) {#followups}
The biggest time sink and the place where local AI adds the most leverage. Real estate agents juggle dozens of leads at different funnel stages, each needing personalized contact.
The trick is that the AI needs to know what stage each lead is at and what the appropriate next message looks like under your local Residential Purchase Agreement (RPA) rules.
# followup.py
STAGES = {
    "new_lead": "Brief introduction, ask one specific question about their situation.",
    "showed_interest": "Recap the property they viewed, suggest two specific next steps.",
    "scheduled_showing": "Confirm logistics, link to property packet, ask about parking preferences.",
    "post_showing": "Ask one open-ended question about what stood out, offer to set up similar showings.",
    "submitted_offer": "Confirm offer terms, set expectations on timeline, no closing tactics.",
    "closed": "Closing-week reminders, link to walkthrough scheduler, your contact for emergencies.",
}
FOLLOWUP_PROMPT = """Draft a follow-up email from a real estate agent to a lead.
Lead context: {lead}
Stage in funnel: {stage}
Stage guidance: {guidance}
Agent's voice: {voice}
Constraints:
- Maximum 4 sentences
- No high-pressure language ("acting now", "won't last", "limited time")
- No protected-class references
- End with one specific question or specific next step
- Do not invent property details — only use facts in the lead context
Output JSON: {{"subject": "...", "body": "..."}}."""
Stage-aware drafting prevents the most embarrassing AI failure mode in real estate — a generic "are you still interested?" sent to someone who closed three weeks ago. The stage parameter is set by your CRM (kvCORE/Boomtown integration covered below).
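A driver for the prompt might look like the sketch below. The message assembly is a pure function — it takes the stage table and template explicitly, so you can test the stage handling without a model running; the hypothetical `draft_followup` wrapper assumes the `STAGES` and `FOLLOWUP_PROMPT` definitions from `followup.py` above:

```python
import json

def build_messages(lead: dict, stage: str, voice: str, stages: dict, template: str) -> list[dict]:
    """Assemble the chat messages; raises KeyError on an unknown funnel stage
    (a feature: a bad CRM stage value should produce no email at all)."""
    guidance = stages[stage]
    return [{"role": "user", "content": template.format(
        lead=json.dumps(lead), stage=stage, guidance=guidance, voice=voice,
    )}]

def draft_followup(lead, stage, voice="friendly, concise, no pressure"):
    import ollama  # lazy import so build_messages stays testable offline
    response = ollama.chat(
        model="llama3.2:3b",  # the small model is enough for followups, per the table above
        messages=build_messages(lead, stage, voice, STAGES, FOLLOWUP_PROMPT),
        format="json",
    )
    return json.loads(response["message"]["content"])
```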
For the broader pattern of AI-drafted email at scale, see our local AI email triage guide.
Stage 5: RAG Over Your Past Listings and Closed Deals {#rag}
Once you have 50+ closed deals, your past work becomes the most valuable training data you have. Build a small RAG system over your closing files.
# rag_index.py — build a searchable index over your closed-deal folder
import chromadb, os, ollama
from chromadb.utils.embedding_functions import OllamaEmbeddingFunction
client = chromadb.PersistentClient(path="./real_estate_rag")
embed = OllamaEmbeddingFunction(model_name="nomic-embed-text", url="http://localhost:11434/api/embeddings")
collection = client.get_or_create_collection("closings", embedding_function=embed)
def index_closing_file(path):
    text = open(path).read()
    chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]  # 200-char overlap
    collection.add(
        documents=chunks,
        ids=[f"{os.path.basename(path)}-{i}" for i in range(len(chunks))],
        metadatas=[{"file": path}] * len(chunks),
    )

def query(question, top_k=5):
    results = collection.query(query_texts=[question], n_results=top_k)
    context = "\n\n".join(results["documents"][0])
    response = ollama.chat(
        model="qwen2.5:14b",
        messages=[
            {"role": "system", "content": "Answer using only the provided context. If the context doesn't contain the answer, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response["message"]["content"]
Sample queries that pay off:
- "What did the seller credit look like on similar Crescent Heights deals last year?"
- "How did we handle the appraisal gap on the Maplewood transaction?"
- "What inspection callouts came up on 1980s-era homes in this market?"
The model only sees your own files, never anyone else's data. When agents transition between brokerages, this RAG system goes with them on a USB drive — institutional memory you actually own.
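To index the whole archive in one pass, the sliding-window chunking from `rag_index.py` can be pulled into reusable helpers — a sketch, assuming plain-text closing files (`.txt`/`.md`); adapt the extensions to whatever your archive contains:

```python
import os

def chunk_text(text: str, size: int = 1000, step: int = 800) -> list[str]:
    """Sliding-window chunks: `size` chars each, with size - step chars of overlap
    between neighbors (1000/800 gives the 200-char overlap used above)."""
    return [text[i:i + size] for i in range(0, len(text), step)]

def closing_files(folder: str, extensions=(".txt", ".md")) -> list[str]:
    """All indexable files under the closings folder, sorted for reproducible IDs."""
    return sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(folder)
        for name in names
        if name.lower().endswith(extensions)
    )

# Usage: for path in closing_files("./closings"): index_closing_file(path)
```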
For the full RAG architecture, see our Ollama + ChromaDB RAG pipeline guide.
Integrations: kvCORE, Boomtown, Brokermint, dotloop {#integrations}
Most major real estate platforms have decent APIs. The integration pattern is consistent: pull lead/transaction state via API, run it through the local LLM, push the draft back to a notes field or send via the platform's email integration.
kvCORE: REST API with OAuth2. Pull leads via /api/v1/leads, push email drafts as activity records. Roughly 40 lines of Python wraps the auth and CRUD.
Boomtown: Has webhooks for lead state changes. Set up an outbound webhook to your local server, run the followup draft, return the suggested email body.
Brokermint: Strong on transaction data. Use it to feed the underwriting and CMA narrative pipelines.
dotloop: Document-heavy. Pair with RAG over your past closings to suggest standard provisions for new contracts.
For shops without these platforms, the same pipelines work driven by Google Sheets, Airtable, or a small local SQLite — the LLM doesn't care where the data comes from.
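For the SQLite route, the whole integration can be a small work queue over a local database — a sketch with a hypothetical `leads` table (the schema and column names are assumptions); the drafting step plugs in whichever pipeline stage you're driving:

```python
import sqlite3

# Hypothetical lead queue: a row is "pending" until a draft has been written back
SCHEMA = """CREATE TABLE IF NOT EXISTS leads (
    id INTEGER PRIMARY KEY,
    name TEXT, stage TEXT, context TEXT,
    draft TEXT, drafted_at TEXT
)"""

def pending_leads(conn):
    """Leads with no draft yet — the work queue for the local model."""
    return conn.execute(
        "SELECT id, name, stage, context FROM leads WHERE draft IS NULL"
    ).fetchall()

def save_draft(conn, lead_id, draft):
    """Write the model's draft back so the lead drops out of the queue."""
    conn.execute(
        "UPDATE leads SET draft = ?, drafted_at = datetime('now') WHERE id = ?",
        (draft, lead_id),
    )
    conn.commit()
```

A cron job that opens the database, drafts for `pending_leads`, and calls `save_draft` is the entire "integration" — no platform API required.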
Compliance Checklist (Fair Housing, RESPA, MLS) {#compliance}
Before you ship any of this to your team, walk through:
- MLS subscriber agreement review. Confirm AI-assisted analysis of MLS data is permitted. If unclear, ask your MLS legal contact in writing.
- Protected-class language audit. Run 100 model outputs through a regex/LLM check for protected-class references. Block deployment if any leak through.
- No-pricing rule. Hard-code prompts to never recommend list price, offer price, or appraised value.
- No fair-lending claims. Don't let the model speculate about loan approval, qualifying ratios, or borrower fit. RESPA territory.
- Disclosure footers. Add "AI-assisted draft, reviewed by [agent]" to every output that goes to a client.
- Audit logging. Log every prompt and response. If a fair-housing complaint surfaces, you need the trail.
- Broker sign-off. Get your broker-of-record to formally approve the workflow before you run it on real listings.
Compliance isn't optional. Fair housing enforcement is real and AI-touched outputs are not exempt.
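The protected-class language audit in the checklist can start as a simple pattern check. A sketch — the phrase list below is a starter set I'm assuming for illustration, not a complete fair-housing screen; expand it with your broker's counsel before relying on it:

```python
import re

# Starter patterns only (an assumption, not legal guidance) — a first-pass tripwire
FLAGGED = [
    r"family[- ]friendly", r"perfect for (young )?famil",
    r"retiree", r"walk(ing distance)? to church", r"safe neighborhood",
    r"exclusive", r"bachelor",
]

def audit_output(text: str) -> list[str]:
    """Return the flagged patterns found in a model output (case-insensitive).
    An empty list means this first pass found nothing — not that the copy is safe."""
    return [p for p in FLAGGED if re.search(p, text, re.IGNORECASE)]
```

Run this over the 100-output sample; any non-empty result blocks deployment until the prompt is fixed.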
Common Pitfalls in Real Estate AI {#pitfalls}
Pitfall 1: Cloud LLMs touching MLS data. Even paste-into-chat counts as redistribution. Local-only.
Pitfall 2: Model recommending prices. It will. Aggressively prompt against this and audit outputs.
Pitfall 3: Steering language in copy. "Perfect for young families," "great for retirees," "walk to church" — these are fair housing red flags. Lock the prompt to objective property attributes.
Pitfall 4: Forgetting jurisdiction differences. Real estate law varies by state. Don't generalize a California closing process into Oregon-bound copy.
Pitfall 5: Letting the model invent property facts. If the listing facts JSON doesn't include a feature, the description shouldn't claim it. Add a constraint that says "use only the supplied facts" and audit for hallucinations.
Pitfall 6: No audit trail. When something goes wrong, you need to show what the model said and what the agent actually sent. Log everything to SQLite at minimum.
Pitfall 7: Mass-deploying without small-scale testing. Run the pipeline on 20 historical listings before pointing it at this week's actives. Compare to your past human-written copy. If quality drops, fix the prompt.
Frequently Asked Questions {#faqs}
The complete FAQ schema lives in the page metadata. Practical highlights:
- Yes, this works whether you're an independent agent or part of a 200-agent brokerage.
- Yes, it's compliant if you implement the checklist above and get broker sign-off.
- No, it doesn't replace your CMA process — it accelerates the writeup, not the analysis.
- Yes, smaller models (Llama 3.2 3B) are sufficient for lead followups; reserve the 14B model for CMA narratives and listing copy.
Closing the File
Real estate is one of the industries where the privacy case for local AI is strongest. MLS data, seller confidentiality, fair housing scrutiny, and RESPA all push the same direction: keep the model on your machine, keep the data on your machine, and use AI for the parts of the job that genuinely benefit (drafting and summarization), not the parts that don't (pricing and qualifying).
If you're already running a successful book of business, the right place to start is the listing-description generator. It's the lowest-risk, highest-leverage piece. Run it on five listings this week, compare the drafts to what you would have written, refine the prompt, and you'll have a tool that saves several hours next week and every week after.
The hardware pays for itself in two months on time savings alone. The compliance posture is genuinely better than the cloud-AI alternative most agents are quietly experimenting with already. And every byte of your business stays in your office — where, professionally and contractually, it belongs.