Talk to Your Database with a Local LLM (2026): Private Text-to-SQL
Want to go deeper than this article?
Free account unlocks the first chapter of all 20 courses — RAG, agents, MCP, voice AI, MLOps, real GitHub repos.
Go from reading about AI to building with AI 20 structured courses. Hands-on projects. Runs on your machine. Start free.
You can talk to your database in plain English entirely offline by pairing a SQL-capable local model — defog's SQLCoder-7B-2 (a CC BY-SA 4.0 model fine-tuned from CodeLlama-7B) or the Apache-2.0 Qwen2.5-Coder family — with a text-to-SQL tool like Vanna (MIT license) or a simple "schema-in-prompt" pattern. The model writes the SQL; you run it. The privacy win is the whole point: with Vanna's RAG approach the LLM only ever sees your table schemas, documentation, and example queries — never the actual rows — and when the model runs in Ollama on your own box, nothing leaves the machine at all. Below is a working setup, a real comparison table, and the caveats that keep you from nuking a table on a hallucinated DELETE.
What does "talk to your database with a local LLM" actually mean?
It means natural-language-to-SQL, run on hardware you control. You type "What were our top 5 products by revenue last quarter?" and a local model translates that into a runnable SQL query against your schema. There are two moving parts:
- A capable local model that is good at SQL specifically (general chat models are mediocre at it).
- A tool or pattern that feeds the model your database schema so it knows your tables, columns, and relationships — either a framework like Vanna, or a hand-rolled prompt that pastes your
CREATE TABLEstatements in.
The cloud version of this (ask a hosted GPT-class model, it writes SQL) has existed for a while. The difference here is that your schema — and optionally your data — never gets shipped to a third party. For anyone working with PII, health records, financial data, or just a company that says "don't paste the prod schema into ChatGPT," that is the deciding factor.
Reading articles is good. Building is better.
Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.
Which local model should write the SQL?
Two families are worth your time, and they sit at different points on the size/quality curve. Everything below is verified against the official model cards.
The comparison that matters
| Model | Params | License | What it's good at | Approx. VRAM (Q4_K_M) | Source |
|---|---|---|---|---|---|
| defog SQLCoder-7B-2 | 7B | CC BY-SA 4.0 | Purpose-built for text-to-SQL; fine-tuned from CodeLlama-7B | ~4–5 GB | model card |
| Qwen2.5-Coder 7B | 7B | Apache 2.0 | Strong general coder (~88% HumanEval pass@1), good SQL, permissive license | ~5–6 GB | Qwen blog |
| Qwen2.5-Coder 32B | 32B | Apache 2.0 | SOTA open code model, near GPT-4o on code; best for gnarly multi-join queries | ~20 GB | Qwen blog |
The honest recommendation: start with defog SQLCoder-7B-2. It is the only model in this list that was built for this one job — defog reports it edging out GPT-4 on JOIN-heavy queries in their own PostgreSQL-based SQL-Eval harness (their figure was ~94.3% vs ~91.4% on joins; treat any single vendor benchmark as directional, not gospel). It is tiny enough (~4–5 GB at Q4_K_M) to run on a laptop GPU or even CPU. If you want one model that does SQL and general coding, use Qwen2.5-Coder 7B for its Apache-2.0 license and broad ability; step up to the 32B only when your queries involve many joins, window functions, or CTEs and the 7B starts getting them subtly wrong.
License note that bites people: SQLCoder-7B-2 is CC BY-SA 4.0 — if you fine-tune it, you must re-share your modified weights under the same license. Qwen2.5-Coder (7B/14B/32B) is Apache 2.0, which is friendlier for closed commercial products. The 3B Qwen2.5-Coder is under a separate Qwen-Research license, so skip it for commercial use.
What I measured running it locally
On an RTX 3090 (24GB) I pulled the GGUF build via Ollama and ran SQLCoder-7B-2 at the Q4_K_M quant. It loaded in well under 5 GB of VRAM and returned single-statement SELECT queries in roughly a second or two for a normal "top N by some aggregate" question — call it ~40–55 tok/s, approximate and hardware-dependent. On CPU-only (no GPU) the same 7B Q4_K_M model is usable but noticeably slower — expect low-double-digit tok/s, so a query takes several seconds. The 32B Qwen model on the same 3090 was meaningfully slower per token but produced cleaner SQL on a 4-table join I deliberately threw at the 7B and watched it fumble. Frame all of these as ballpark figures; your CPU, quant, and context length move them.
How do I set up private text-to-SQL with Vanna and Ollama?
Vanna is the path of least resistance. It is MIT-licensed, uses a RAG (retrieval-augmented generation) approach, and connects to Postgres, MySQL, SQLite, Snowflake, BigQuery, DuckDB, ClickHouse, Oracle, and more. The key privacy property: Vanna trains a lightweight RAG "model" on your schema, documentation, and example SQL — and that metadata layer is all the LLM sees. Your actual table rows are not sent to the model unless you explicitly opt into it.
Step 1 — Run a local model in Ollama
# Install Ollama from https://ollama.com, then:
# defog SQLCoder-7B-2 (CodeLlama-based, the one with the JOIN numbers) is a
# community upload — the canonical "sqlcoder:7b" library tag is the OLDER model:
ollama pull pxlksr/defog_sqlcoder-7b-2
# or, for a general coder under Apache 2.0:
ollama pull qwen2.5-coder:7b
Step 2 — Point Vanna at the local model and your database
# pip install 'vanna[chromadb,ollama]'
from vanna.ollama import Ollama
from vanna.chromadb import ChromaDB_VectorStore
class LocalVanna(ChromaDB_VectorStore, Ollama):
def __init__(self, config=None):
ChromaDB_VectorStore.__init__(self, config=config)
Ollama.__init__(self, config=config)
# Everything below runs on YOUR machine: model, vector store, and DB.
vn = LocalVanna(config={'model': 'pxlksr/defog_sqlcoder-7b-2'})
# Connect to your own database (SQLite shown; Postgres/MySQL also supported)
vn.connect_to_sqlite('shop.db')
Step 3 — "Train" the RAG layer on your schema
This is the step that makes the model accurate. You feed it your DDL and a few good example questions. None of this is your row data — it is structure and documentation.
# Teach it your schema (the LLM sees this, not your rows)
vn.train(ddl="""
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER,
product_id INTEGER,
amount NUMERIC,
created_at TIMESTAMP
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT,
category TEXT
);
""")
# Optional: teach it a few documented example pairs to lift accuracy
vn.train(
question="What were the top 5 products by revenue last month?",
sql="""SELECT p.name, SUM(o.amount) AS revenue
FROM orders o JOIN products p ON p.id = o.product_id
WHERE o.created_at >= date('now','-1 month')
GROUP BY p.name ORDER BY revenue DESC LIMIT 5;"""
)
Step 4 — Ask in English
sql = vn.generate_sql("Which category made the most revenue this year?")
print(sql) # <-- READ THIS BEFORE YOU RUN IT
# df = vn.run_sql(sql) # only after you've verified the query
That is the whole loop: question → schema-aware SQL → (you review) → run. Vanna also ships a web UI if you want a chat box instead of a Python REPL.
What if I don't want a framework? The schema-in-prompt pattern
You do not strictly need Vanna. For a single database or a quick script, the schema-in-prompt pattern is fewer moving parts: paste your relevant CREATE TABLE statements straight into the prompt and ask for SQL.
import ollama
SCHEMA = """
CREATE TABLE orders (id INT, customer_id INT, product_id INT, amount NUMERIC, created_at TIMESTAMP);
CREATE TABLE products (id INT, name TEXT, category TEXT);
"""
question = "Total revenue per product category, highest first."
prompt = f"""You are a PostgreSQL expert. Given the schema below, write ONE
read-only SQL SELECT that answers the question. Return only SQL, no prose.
Schema:
{SCHEMA}
Question: {question}
SQL:"""
resp = ollama.generate(model='pxlksr/defog_sqlcoder-7b-2', prompt=prompt)
print(resp['response']) # review, then run yourself
When to use which: the schema-in-prompt pattern is great for a handful of tables and one-off scripts. The moment your schema has 20+ tables (too big to paste), or you want the model to learn from corrected examples over time, Vanna's RAG retrieval — which pulls only the relevant tables and example queries into context — earns its keep.
Reading articles is good. Building is better.
Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.
Is this actually private? Where the data really goes
This is the part people get wrong, so be precise about it:
- Model runs locally (Ollama): the LLM's weights and inference are on your machine. The text of your question and the schema/examples you provide go to the model — but the model is local, so that text never crosses the network.
- Vanna's RAG store: with a local vector store (e.g. ChromaDB as above) your schema and training examples sit in a local database. Nothing is uploaded.
- Your rows: in both patterns above, the model writes SQL from the schema; it does not need your data to do so. Your rows are only touched when you execute the generated SQL against your DB — locally.
The one caveat: if you wire Vanna to a cloud LLM (OpenAI, Anthropic, Gemini are all supported), then your schema and questions do leave. Keeping it private means keeping the model in Ollama. That is the trade — a 7B local model is weaker than a frontier cloud model on the hardest queries, but it is the difference between "my schema is on my disk" and "my schema is in someone's training pipeline."
The caveats that keep you out of trouble
Text-to-SQL is a productivity tool, not an autopilot. Treat generated SQL like code from a junior dev who is fast but occasionally confidently wrong.
- Always read the SQL before running it. Models hallucinate column names, invent joins, and silently misinterpret "last month" as a calendar month vs. a trailing 30 days. Verify against your actual schema.
- Never let it auto-run destructive statements. Connect through a read-only database role for the AI. If the model can only
SELECT, a hallucinatedDROP TABLEorDELETEsimply errors instead of ruining your day. This is the single most important safeguard. - Sanity-check the numbers. A query can be valid SQL and still answer the wrong question. Spot-check totals against a number you already know.
- Watch for subtle aggregation bugs — missing
GROUP BYcolumns, double-counting from a fan-out join, orNULLhandling that quietly drops rows. - Smaller models struggle with complex joins and window functions. If the 7B keeps getting a multi-table query wrong, give it better example pairs (Vanna) or step up to Qwen2.5-Coder 32B.
Key Takeaways
- Fully local text-to-SQL is real in 2026 — a SQL-tuned 7B model plus a tool is enough to ask your database questions in English without anything leaving your machine.
- Start with defog SQLCoder-7B-2 (purpose-built, ~4–5 GB at Q4_K_M, CC BY-SA 4.0); reach for Qwen2.5-Coder 7B/32B (Apache 2.0) when you want one model for SQL and general coding or need more horsepower on complex joins.
- Vanna (MIT) is the fast path — its RAG approach shows the LLM only your schema and examples, not your rows, and it runs against Postgres, MySQL, SQLite, and more.
- The schema-in-prompt pattern is a fine, framework-free alternative for small schemas and quick scripts.
- Privacy depends on keeping the model in Ollama — pointing the same tool at a cloud LLM ships your schema off-box.
- Two non-negotiable safeguards: read every query before running it, and connect the AI through a read-only DB role so it can never run destructive SQL.
Next Steps
- Want the model to do things, not just write queries? See our guide to building a local AI agent that can call tools like your text-to-SQL function.
- Give your database assistant a memory of past questions and corrections with local AI agent memory using mem0.
- Need a model that codes well beyond SQL? Compare options in the best local AI models for programming.
- Working with documents, charts, or screenshots of dashboards instead of a live DB? Read local AI for vision tasks to OCR and reason over them offline.
For the authoritative source on the text-to-SQL framework used here, see the Vanna GitHub repository, and for the SQL model itself, the defog SQLCoder-7B-2 model card.
Go from reading about AI to building with AI
20 structured courses. Hands-on projects. Runs on your machine. Start free.
Liked this? 20 full AI courses are waiting.
From fundamentals to RAG, agents, MCP servers, voice AI, and production deployment with real GitHub repos. First chapter free, every course.
Build Real AI on Your Machine
RAG, agents, NLP, vision, and MLOps - chapters across 20 courses that take you from reading about AI to building AI.
Want structured AI education?
20 courses, 495+ chapters, from $9. Understand AI, don't just use it.
Continue Your Local AI Journey
Comments (0)
No comments yet. Be the first to share your thoughts!