Starter Kit · One-Time Purchase
RAG Starter Kit
Chat with your documents locally (hybrid + rerank)
Worth $79 — you pay $29 today.
Buy now: $29
Instant delivery: download from your in-app library right after checkout. Linked to your account; sign in any time to re-download. All sales final.
Overview
The RAG Starter Kit is a complete, working "chat with your documents" system that runs 100% locally — no cloud, no API keys, no data ever leaving the machine. You point it at a folder of PDFs, Word docs, text, Markdown, or CSV files, and it gives you a chat box that answers questions from those documents and quotes the exact source for every answer. Under the hood it's a real production-grade retrieval pipeline (Ollama for local LLM + embeddings, ChromaDB vector store, FastAPI backend, Streamlit chat UI), not a single-vector toy demo.
What makes it worth money: because everything runs on the client's own hardware, you can sell a private document-AI service to the businesses that legally or contractually can't use ChatGPT, Copilot, or Claude. That means law firms, clinics, accountants, manufacturers, and anyone with a "no public LLM" policy. That privacy constraint is a large, underserved market, and a credible local RAG build is the hardest 60% of the work. This kit hands you that 60% done.
Be clear-eyed: this is not passive income and not a magic button. The kit removes the build barrier — the hardest engineering — but you still talk to the client, ingest their documents, tune retrieval, host it, and support it. What you get is a real, billable service you can deliver in days instead of weeks, at margins that work because the hard part is already built and tested.
What's included
- Full working source code for a local RAG pipeline (Python): app/api.py, rag_engine.py, retrieval.py, reranker.py, chunker.py, embeddings.py, document_loader.py, config.py, ui.py
- Hybrid retrieval — dense vector search PLUS BM25 keyword search, fused with Reciprocal Rank Fusion (catches both meaning and exact terms, IDs, and codes)
- Cross-encoder reranking with BAAI/bge-reranker-v2-m3 for high-precision results; it is optional, and the pipeline degrades gracefully without it if you want a lighter install
- Grounded citations — every answer quotes numbered [1] [2] sources so claims are traceable (the feature that wins trust with lawyers, clinics, accountants)
- Streaming answers — tokens stream live to the UI and the /api/query/stream endpoint
- Streamlit chat UI with multi-collection support, document upload, and an expandable sources panel
- FastAPI REST API: /api/ingest, /api/query, /api/query/stream, /api/collections, /api/stats, /api/health, plus interactive docs at /docs
- Smart chunking — recursive (structure-aware) or semantic (groups sentences by meaning), both configurable
- Document loaders for PDF, DOCX, TXT, MD, and CSV
- CLI tools: scripts/ingest.py for batch/folder ingest and scripts/eval.py for retrieval quality (hit-rate@k + MRR) so you can tune objectively and compare against a no-rerank baseline
- One-command Docker Compose stack (docker-compose.yml + Dockerfile) that brings up Ollama, auto-pulls the models, and starts the API + UI; pinned image tags, GPU stanza included
- Fully documented .env.example with model swaps (qwen3:8b / qwen3:4b / llama3.3:70b, nomic-embed-text / mxbai-embed-large / bge-m3), chunking, fusion weights, TOP_K, and rerank toggles
- MONETIZE.md — an 11KB income playbook: the exact service to sell, who buys, the pitch that lands, realistic pricing, a ready-to-send Scope of Work template, a delivery checklist, and how to find clients without cold outreach
- README with quickstart, API examples, configuration reference, and honest limits
- Verified-current dependencies and model names (June 2026): FastAPI, ChromaDB, sentence-transformers, rank-bm25, PyMuPDF, python-docx
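If you have not met Reciprocal Rank Fusion before, here is a minimal sketch of the idea behind the hybrid retrieval bullet above. It is not the kit's retrieval.py. The function name, the `weights` parameter, and the sample document IDs are assumptions made for illustration, and k=60 is simply the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns weight / (k + rank) from every list it appears in,
    so items ranked well by both dense and keyword search rise to the top.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers (best match first).
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF only looks at rank positions, it does not need the dense and BM25 scores to share a scale, which is why it is a common choice for hybrid search.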
Who it's for
- Freelancers and small dev shops who want a real, billable AI service to offer clients without spending weeks building a RAG pipeline from scratch
- IT consultants, MSPs, and fractional CTOs whose clients keep asking 'can we use AI on our confidential files?' and currently have no answer
- Developers who want to learn production RAG (hybrid search, reranking, eval) from a clean, working reference instead of a tutorial toy
- Solo founders building a vertical document-AI product for a regulated niche (legal, medical, accounting, manufacturing)
- Internal teams at privacy-sensitive companies that need an on-premise document assistant their staff can use without sending data to the cloud
Use cases
- Private 'chat with your documents' assistant for a law firm — search contracts, case files, and precedents with cited answers, nothing leaving their network
- On-premise SOP/manual assistant for a manufacturer or field-service team that answers from equipment manuals and maintenance logs (useful in plants that are often offline)
- HIPAA-conscious clinic knowledge base over billing codes, SOPs, and internal policies that can't go to a cloud chatbot
- Internal company wiki / HR / onboarding Q&A for any org with a 'no public LLM' policy
- Accounting or bookkeeping firm assistant over tax docs and client financials, kept confidential on their own hardware
- A learning project: study real hybrid retrieval, RRF fusion, cross-encoder reranking, and retrieval evaluation in working code
Sell a private 'chat with your documents' service to firms that legally can't use ChatGPT
The service
A privacy-safe, on-premise document AI assistant: you install it on the client's own machine/server, ingest their contracts/SOPs/manuals/policies, tune retrieval, and hand them a chat box that answers in seconds with cited sources — and none of their data ever leaves their network. The whole sale is the privacy: you serve the buyers who say 'we'd love to use ChatGPT for this but legal/IT won't let us.'
What to charge
One-time setup/pilot: $1,500–$4,000 (single document set, one machine) up to $5,000–$12,000 (multiple collections, several users, small server). Larger departmental builds with access rules scope to $15,000–$40,000+. Recurring is the real business: $300–$1,500/month for hosting, maintenance, model updates, and support, plus $150–$250/hr (or a monthly bucket) for new document loads and re-tuning. Five clients at ~$600/mo is ~$36k/year of sticky revenue.
How to find clients
- Post ONCE on LinkedIn to your existing network: "I build private, on-premise AI assistants for companies that can't put documents in ChatGPT — HIPAA/legal/IP reasons. Reply 'private' for a demo." The privacy hook pre-qualifies everyone who replies.
- Record a 3-minute demo video on a realistic sample document set (a contract, an SOP) and pin it everywhere — most buyers have never seen private RAG actually work, so the demo IS the marketing.
- Go to the industry, not 'tech': answer 'can we use AI on confidential files?' helpfully in lawyer/accountant/clinic associations, niche forums, and LinkedIn groups until you're the 'private AI person' for that niche.
- Partner with people who already have trust — MSPs, IT-support shops, fractional CTOs, compliance consultants, bookkeepers — and offer a 10–20% referral cut or white-label delivery. One good MSP partner can feed you for a year.
- Publish the one article that ranks: 'How [accountants/clinics/law firms] can use AI on confidential documents without breaking [HIPAA/privilege]' — it pulls in buyers who already know they have the problem.
The delivery steps
- Confirm the buyer’s real blocker (privacy, cost, or offline) and match the pitch: saves hours -> cites its sources -> never leaves your network.
- Run a live demo: ask for 20–50 of their non-sensitive or public documents, ingest them with the kit, and let a decision-maker ask their own questions — watching it quote their own policy back with a citation closes most deals.
- Send a fixed-scope proposal using the included Scope of Work template — define document count, users, a 20-question acceptance test, hardware, timeline, and a one-time fee + monthly retainer (50% deposit to start).
- Deliver: install the stack on their hardware, set CHAT_MODEL/EMBEDDING_MODEL to fit it, choose chunking, ingest into per-topic collections, then run scripts/eval.py and tune TOP_K, fusion weights, and reranking until the acceptance test clears the bar.
- Hand over: lock down network/CORS access, train the users, leave a one-page guide, back up the data/ directory, get written acceptance, and invoice the balance.
- Convert every pilot to a retainer — schedule the first monthly check-in before you leave so 'add documents / re-tune / model updates' becomes recurring revenue.
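To make the tuning step concrete, here is a rough sketch of the two metrics the delivery steps lean on, hit-rate@k and MRR. It is an illustration of the standard definitions, not the code in scripts/eval.py. The function names, file names, and sample data are all made up for the example.

```python
def hit_rate_at_k(results, expected, k=5):
    """Fraction of questions whose expected source appears in the top k results."""
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    return hits / len(expected)


def mean_reciprocal_rank(results, expected):
    """Average of 1/rank of the expected source; a miss contributes 0."""
    total = 0.0
    for ranked, gold in zip(results, expected):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(expected)


# Hypothetical retrieval output for three acceptance-test questions.
retrieved = [
    ["policy.pdf", "sop.docx", "faq.md"],   # expected source at rank 1
    ["faq.md", "policy.pdf", "sop.docx"],   # expected source at rank 3
    ["faq.md", "notes.txt", "policy.pdf"],  # expected source missing
]
gold_sources = ["policy.pdf", "sop.docx", "sop.docx"]

print(round(hit_rate_at_k(retrieved, gold_sources, k=2), 3))   # 0.333
print(round(mean_reciprocal_rank(retrieved, gold_sources), 3))  # 0.444
```

Running the same question set before and after a settings change (for example with and without reranking) gives you a like-for-like number to compare instead of a gut feeling.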
How to market it
- Lead with the privacy constraint, not the tech — your headline is 'AI for documents that can't go to the cloud,' never 'RAG' or 'vector databases.' The constraint is what makes you the only option for these buyers.
- Make the demo your marketing asset: a short screen recording on a realistic contract/SOP, shown answering with citations, converts far better than any feature list — pin it on your profile, site, and proposals.
- Pick ONE vertical and own it (e.g. 'private AI for accounting firms'). A niche article + niche-forum presence ranks and builds trust faster than being a generalist 'AI consultant.'
- Build a referral/white-label channel through MSPs and IT shops who already serve your target clients — give them 10–20% and let them resell your delivery; this scales without you cold-prospecting.
- Write the cornerstone SEO piece ('How [niche] can use AI on confidential files without breaking [regulation]') to pull in self-qualified inbound leads who already know they have the problem.
- Sell outcomes and retainers, not hours or tokens — price the pilot as a fixed project and frame the monthly fee as 'keeps the assistant your staff now relies on running and current.'
Frequently asked questions
Do I need to be an AI expert to use or sell this?
No, but you do need to be comfortable with the command line, Docker, and basic Python config. The hard ML engineering (hybrid search, reranking, eval) is built and working — your job is to install it, ingest a client’s documents, tune a few .env settings, and support it. The included MONETIZE.md walks through the business side step by step.
Does any data leave the machine? Can I honestly promise clients privacy?
Inference and embeddings run entirely locally via Ollama, and documents are stored only in the local ChromaDB. There are no cloud API calls for document content, which is exactly why you can sell to privacy-blocked buyers. One caveat you need to handle yourself: the API ships with open CORS for easy local dev, so restrict it before exposing anything beyond localhost. Both the kit and the README call this out.
What hardware does it need?
It scales. qwen3:4b + nomic-embed-text run on a 16GB-RAM laptop, CPU-only, fine for small document sets and cheap pilots. The default qwen3:8b is comfortable on 16–32GB RAM and snappy with a 12GB+ GPU — the sweet spot for most clients. llama3.3:70b is for clients who’ll pay for a serious GPU box. If a client lacks suitable hardware, that’s a billable line item.
Is this just a wrapper around an LLM, or a real RAG system?
It’s a real pipeline: structure-aware or semantic chunking, dense + BM25 hybrid retrieval fused with Reciprocal Rank Fusion, optional cross-encoder reranking, grounded inline citations, streaming, multiple collections, a REST API, a chat UI, and an eval script that reports hit-rate@k and MRR so you can tune objectively. Not a single-vector demo.
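For a feel of what "grounded inline citations" means in practice, here is a minimal sketch of one common approach: number the retrieved chunks and instruct the model to cite those numbers. The kit's actual prompt may differ. The function name, the chunk dictionary keys, and the sample text are assumptions for illustration only.

```python
def build_grounded_prompt(question, chunks):
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up chunks standing in for whatever retrieval returned.
chunks = [
    {"source": "handbook.pdf", "text": "Employees accrue 1.5 vacation days per month."},
    {"source": "policy.md", "text": "Unused vacation days expire on March 31."},
]

prompt = build_grounded_prompt("How many vacation days do I accrue?", chunks)
print(prompt)
```

Keeping the numbering stable between the prompt and the UI's sources panel is what lets a reader click from a [1] in the answer back to the exact passage.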
Can it really be wrong? What do I tell clients?
Yes — it grounds answers and cites sources, but the LLM can still misread, and it only knows what you ingest. Local models are strong but not frontier-cloud-class, and scanned-image PDFs need OCR first (not included). Set these limits in writing up front; the MONETIZE.md and README spell them out. A client who knew the limits renews; a surprised one asks for a refund.
What exactly do I get when I buy, and can I resell what I build with it?
You get the full source code, Docker Compose stack, CLI tools, configuration, README, and the MONETIZE.md income playbook (pitch, pricing, a ready-to-send Scope of Work, delivery checklist, and how to find clients). You use the kit to build and charge for client deployments — that service business is yours to run. Note that all sales of the kit itself are final (no refunds).
After you buy
Purchases are linked to your account — sign in and head to your product library to download anytime. Bought without an account? Check your email for the download link and a one-click way to set a password.