AnythingLLM Setup Guide: All-in-One Local AI with RAG
AnythingLLM at a Glance
Key Stats:
• 53,000+ GitHub stars
• 30+ LLM providers supported
• 9+ vector database options
• 100% offline capable
Core Features:
• Built-in RAG for document chat
• AI agents with web/SQL/files
• No-code agent flow builder
• MCP compatibility
What is AnythingLLM?
AnythingLLM is an open-source, all-in-one Desktop and Docker AI application developed by Mintplex Labs. With 53,000+ GitHub stars, it's the most comprehensive local AI platform for document chat, RAG, and AI agents.
AnythingLLM is designed to be private by default—everything is stored and run locally on your machine. It acts as a bridge between your proprietary knowledge base and modern AI models, enabling you to build custom AI systems with zero data leakage risk.
Why AnythingLLM?
- LLM Flexibility: Support for 30+ providers including OpenAI, Anthropic, Google, and local models via Ollama
- Full-Stack RAG: Turn any document into searchable context for AI conversations
- AI Agents: Built-in agent capabilities with web search, SQL, charts, and custom tools
- No-Code Builder: Visual interface to create agentic workflows without programming
- Multi-User Support: Role-based permissions for teams (Docker deployment)
- 100% Offline: Works without internet when using local models
- Cross-Platform: Available for macOS, Windows, Linux, and Docker
How AnythingLLM Works
- Documents are uploaded and converted to text
- Chunks are created with configurable overlap
- Embeddings are generated using your chosen model
- Vectors are stored in the vector database
- Queries are embedded and matched using cosine similarity
- Context is provided to the LLM for accurate responses
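The matching step (5) boils down to cosine similarity between the query vector and each stored chunk vector. The sketch below is an illustrative toy, not AnythingLLM's actual implementation; the vectors are hard-coded stand-ins for real embedding-model output.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend these vectors came from an embedding model
chunks = {
    "Invoices are due within 30 days.": [0.9, 0.1, 0.2],
    "The office is closed on Fridays.": [0.1, 0.8, 0.3],
}
query_vector = [0.85, 0.15, 0.25]  # embedding of "When are invoices due?"

# Rank chunks by similarity to the query and keep the best match
best = max(chunks, key=lambda text: cosine(query_vector, chunks[text]))
print(best)  # the invoice chunk wins and would be sent to the LLM as context
```

The highest-scoring chunks (not just one, in practice) are what the LLM receives as context in step 6.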
Installation Options
Desktop Application (Single-User)
The simplest way to get started—perfect for personal use.
Step 1: Download
Visit https://anythingllm.com and download for your platform:
- macOS (Apple Silicon and Intel)
- Windows
- Linux (AppImage)
Step 2: Install and Launch
Run the installer and launch AnythingLLM.
Step 3: Choose Your LLM
On first boot, you can:
- Download a built-in model (Llama-3, Phi-3, etc.)
- Connect to Ollama
- Configure a cloud provider
The desktop app includes a built-in LLM engine that runs models locally on CPU/GPU without external dependencies.
Docker Installation (Multi-User)
Docker deployment is ideal for servers, teams, and production environments.
System Requirements:
- Minimum 2GB RAM
- Minimum 10GB disk storage
- Docker installed
- yarn and node (for building from source)
Quick Setup:
# Pull the latest image
docker pull mintplexlabs/anythingllm:master
# Create storage directory
mkdir -p $HOME/anythingllm
# Create environment file
touch "$HOME/anythingllm/.env"
# Run the container
docker run -d -p 3001:3001 \
--cap-add SYS_ADMIN \
-v $HOME/anythingllm:/app/server/storage \
-e STORAGE_DIR="/app/server/storage" \
mintplexlabs/anythingllm:master
Access the application at http://localhost:3001
Building from Source:
# Clone the repository
git clone https://github.com/Mintplex-Labs/anything-llm.git
cd anything-llm
# Create SQLite database file
touch server/storage/anythingllm.db
# Build and run with docker-compose
docker compose up -d
Docker Notes:
- UID and GID are set to 1000 by default
- The --cap-add SYS_ADMIN flag is required for web scraping functionality
- Persistent storage is mounted to preserve data across container restarts
Cloud Deployment Options
AnythingLLM offers one-click deployment templates for:
- Railway
- Render
- AWS, Google Cloud, Azure
- Custom cloud infrastructure
Connecting to Ollama (Local LLMs)
The most popular setup is AnythingLLM with Ollama for 100% offline, private AI.
Step 1: Install and Start Ollama
# Download from https://ollama.com/download
# Or install via Homebrew (macOS)
brew install ollama
# Start the Ollama server
ollama serve
Step 2: Pull Models
# For high-end systems (64GB+ RAM)
ollama pull gemma3:27b
ollama pull llama3.1:70b
# For mid-range systems (16-32GB RAM)
ollama pull llama3.1:8b
ollama pull mistral:7b
# For lighter systems (8-16GB RAM)
ollama pull gemma3:4b
ollama pull phi3:mini
# For embeddings (required for RAG)
ollama pull nomic-embed-text
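Ollama serves a local HTTP API on port 11434, and GET /api/tags lists the models it has installed. A small sketch for confirming your pulls landed, assuming the documented response shape ({"models": [{"name": ...}, ...]}):

```python
import json
from urllib.request import urlopen

def installed_models(base_url="http://127.0.0.1:11434"):
    # GET /api/tags returns {"models": [{"name": "...", ...}, ...]}
    with urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

def has_embedder(names):
    # RAG needs an embedding model; check that nomic-embed-text is present
    return any(n.startswith("nomic-embed-text") for n in names)

# Example (requires a running Ollama server):
# names = installed_models()
# print(names, has_embedder(names))
```

If has_embedder comes back False, the embedding pull above is the missing step.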
Step 3: Configure AnythingLLM LLM Provider
- Open AnythingLLM
- Go to Settings (gear icon)
- Select LLM Provider
- Choose Ollama
- Set the Ollama URL:
| Deployment | URL |
|---|---|
| Desktop App | http://127.0.0.1:11434 |
| Docker (Windows/macOS) | http://host.docker.internal:11434 |
| Docker (Linux) | http://172.17.0.1:11434 |
- Select your model from the dropdown
- Click Save
Step 4: Configure Embeddings
- Go to Settings > Embedder
- Select Ollama
- Choose nomic-embed-text
- Set the same URL as above
- Click Save
Note: If nomic-embed-text isn't available, run:
ollama pull nomic-embed-text
Alternative: Built-in LLM Engine
AnythingLLM Desktop includes a built-in LLM engine that can download and run models directly:
- Go to Settings > LLM Provider
- Select AnythingLLM
- Choose a model to download (Llama-3, Phi-3, etc.)
- Wait for download to complete
- Start chatting
This option requires no external setup—perfect for beginners.
LLM Provider Options
AnythingLLM supports 30+ LLM providers for maximum flexibility.
Cloud Providers
| Provider | Models | Best For |
|---|---|---|
| OpenAI | GPT-4, GPT-4o, GPT-4o-mini | Best quality, most popular |
| Anthropic | Claude 3.5 Sonnet, Opus, Haiku | Long context, safety |
| Azure OpenAI | GPT models on Azure | Enterprise, compliance |
| Google | Gemini Pro, Gemini Ultra | Multimodal, long context |
| AWS Bedrock | Various | AWS ecosystem |
| Mistral AI | Mistral Large, Codestral | Code, European data |
| Groq | Llama, Mixtral | Ultra-fast inference |
| Cohere | Command R, Command R+ | Enterprise RAG |
| Perplexity AI | Online models | Real-time web data |
| Together AI | Open-source models | Model variety |
| OpenRouter | 100+ models | Model aggregation |
| Hugging Face | Custom models | Research, fine-tuned |
Local Providers
| Provider | Setup Complexity | Best For |
|---|---|---|
| Built-in Engine | None | Beginners, quick start |
| Ollama | Low | Most popular, versatile |
| LM Studio | Low | GUI-based, visual users |
| LocalAI | Medium | Docker deployments |
| KoboldCPP | Low | GGML models |
Generic OpenAI Wrapper
For any OpenAI-compatible API not explicitly integrated:
- Go to Settings > LLM Provider
- Select OpenAI (Generic)
- Enter the API endpoint and key
- Works with vLLM, text-generation-inference, and other OpenAI-compatible servers
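Any server that speaks the OpenAI chat-completions wire format will work here. A minimal request sketch; the endpoint path and payload shape follow the standard OpenAI convention, and the base URL and model name in the example are placeholders for your own server:

```python
import json
from urllib.request import Request, urlopen

def chat_request(base_url, api_key, model, message):
    # Standard OpenAI-style chat completion payload
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": message}],
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Example against a local vLLM server (placeholder URL and model):
# req = chat_request("http://localhost:8000", "not-needed", "my-model", "Hello")
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this request works from a script, the same base URL and key will work in the OpenAI (Generic) provider settings.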
Document Processing and RAG
How RAG Works in AnythingLLM
RAG (Retrieval-Augmented Generation) enhances LLM responses with your documents:
- Upload: Add documents to a workspace
- Process: Documents are converted to text
- Chunk: Text is split into segments (configurable size and overlap)
- Embed: Chunks are converted to vectors using your embedding model
- Store: Vectors are saved in your vector database
- Query: Your questions are embedded and matched to relevant chunks
- Respond: The LLM receives matched context and generates accurate answers
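Step 3 is the part most worth tuning. The sketch below shows fixed-size chunking with overlap; AnythingLLM's real chunker is token-aware, while this character-based toy just illustrates why overlap matters:

```python
def chunk_text(text, size=20, overlap=5):
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

doc = "AnythingLLM splits documents before embedding them."
chunks = chunk_text(doc, size=20, overlap=5)
for c in chunks:
    print(repr(c))
# Adjacent chunks overlap by 5 characters, so a phrase cut at one
# chunk boundary still appears intact in the neighboring chunk.
```

Larger overlap improves recall at chunk boundaries at the cost of more vectors to store and search.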
Supported Document Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, TXT, Markdown, RTF |
| Data | CSV, Excel, Spreadsheets |
| Code | Python, JavaScript, TypeScript, and more |
| Media | Audio files (with transcription) |
| Web | URLs, websites (via scraping) |
Upload Process
- Create a Workspace (or use existing)
- Click Upload Documents button
- Drag and drop files or paste URLs
- Monitor embedding progress
- Start chatting with your documents
Data Connectors (Docker/Cloud)
| Connector | Description |
|---|---|
| GitHub | Import entire repositories |
| Confluence | Import Confluence pages |
| YouTube | Extract video transcripts |
| Website Crawler | Recursively scrape websites |
| Browser Extension | Send web pages directly to workspaces |
Embedding Models
Local Embeddings:
- Built-in embedder (runs on CPU, ~25MB download)
- Ollama (nomic-embed-text recommended)
- LocalAI
Cloud Embeddings:
- OpenAI (text-embedding-3-small, text-embedding-3-large)
- Azure OpenAI
- Cohere
- Voyage AI
Important: Embedding models are set system-wide. Changing embedders requires re-embedding all documents.
Vector Database Options
AnythingLLM supports multiple vector databases for different use cases.
Built-in: LanceDB (Default)
LanceDB is an embedded, serverless vector database that runs directly inside AnythingLLM—no separate server required.
Advantages:
- Zero configuration
- Scales to millions of vectors on disk
- Incredible retrieval speed
- Native reranking support
- Perfect for edge computing and desktop apps
When to use: Most users should stick with LanceDB unless they have specific requirements.
Other Vector Database Options
| Database | Type | Setup | Best For |
|---|---|---|---|
| Chroma | Local/Cloud | Server required | Rapid prototyping, flexibility |
| Pinecone | Cloud | API key | Enterprise scale, managed |
| Milvus | Self-hosted | Server required | Maximum control, large scale |
| Qdrant | Local/Cloud | Server required | Performance, advanced filtering |
| Weaviate | Local/Cloud | Server required | Semantic search |
| PGVector | PostgreSQL | Existing DB | Postgres infrastructure |
| Zilliz | Cloud | API key | Managed Milvus |
| AstraDB | Cloud | API key | Cassandra-based |
Configuring Vector Databases
- Go to Settings > Vector Database
- Select your provider
- Enter connection details (URL, API key, etc.)
- Click Save
Note: Changing vector databases requires re-embedding all documents.
AI Agents
Agents extend LLM capabilities with real-world tools and actions.
Invoking Agents
Type @agent in any chat to activate agent mode. The agent can then use enabled tools to complete complex tasks.
Built-in Agent Tools
| Tool | Description | Always Enabled |
|---|---|---|
| RAG Search | Search embedded documents | Yes |
| Summarize Documents | Condense workspace content | Yes |
| Web Browsing | Search the internet | No (requires API key) |
| Web Scraping | Extract and embed website content | No |
| Save Files | Store information to local files | No |
| List Documents | View all accessible documents | No |
| Chart Generator | Create data visualizations | No |
| SQL Connector | Query databases | No (requires config) |
Enabling Agent Tools
- Go to Settings > Agent Skills
- Toggle on desired tools
- Configure API keys for web search:
- SerpApi: Google, Amazon, Baidu, Google Maps
- SearchApi: Google, Bing, Baidu, YouTube
Model Recommendations for Agents
Not all models work well as agents. For best results:
| Recommendation | Why |
|---|---|
| Use 8B+ parameters | Better reasoning capabilities |
| Prefer 8-bit quantization | More reliable than 4-bit |
| Choose tool-calling models | Native function calling support |
Recommended models for agents:
- Llama 3.1 8B/70B
- Mistral 7B
- GPT-4o / GPT-4o-mini
- Claude 3.5 Sonnet
Agent Flows (No-Code)
Agent Flows provides a visual drag-and-drop interface for building workflows:
- Go to Agent Flows section
- Create a new flow
- Drag components onto canvas
- Connect steps and configure triggers
- Save and activate
Use cases:
- Automated document processing
- Scheduled web scraping
- Multi-step research workflows
- Custom chatbot behaviors
Custom Agent Skills
Power users can create custom agent skills with code:
- Enable Community Hub (disabled by default for security)
- Browse available skills
- Or create your own using the SDK
// Example custom skill structure
module.exports = {
name: "My Custom Skill",
description: "What this skill does",
execute: async (context, params) => {
// Your implementation
return "Result";
}
};
Workspaces
Workspaces organize your documents and conversations into isolated environments.
Creating Workspaces
- Click + New Workspace in the sidebar
- Name your workspace
- Add description (optional)
- Start uploading documents
Workspace Features
| Feature | Description |
|---|---|
| Isolated Documents | Each workspace has its own embedded content |
| Separate Chat History | Conversations are workspace-specific |
| Custom Settings | Different LLM per workspace (Docker/Cloud) |
| Prompt Templates | Customizable system prompts |
| Temperature Control | Adjust creativity per workspace |
Workspace Organization Strategies
- By Project: One workspace per client or project
- By Topic: Separate workspaces for different knowledge domains
- By Team: Workspaces scoped to specific user groups
- By Stage: Development, testing, production workspaces
Multi-User Permissions (Docker/Cloud)
| Role | Capabilities |
|---|---|
| Admin | Full access, user management, all settings |
| Manager | Workspace creation, document management |
| Default User | Chat access to assigned workspaces |
API and Embedding Options
Developer API
AnythingLLM provides a comprehensive REST API:
Endpoints include:
- Workspace management (CRUD)
- Document upload and embedding
- Chat completions
- User management (Docker/Cloud)
- System configuration
Example API call:
curl -X POST "http://localhost:3001/api/v1/workspace/my-workspace/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"message": "What are the key findings?"}'
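The same call from Python using only the standard library; the workspace slug and API key are placeholders, and the endpoint path matches the curl example:

```python
import json
from urllib.request import Request, urlopen

def workspace_chat_request(base_url, api_key, slug, message):
    # POST /api/v1/workspace/<slug>/chat -- same endpoint as the curl example
    return Request(
        f"{base_url}/api/v1/workspace/{slug}/chat",
        data=json.dumps({"message": message}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Example (requires a running instance and a valid API key):
# req = workspace_chat_request("http://localhost:3001", "YOUR_API_KEY",
#                              "my-workspace", "What are the key findings?")
# with urlopen(req) as resp:
#     print(json.load(resp))
```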
Chat Widget Embedding
Embed AnythingLLM as a chat widget on any website:
<script
data-embed-id="your-embed-id"
data-base-api-url="https://your-instance.com/api/embed"
src="https://your-instance.com/embed/anythingllm-chat-widget.min.js">
</script>
Customization Options:
| Attribute | Description |
|---|---|
| data-position | Widget position (bottom-right, bottom-left, top-right, top-left) |
| data-assistant-name | Custom assistant name |
| data-assistant-icon | Custom assistant icon URL |
| data-window-height | Chat window height (px, %, rem) |
| data-window-width | Chat window width (px, %, rem) |
| data-text-size | Chat text size in pixels |
| data-username | Client identifier for logging |
| data-open-on-load | Auto-open widget on page load |
Comparison: AnythingLLM vs Alternatives
AnythingLLM vs PrivateGPT vs LibreChat
| Feature | AnythingLLM | PrivateGPT | LibreChat |
|---|---|---|---|
| Primary Focus | Document chat + RAG + Agents | Offline document Q&A | ChatGPT-style multi-provider UI |
| GitHub Stars | 53,000+ | High | 33,000+ |
| LLM Providers | 30+ | Limited local | Many |
| RAG | Built-in, production-ready | Built-in | Limited/Plugin |
| AI Agents | Extensive (web, SQL, charts) | Basic | Good |
| No-Code Builder | Yes | No | No |
| Multi-User | Yes (Docker) | Limited | Yes |
| Enterprise Auth | Basic | Limited | Strong (OAuth2, SSO) |
| Best For | Document-aware AI apps | Maximum privacy offline | Enterprise chat teams |
When to Choose AnythingLLM
- Need comprehensive document chat with RAG
- Want AI agents with real tools (web, SQL, files)
- Building custom AI applications with embedding
- Need both local and cloud LLM options
- Want no-code workflow building
- Deploying for personal use or small teams
When to Choose Alternatives
Choose PrivateGPT if:
- Strict offline-only requirement
- Maximum privacy is paramount
- Simple document Q&A without extras
Choose LibreChat if:
- Need enterprise authentication (OAuth2, LDAP, SSO)
- Want ChatGPT-like interface
- Large team with complex permissions
- Primary use is chat, not document RAG
Use Cases and Industry Applications
Industry Applications
| Industry | Use Cases |
|---|---|
| Legal | Contract analysis, case research, document review |
| Finance | Report analysis, compliance Q&A, market research |
| Healthcare | Medical literature, patient education, protocols |
| Education | Course material Q&A, research assistance, tutoring |
| Software | Code documentation, technical specs, API docs |
| HR | Policy Q&A, onboarding docs, employee handbook |
| Sales | Product knowledge, competitor analysis, proposals |
Common Workflows
Document Q&A:
- Upload company documents to workspace
- Ask questions in natural language
- Get answers with source citations
Research Assistant:
- Enable web browsing agent
- Upload existing research papers
- Ask agent to research and compare
Meeting Intelligence:
- Upload meeting transcripts (or record in Desktop)
- Ask for summaries, action items, decisions
- Search across all meetings
Customer Support:
- Upload product documentation
- Create support workspace
- Embed chat widget on website
- Customers get instant accurate answers
Pricing
Self-Hosted (Free)
| Feature | Included |
|---|---|
| Desktop App | Free forever |
| Docker Deployment | Free forever |
| All features | Full access |
| Updates | Community releases |
| Support | GitHub issues, Discord |
Cloud Hosted
| Tier | Price | Features |
|---|---|---|
| Starter | $50/month | Up to 3 users, 100 documents, private instance |
| Pro | $99/month | Larger teams, 72-hour support SLA |
| Enterprise | Custom | White-glove service, on-premise support, SLA |
Troubleshooting
Ollama Connection Issues
Problem: "Cannot connect to Ollama"
Solutions:
- Ensure Ollama is running: ollama serve
- Check the URL is correct for your deployment type
- For Docker, verify network connectivity
- Check if firewall is blocking port 11434
Documents Not Embedding
Problem: Documents stuck in processing
Solutions:
- Check embedding model is configured
- Verify vector database is running
- Check available disk space
- Review error logs in console
Agent Not Working
Problem: Agent commands not executing
Solutions:
- Verify agent tools are enabled in settings
- Check model supports tool calling
- Ensure API keys are configured for web tools
- Try a larger model (8B+ parameters)
Performance Issues
Problem: Slow response times
Solutions:
- Use smaller/quantized models
- Reduce chunk size for documents
- Limit context window in workspace settings
- Consider cloud LLM for faster inference
Key Takeaways
- AnythingLLM is the most comprehensive all-in-one local AI platform
- Built-in RAG makes document chat simple with zero configuration
- 30+ LLM providers give maximum flexibility—local or cloud
- AI agents extend capabilities with web, SQL, and file tools
- No-code builder enables workflow automation without programming
- Works 100% offline with Ollama or built-in engine
- Free and open-source for self-hosting—paid cloud available
Next Steps
- Set up Ollama for local models
- Learn RAG in depth
- Compare vector databases for your use case
- Explore AI agents capabilities
AnythingLLM is the most complete local AI platform available—combining RAG, agents, and flexibility in one polished package. Whether you're building private document chat for personal use or deploying AI-powered applications for your organization, AnythingLLM provides the tools you need without compromising on privacy or control.