Local AI + Obsidian: Build a Second Brain That Actually Thinks
Published on April 11, 2026 — 20 min read
I have 8,400 notes in my Obsidian vault spanning 6 years of engineering work, research, meeting notes, and project documentation. Finding connections across that volume is impossible manually. Adding local AI changed how I use the entire system — I can now ask "What did I learn about database sharding across all my projects?" and get a synthesized answer drawn from 40+ scattered notes, without any of that knowledge leaving my machine.
This guide covers three Obsidian plugins that connect to Ollama for local AI capabilities: Smart Connections for semantic search, Copilot for conversational Q&A, and Text Generator for AI writing within notes. Each plugin serves a different purpose, and I will explain exactly when to use which.
Why Local AI for Your Notes {#why-local}
Your Obsidian vault is a record of your thinking. It contains research notes, journal entries, project retrospectives, startup ideas, salary negotiations, health tracking — the most personal data you produce.
Cloud AI plugins for Obsidian send your note content to OpenAI or Anthropic servers for processing. Read the fine print on their data retention policies. Even with opt-out, your notes traverse their infrastructure.
With Ollama running locally, every embedding, every query, every AI interaction stays on your machine. Your thoughts remain yours.
The performance argument is strong too. Local embeddings with nomic-embed-text process 10,000 notes in about 2 minutes. Semantic search returns results in under 100ms. That is faster than cloud roundtrips.
Prerequisites: Ollama Setup {#prerequisites}
Before configuring any Obsidian plugin, get Ollama running with the right models:
```shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models for different tasks
ollama pull llama3.2          # Chat and Q&A (4.7 GB)
ollama pull nomic-embed-text  # Embeddings for semantic search (274 MB)
ollama pull qwen2.5:7b        # Writing assistance (4.4 GB)

# Verify Ollama is running
curl http://localhost:11434/api/tags
```
Make sure Ollama is running before you open Obsidian. On macOS, it starts automatically as a background service. On Linux, run ollama serve or enable the systemd service.
For a full Ollama installation walkthrough including model management, see our complete Ollama guide. For embedding-specific details, the local AI embeddings guide covers model selection and optimization.
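If you prefer a scripted sanity check before opening Obsidian, a short standard-library script can confirm that Ollama is reachable and that all three models are pulled. This helper is mine, not part of any plugin; it only uses Ollama's documented `/api/tags` endpoint:

```python
# preflight.py — verify Ollama is up and the needed models are pulled.
import json
import urllib.request

REQUIRED = {"llama3.2", "nomic-embed-text", "qwen2.5:7b"}

def missing_models(tags: dict, required: set = REQUIRED) -> set:
    """Given a /api/tags response, return the required models not yet pulled."""
    # Installed tags look like "llama3.2:latest"; compare on the base name.
    installed = {m["name"].split(":")[0] for m in tags.get("models", [])}
    return {m for m in required if m.split(":")[0] not in installed}

def check_ollama(url: str = "http://localhost:11434") -> set:
    """Fetch the installed-model list from a running Ollama instance."""
    with urllib.request.urlopen(f"{url}/api/tags", timeout=2) as resp:
        return missing_models(json.load(resp))

if __name__ == "__main__":
    try:
        missing = check_ollama()
        print("Missing models:", ", ".join(sorted(missing)) or "none")
    except OSError:
        print("Ollama is not reachable — start it with `ollama serve`.")
```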
Plugin 1: Smart Connections — Semantic Search {#smart-connections}
Smart Connections is the most useful AI plugin for Obsidian. It embeds every note in your vault and enables semantic search — finding notes by meaning rather than exact keywords.
Installation and Setup
1. Open Obsidian Settings → Community Plugins → Browse
2. Search "Smart Connections" → Install → Enable
3. Open Smart Connections settings:
   - Embedding Model: Select "Local" and enter the Ollama API URL
   - Chat Model: Select "Ollama" as the platform
Configure the Ollama connection:
| Setting | Value |
|---|---|
| Embedding Platform | Ollama (Local) |
| Embedding Model | nomic-embed-text |
| Embedding API URL | http://localhost:11434 |
| Chat Platform | Ollama |
| Chat Model | llama3.2 |
Then click "Force Refresh: Embed All" to start the initial embedding.
Initial Embedding: What to Expect
Smart Connections will process every note in your vault. Here is what to expect at different vault sizes:
| Vault Size | Embedding Time | Disk Space |
|---|---|---|
| 1,000 notes | ~15 seconds | 12 MB |
| 5,000 notes | ~70 seconds | 58 MB |
| 10,000 notes | ~2.5 minutes | 115 MB |
| 20,000 notes | ~5 minutes | 230 MB |
Measured with nomic-embed-text on an Apple M2 Pro; machines with a discrete GPU are faster.
Embeddings are cached on disk. After the initial pass, only new or modified notes are re-embedded — usually under 1 second.
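The caching idea is simple to sketch: compare each note's modification time against a timestamp recorded at the last run, and only re-embed what changed. The helpers below mirror that behavior; they are illustrative and not Smart Connections' actual cache format:

```python
# Incremental re-embedding sketch: only notes modified since the last run are stale.
import json
from pathlib import Path

def notes_needing_reembed(vault: Path, cache_file: Path) -> list:
    """Return notes whose mtime is newer than the timestamp cached at the last run."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    stale = []
    for md in vault.rglob("*.md"):
        key = str(md.relative_to(vault))
        if cache.get(key, 0) < md.stat().st_mtime:
            stale.append(md)
    return stale

def update_cache(vault: Path, cache_file: Path) -> None:
    """Record the current mtime of every note after an embedding pass."""
    cache = {str(md.relative_to(vault)): md.stat().st_mtime
             for md in vault.rglob("*.md")}
    cache_file.write_text(json.dumps(cache))
```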
Using Smart Connections
Once embedded, the plugin adds two powerful features:
1. Smart View Panel (Sidebar)
Open it from the command palette: "Smart Connections: View Smart Connections." For any note you are reading, the sidebar shows the most semantically related notes across your entire vault.
This is where the magic is. I was reading my notes on "distributed consensus" and the sidebar surfaced a 3-year-old note about a Raft implementation I had forgotten about, a meeting note where a colleague explained their approach to leader election, and a book highlight from "Designing Data-Intensive Applications." None of these shared keywords with my current note. Semantic search found them by meaning.
2. Smart Chat
Click the chat icon to talk with your vault. Example queries:
- "What have I written about database indexing strategies?"
- "Summarize my notes from the Q1 architecture review"
- "Find connections between my notes on team management and those on system reliability"
- "What decisions did we make about the auth system redesign?"
Smart Chat uses RAG (retrieval-augmented generation) internally: it finds the most relevant notes via embedding similarity, then passes them to the LLM for synthesis.
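Stripped to its essentials, that retrieve-then-generate loop looks roughly like the sketch below. The function names are mine, not the plugin's; the `ollama` package calls are the documented ones, and the LLM step is deferred so the retrieval part works standalone:

```python
# Minimal RAG loop: rank notes by cosine similarity, then hand the top hits to the LLM.
import numpy as np

def top_k(query_vec: list, note_vecs: dict, k: int = 5) -> list:
    """Rank note paths by cosine similarity to the query embedding."""
    q = np.array(query_vec, dtype=float)
    q /= np.linalg.norm(q)
    scored = []
    for path, vec in note_vecs.items():
        v = np.array(vec, dtype=float)
        scored.append((float(q @ (v / np.linalg.norm(v))), path))
    return [p for _, p in sorted(scored, reverse=True)[:k]]

def ask_vault(question: str, notes: dict, note_vecs: dict) -> str:
    """Embed the question, retrieve the top notes, and ask the LLM to synthesize."""
    import ollama  # deferred so top_k() works without a running Ollama server
    q_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    context = "\n\n---\n\n".join(notes[p] for p in top_k(q_vec, note_vecs))
    reply = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user",
                   "content": f"Answer using only these notes:\n\n{context}\n\nQuestion: {question}"}],
    )
    return reply["message"]["content"]
```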
Optimizing Smart Connections
Exclude folders that add noise. Templates, daily notes with no content, and attachment folders dilute search quality:
In Smart Connections settings, add excluded folders:
- Templates/
- Attachments/
- .obsidian/
Embedding model matters. nomic-embed-text (768 dimensions) provides the best balance of quality and speed for Obsidian vaults. mxbai-embed-large (1024 dimensions) is slightly more accurate but 3x slower to embed. For most vaults, nomic-embed-text is sufficient.
Plugin 2: Obsidian Copilot — Chat With Your Vault {#copilot}
Copilot provides a dedicated chat interface within Obsidian, similar to ChatGPT but powered by your local models and grounded in your notes.
Installation and Setup
1. Settings → Community Plugins → Browse → "Copilot" → Install → Enable
2. Open Copilot settings:
   - Default Model Provider: Ollama
   - Ollama URL: http://localhost:11434
   - Default Model: llama3.2
   - Embedding Model: nomic-embed-text
How Copilot Differs from Smart Connections Chat
Smart Connections Chat is optimized for vault-wide semantic search. Copilot is optimized for conversational interaction — it maintains context across a multi-turn conversation.
| Feature | Smart Connections | Copilot |
|---|---|---|
| Primary use | Find related notes | Interactive Q&A |
| Context window | Per-query retrieval | Multi-turn conversation |
| Note creation | No | Yes (create notes from chat) |
| Prompt templates | Basic | Extensive library |
| Best for | "What notes relate to X?" | "Help me think through X" |
Real Workflows with Copilot
Weekly Review Generation
Open Copilot and type: "Review my daily notes from this week and create a weekly summary. Highlight what I accomplished, what's still pending, and what themes emerged."
Copilot pulls your daily notes (assuming you use a consistent daily note naming pattern like YYYY-MM-DD), synthesizes them, and generates a structured weekly review. You can then save this as a new note directly from the chat.
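The note-collection step is easy to replicate outside the plugin if you want to script your own reviews. This hypothetical helper assumes daily notes named `YYYY-MM-DD.md` in the vault root:

```python
# Collect the past week's daily notes (YYYY-MM-DD.md) so they can be fed to a model.
from datetime import date, timedelta
from pathlib import Path

def weekly_daily_notes(vault: Path, today: date) -> list:
    """Return the daily-note paths from the last 7 days that actually exist."""
    days = [today - timedelta(days=i) for i in range(7)]
    return [vault / f"{d.isoformat()}.md" for d in days
            if (vault / f"{d.isoformat()}.md").exists()]
```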
Research Synthesis
"Summarize everything I have written about React Server Components across all my notes and blog drafts. Identify gaps in my understanding."
This works because Copilot retrieves relevant notes via embeddings, includes them as context, and asks the LLM to synthesize. On a vault with 50+ notes mentioning RSC, it produces a coherent summary in about 8 seconds.
Outline Generation from Research
"Based on my research notes tagged #llm-inference, create a blog post outline about optimizing LLM inference for production."
Copilot finds tagged notes, extracts key points, and structures them into an outline. This single command replaced 30 minutes of manual note-reviewing.
Plugin 3: Text Generator — AI Writing Inside Notes {#text-generator}
Text Generator operates directly within the note editor. Select text, trigger the plugin, and the AI expands, summarizes, rewrites, or continues your writing. It is the closest thing to having a writing partner inside Obsidian.
Installation and Setup
1. Settings → Community Plugins → Browse → "Text Generator" → Install → Enable
2. Open Text Generator settings:
   - Provider: Ollama
   - Ollama Base URL: http://localhost:11434
   - Model: qwen2.5:7b (or llama3.2 for general use)
   - Max Tokens: 1024
Commands
Text Generator adds several commands accessible via hotkeys or the command palette:
| Command | What It Does |
|---|---|
| Generate Text | Continue writing from cursor position |
| Generate (selection) | Process selected text with a prompt |
| Summarize | Condense selected text |
| Expand | Elaborate on selected text |
| Rewrite | Rephrase selected text |
| Extract | Pull out key points from selected text |
My Most-Used Workflows
Turn bullet points into prose. I take meeting notes as rapid bullet points, then select them and run "Expand" to generate full paragraphs for documentation.
Summarize long articles. When I clip web articles into Obsidian, I select the full text and run "Summarize" to get a 3-paragraph overview at the top of the note.
Generate follow-up questions. After writing project notes, I select the content and use a custom prompt: "What follow-up questions should I be asking about this?" This surfaces blind spots I would have missed.
Custom Prompt Templates
Text Generator supports custom templates. Here are the ones I use daily:
Create a templates folder and add these as .md files:

templates/meeting-action-items.md:

```
Extract action items from these meeting notes. Format as:
- [ ] Task description — Owner — Deadline (if mentioned)
Only include concrete action items, not discussion points.

{{selection}}
```

templates/note-connections.md:

```
Given this note, suggest 5 concepts or topics I should create links to.
For each suggestion, explain why the connection matters in one sentence.

{{selection}}
```

templates/eli5.md:

```
Explain the following concept as if I were explaining it to a colleague
who has no background in this area. Use analogies where helpful.

{{selection}}
```
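Before the prompt reaches the model, Text Generator substitutes the selected text into the `{{selection}}` placeholder. A rough re-creation of that substitution step (the `render_template` helper is hypothetical, not the plugin's API):

```python
# Fill a plugin-style {{selection}} placeholder with the currently selected text.
def render_template(template: str, selection: str) -> str:
    """Return the final prompt with the selection substituted in."""
    return template.replace("{{selection}}", selection)

prompt = render_template(
    "Extract action items from these meeting notes:\n\n{{selection}}",
    "- Alice to ship the API migration by Friday",
)
```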
Embedding Your Vault: Technical Details {#embeddings}
Understanding how embeddings work helps you optimize search quality. Every note gets converted to a numerical vector — a list of 768 numbers (for nomic-embed-text) that captures the meaning of the text.
Model Comparison for Obsidian Vaults
| Model | Dimensions | Speed (10K notes) | Search Quality | Size |
|---|---|---|---|---|
| nomic-embed-text | 768 | 2 min | Very good | 274 MB |
| mxbai-embed-large | 1024 | 6 min | Excellent | 670 MB |
| all-minilm | 384 | 1 min | Good | 46 MB |
| snowflake-arctic-embed-m | 768 | 3 min | Very good | 436 MB |
nomic-embed-text is the default recommendation. It runs fast, produces high-quality vectors, and works well with Smart Connections out of the box. mxbai-embed-large is marginally better for nuanced academic or technical content, but 3x slower to embed.
For a thorough comparison of embedding models and strategies, see our local AI embeddings guide.
How Embedding Size Affects Search
Larger embedding dimensions capture more nuance but require more disk space and slightly slower similarity computation:
| Vault Size | 384-dim | 768-dim | 1024-dim |
|---|---|---|---|
| 1,000 notes | 1.5 MB | 3 MB | 4 MB |
| 10,000 notes | 15 MB | 30 MB | 40 MB |
| 50,000 notes | 75 MB | 150 MB | 200 MB |
Even at 50,000 notes, the disk overhead is negligible. Search speed remains under 100ms regardless of dimension count because modern similarity search algorithms (used by Smart Connections) are heavily optimized.
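To see why brute-force search suffices at vault scale, note that exact cosine search over pre-normalized vectors is a single matrix-vector product. A sketch with random vectors standing in for real note embeddings (sizes chosen to match the table above):

```python
# Brute-force cosine search: 50,000 x 768 similarity scores in one BLAS call.
import numpy as np

rng = np.random.default_rng(0)
notes = rng.standard_normal((50_000, 768)).astype(np.float32)
notes /= np.linalg.norm(notes, axis=1, keepdims=True)  # normalize once, up front

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most similar notes by cosine similarity."""
    q = query / np.linalg.norm(query)
    scores = notes @ q                  # one dot product per note
    return np.argsort(scores)[::-1][:k]

hits = search(notes[1234])  # a note is always most similar to itself
```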
Real Workflows: What This Looks Like in Practice {#workflows}
"Find Everything About [Topic]"
I type into Smart Connections Chat: "What do I know about event-driven architecture?"
The system retrieves 12 relevant notes spanning 4 years: a conference talk summary, two project retrospective notes, a book highlight from "Building Event-Driven Microservices," three meeting notes from a migration project, and five scattered implementation notes.
The LLM synthesizes them into a 4-paragraph answer that connects ideas I had never linked manually. It even identifies a contradiction between my 2022 and 2024 notes about saga patterns — I changed my mind and forgot.
"Find Connections Between [A] and [B]"
"What connections exist between my notes on team burnout and my notes on deployment frequency?"
Smart Connections surfaces 6 notes. The AI response identifies a pattern I had not noticed: three of my retrospective notes mention both increased deployment pressure and team morale issues within the same 2-month windows. That is a genuine insight I would not have found without semantic search.
"Generate a Weekly Review from Daily Notes"
My daily notes follow a format: what I did, what I learned, blockers, mood. Copilot's weekly review prompt:
"Synthesize my daily notes from 2026-04-07 to 2026-04-11. Group by themes. Highlight patterns in blockers. Note any recurring topics."
Output: a structured weekly review with accomplishment categories, a pattern analysis showing I was blocked by the same integration test 3 out of 5 days, and a note that "infrastructure" appeared as a topic every day — signaling it deserves a dedicated focus block.
"Draft an Outline from Research Notes"
"Based on all my notes tagged #rag-pipeline, create a tutorial outline for building a RAG system."
The output is a 12-section outline with specific subtopics drawn from my actual notes. Sections I had not considered appear because the AI found relevant points in notes I had tagged differently or forgotten about.
Advanced: Link Prediction {#link-prediction}
Smart Connections can suggest links you should create but have not. When viewing a note, the sidebar shows related notes. Any note in that list without an existing link is a link suggestion.
To make this systematic:
```python
"""
link_predictor.py
Suggest missing connections in your Obsidian vault.
Requires: pip install ollama numpy
"""
import re
from pathlib import Path

import numpy as np
import ollama

VAULT_PATH = Path.home() / "Documents" / "ObsidianVault"

def get_all_notes() -> dict:
    """Read all markdown files from the vault."""
    notes = {}
    for md_file in VAULT_PATH.rglob("*.md"):
        relative = str(md_file.relative_to(VAULT_PATH))
        notes[relative] = md_file.read_text(encoding="utf-8", errors="ignore")
    return notes

def get_embedding(text: str) -> list:
    """Get embedding vector from Ollama."""
    response = ollama.embeddings(model="nomic-embed-text", prompt=text[:8000])
    return response["embedding"]

def cosine_similarity(a: list, b: list) -> float:
    """Compute cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_missing_links(top_n: int = 10):
    """Find note pairs with high similarity but no existing link."""
    notes = get_all_notes()
    print(f"Processing {len(notes)} notes...")

    # Compute embeddings
    embeddings = {}
    for i, (path, content) in enumerate(notes.items()):
        if i % 100 == 0:
            print(f"  Embedding {i}/{len(notes)}...")
        embeddings[path] = get_embedding(content[:2000])

    # Extract existing [[wiki-links]]. Wiki-links use note names, not vault
    # paths, so record link pairs by filename.
    existing_links = set()
    for path, content in notes.items():
        for link in re.findall(r"\[\[([^\]]+)\]\]", content):
            target = link.split("|")[0].split("#")[0] + ".md"  # Strip aliases and heading refs
            existing_links.add((Path(path).name, target))

    # Find high-similarity pairs without links
    suggestions = []
    paths = list(embeddings.keys())
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            name_i, name_j = Path(paths[i]).name, Path(paths[j]).name
            if (name_i, name_j) in existing_links or (name_j, name_i) in existing_links:
                continue
            sim = cosine_similarity(embeddings[paths[i]], embeddings[paths[j]])
            if sim > 0.75:  # High similarity threshold
                suggestions.append((paths[i], paths[j], sim))

    suggestions.sort(key=lambda x: x[2], reverse=True)
    print(f"\nTop {top_n} suggested connections:")
    for a, b, sim in suggestions[:top_n]:
        print(f"  {sim:.3f} | {a} <-> {b}")

if __name__ == "__main__":
    find_missing_links()
```
```shell
python link_predictor.py
# Processing 8400 notes...
#   Embedding 0/8400...
#   Embedding 100/8400...
# ...
# Top 10 suggested connections:
#   0.912 | projects/auth-redesign.md <-> notes/oauth-security-patterns.md
#   0.887 | books/ddia-ch5.md <-> projects/replication-strategy.md
# ...
```
Automatic Tagging {#auto-tagging}
Use Ollama to automatically suggest tags for untagged notes:
```python
import ollama
from pathlib import Path

def suggest_tags(note_content: str, existing_tags: list) -> str:
    """Suggest tags for a note based on content and existing tag vocabulary."""
    response = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"""Given this note content, suggest 3-5 relevant tags.
Use ONLY tags from this existing list when possible: {', '.join(existing_tags[:100])}
If no existing tag fits, suggest a new one.
Return only the tags, comma-separated, with # prefix.

Note content:
{note_content[:3000]}

Tags:""",
        }],
        options={"temperature": 0.2},
    )
    return response["message"]["content"].strip()

# Example usage
vault = Path.home() / "Documents" / "ObsidianVault"
existing_tags = ["#python", "#architecture", "#meeting", "#book", "#project",
                 "#devops", "#database", "#frontend", "#security"]

# Find notes without tags in their frontmatter
for md_file in vault.rglob("*.md"):
    content = md_file.read_text(encoding="utf-8", errors="ignore")
    if not any(line.startswith("tags:") for line in content.split("\n")[:10]):
        tags = suggest_tags(content, existing_tags)
        print(f"{md_file.name}: {tags}")
```
Spaced Repetition from Notes {#spaced-repetition}
Turn your notes into flashcards automatically. This pairs well with the Obsidian Spaced Repetition plugin:
```python
import ollama
from pathlib import Path

def generate_flashcards(note_path: str) -> str:
    """Generate spaced repetition flashcards from a note."""
    content = Path(note_path).expanduser().read_text()
    response = ollama.chat(
        model="llama3.2",
        messages=[{
            "role": "user",
            "content": f"""Create 5 flashcards from this note for spaced repetition.
Use this exact format for each card (compatible with Obsidian SR plugin):

Question::Answer

Make questions specific and testable. Avoid yes/no questions.

Note content:
{content[:4000]}""",
        }],
        options={"temperature": 0.3},
    )
    return response["message"]["content"]

# Generate flashcards for a study note
cards = generate_flashcards("~/Documents/ObsidianVault/notes/distributed-consensus.md")
print(cards)
# Output:
# What are the three phases of the Paxos consensus algorithm?::Prepare, Promise, Accept
# What is the minimum quorum size for a 5-node Raft cluster?::3 nodes (majority)
# ...
```
Performance and Resource Usage {#performance}
Running Ollama alongside Obsidian uses these resources:
| Activity | RAM Usage | GPU/CPU Usage | Duration |
|---|---|---|---|
| Ollama idle | 200 MB | 0% | Continuous |
| Initial vault embedding (10K notes) | 1.2 GB | 80% GPU | ~2.5 min |
| Incremental embedding (1 note) | 500 MB | 30% GPU | <1 sec |
| Semantic search query | 500 MB | 20% GPU | <100 ms |
| Chat response (llama3.2) | 5 GB | 90% GPU | 3-8 sec |
| Text generation (qwen2.5:7b) | 5 GB | 90% GPU | 2-5 sec |
On an M1 MacBook Air with 8 GB, you can run nomic-embed-text for search plus llama3.2 for chat simultaneously. On 16 GB, add qwen2.5:7b for text generation without swapping.
Tip: If memory is tight, Ollama automatically loads and unloads models as needed. Only the model being actively queried occupies VRAM. Embedding queries use much less memory than chat queries.
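You can watch this load/unload behavior from the terminal. The commands below are standard Ollama CLI; the 30-minute keep-alive value is an example, not a recommendation:

```shell
# List models currently loaded in memory, with size and time-until-unload
ollama ps

# Unload a model immediately to free RAM/VRAM
ollama stop llama3.2

# Keep models resident for 30 minutes instead of the default few minutes
OLLAMA_KEEP_ALIVE=30m ollama serve
```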
For a broader look at running multiple models efficiently, see our free local AI models guide.
Privacy: What Stays Where {#privacy}
With this setup:
- Your notes: Never leave your machine. Stored in your vault folder as plain .md files.
- Embeddings: Stored locally in the Smart Connections cache within your vault's .obsidian folder.
- AI processing: All inference runs on your local Ollama instance. No API calls to external servers.
- Plugin telemetry: Smart Connections and Copilot have optional telemetry you can disable in settings. Text Generator has no telemetry.
The only network request is Ollama model downloads (one-time). After that, disconnect from the internet and everything still works. This is the correct architecture for a personal knowledge base.
Troubleshooting {#troubleshooting}
"Connection refused" When Plugin Contacts Ollama
```shell
# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not running, start it
ollama serve

# If port conflict, check what is using 11434
lsof -i :11434
```
Slow Embedding on Large Vaults
If the initial embedding takes too long, switch to a smaller model temporarily:
```shell
# all-minilm embeds 3x faster (lower quality but good for initial setup)
ollama pull all-minilm
# Change Smart Connections embedding model to all-minilm
# Re-embed with nomic-embed-text later
```
Chat Responses Are Too Slow
Use a smaller chat model or reduce context:
```shell
# Faster alternative to llama3.2
ollama pull phi3:3.8b  # 2.3 GB, responds in 1-2 seconds
```
Search Returns Irrelevant Results
Usually caused by:
- Too many template/empty notes — Exclude template folders in Smart Connections settings
- Very short notes — Notes under 50 words embed poorly. Add more context.
- Wrong embedding model — Make sure nomic-embed-text is selected, not a general LLM
Frequently Asked Questions
Q: Can I use this setup without a GPU?
A: Yes. nomic-embed-text runs efficiently on CPU — embedding 10K notes takes about 5 minutes on an 8-core processor. Chat responses with llama3.2 take 10-15 seconds on CPU versus 3-5 seconds on GPU. Apple Silicon Macs fall somewhere between CPU-only and discrete-GPU performance.
Q: Does embedding my vault send data to the internet?
A: No. When using Ollama as the embedding provider, all computation happens locally. The only internet usage is the one-time model download. After that, you can work completely offline.
Q: How much disk space do embeddings need?
A: About 3 MB per 1,000 notes with nomic-embed-text (768 dimensions). A 10,000-note vault needs roughly 30 MB. This is stored in your .obsidian/plugins/smart-connections/ folder.
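The arithmetic behind that figure is one float32 (4 bytes) per embedding dimension:

```python
# Embedding storage estimate: one 4-byte float per dimension, per note.
dims = 768                                 # nomic-embed-text output size
bytes_per_note = dims * 4                  # float32 storage
mb_per_1000 = bytes_per_note * 1000 / 1e6  # megabytes per 1,000 notes
print(round(mb_per_1000, 1))               # ~3.1 MB per 1,000 notes
```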
Q: Which plugin should I install first?
A: Smart Connections. It provides the most value with the least configuration — semantic search across your vault is transformative. Add Copilot second if you want conversational interaction. Text Generator last if you need inline writing assistance.
Q: Can I use different models for different plugins?
A: Yes, and you should. Use nomic-embed-text for embeddings (all plugins), llama3.2 for chat (Smart Connections, Copilot), and qwen2.5:7b for text generation (Text Generator). Each model is optimized for its specific task.
Q: Will this slow down Obsidian?
A: Minimally. The plugins load lazily — they only contact Ollama when you actively use an AI feature. Background re-embedding of modified notes uses negligible resources. The main impact is during initial embedding of a large vault, which is a one-time operation.