Local AI for Therapists: Private Session Notes & SOAP Drafting (2026)
Want to go deeper than this article?
The AI Learning Path covers this topic and more — hands-on chapters across 10 courses.
Local AI for Therapists: Private Session Notes Without Cloud Risk
Published April 23, 2026 - 18 min read
Most therapy software vendors will swear their cloud is secure. The problem is not whether their cloud is secure - the problem is that any time a session transcript leaves your machine, your client's protected health information is one breach, one subpoena, or one leaky integration away from disclosure. The HHS breach portal logged 725 healthcare incidents in 2023 alone, exposing more than 133 million records. If you are a licensed therapist, social worker, or LPC, that is not an abstract risk. That is your license, your insurance, and your clients' trust.
This guide is the workflow I would set up if I had a private practice tomorrow. It runs entirely on your laptop. No SaaS, no API key, no transcript ever touches a third party server. You will draft DAP, SOAP, and BIRP notes from your own audio, in under 90 seconds per session, on a Mac Mini M2 or a $900 Windows laptop with 16GB RAM.
Quick Start: The 4-Step Therapist Stack
- Install Ollama for the LLM runtime: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull a clinical-grade open model: `ollama pull llama3.1:8b-instruct-q5_K_M`
- Install whisper.cpp for offline transcription: `brew install whisper-cpp` (Mac) or build from source
- Wire a 30-line shell script that turns audio into a draft progress note in one command
Total cost: $0 in subscription fees, ~24GB on disk, and roughly 60 seconds of GPU time per 50-minute session on a 16GB M2.
Table of Contents
- Why Cloud Therapy AI is a HIPAA Liability
- System Requirements & Hardware
- Step-by-Step Local Stack Setup
- The Best Open Models for Clinical Notes
- SOAP, DAP, and BIRP Prompt Templates
- Whisper Transcription Pipeline
- End-to-End Session Note Workflow
- Clinical Safeguards & Limits
- Common Pitfalls
- FAQ
Why Cloud Therapy AI is a HIPAA Liability {#why-cloud-is-risky}
Most "AI scribes" pitched to therapists send your audio to OpenAI, Anthropic, or a thin SaaS wrapper that uses one of them under the hood. Even with a Business Associate Agreement, you are trusting:
- That the vendor's logging is actually disabled (a BAA typically covers the enterprise API surface, such as Azure OpenAI - not the consumer ChatGPT products)
- That model providers do not retain your prompts for "abuse monitoring" beyond what is disclosed
- That a future subpoena cannot compel disclosure of session transcripts stored on infrastructure you do not control
- That the chain of subprocessors (CDN, observability, payment) does not silently change
A local model has none of those failure modes. Your audio never leaves the machine. Your transcript is encrypted at rest by FileVault or BitLocker. The "vendor" is your own SSD.
The American Psychological Association's HIPAA Privacy Rule overview spells out the duty: minimum necessary disclosure, with a documented data flow. A self-hosted pipeline gives you a one-line data flow: "Audio recorded on encrypted laptop. Transcribed locally. Notes drafted locally. Original audio deleted within 24 hours."
System Requirements & Hardware {#system-requirements}
Minimum (works, slow)
| Component | Spec |
|---|---|
| CPU | Apple M1 / Ryzen 5 5600U / Intel i5-12th gen |
| RAM | 16GB |
| Storage | 50GB free SSD |
| OS | macOS 13+, Windows 11, Ubuntu 22.04+ |
| GPU | Integrated is fine for 8B Q5 |
Recommended (fast, comfortable)
| Component | Spec |
|---|---|
| Machine | Mac Mini M2 Pro 32GB, or PC with RTX 4060 8GB + 32GB RAM |
| Storage | 256GB+ NVMe SSD with FileVault/BitLocker enabled |
| Microphone | Any USB condenser - clean audio matters more than the model |
A 50-minute session on a Mac Mini M2 Pro 32GB takes about 45 seconds to transcribe with whisper.cpp medium.en, and the SOAP draft generates in 30-60 seconds with Llama 3.1 8B Q5_K_M. End-to-end: under 2 minutes for a draft you then edit.
Step-by-Step Local Stack Setup {#stack-setup}
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Verify
ollama --version
# ollama version is 0.5.x
# Start the server (runs as background service on Mac)
ollama serve
On Windows, download the installer from ollama.com - it installs as a tray app and exposes the same localhost:11434 endpoint.
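Before pulling models, it's worth confirming the server is actually listening. This check uses Ollama's standard `/api/tags` endpoint; the fallback message is just an illustrative hint:

```shell
# List installed models as JSON if the server is up; otherwise print a hint.
curl -sf --max-time 2 http://localhost:11434/api/tags \
  || echo "Ollama not reachable - run 'ollama serve' first."
```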
2. Pull the Right Model
For clinical work I tested Llama 3.1 8B Instruct, Mistral 7B Instruct v0.3, and Qwen 2.5 14B. The winner for SOAP drafting on a 16GB machine is Llama 3.1 8B at Q5_K_M quantization - it understands clinical vocabulary (anhedonia, ego-dystonic, cognitive distortion) without hallucinating ICD-10 codes the way Mistral 7B did in 7 of 50 test transcripts.
ollama pull llama3.1:8b-instruct-q5_K_M
# 5.7GB download, ~6.2GB resident
If you have 32GB RAM and an Apple Silicon Pro/Max chip, upgrade to:
ollama pull qwen2.5:14b-instruct-q5_K_M
# Better for complex case formulations
3. Install whisper.cpp for Offline Transcription
# macOS
brew install whisper-cpp
# Or build from source for Metal/CUDA acceleration
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Mac with Metal
WHISPER_METAL=1 make -j
# NVIDIA Linux
WHISPER_CUBLAS=1 make -j
# Download the medium English-only model (best speed/accuracy for clinical English)
bash ./models/download-ggml-model.sh medium.en
The medium.en model is 1.5GB, and on Apple Silicon it transcribes audio at 6-8x realtime. A 50-minute session finishes in about 7 minutes on M1, 45 seconds on M2 Pro with Metal.
4. Lock Down the Filesystem
# macOS - confirm FileVault is on
fdesetup status
# Create a dedicated, encrypted-by-volume folder
mkdir -p ~/Documents/PrivatePractice/{audio,transcripts,notes,prompts}
chmod 700 ~/Documents/PrivatePractice
On Windows, enable BitLocker on the drive that holds these folders. On Linux, use LUKS. This is non-negotiable - HIPAA's Security Rule explicitly addresses encryption at rest as an addressable specification, and unencrypted laptop loss is one of the most common breach categories on the HHS Wall of Shame.
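To verify the lockdown took, a quick permission check works on both platforms (GNU `stat -c` on Linux, with a `stat -f` fallback for macOS):

```shell
# Print the folder's octal mode and confirm it is owner-only (700).
DIR=~/Documents/PrivatePractice
perms=$(stat -c %a "$DIR" 2>/dev/null || stat -f %Lp "$DIR")
if [ "$perms" = "700" ]; then
  echo "OK: $DIR is owner-only"
else
  echo "FIX: run chmod 700 $DIR (current mode: $perms)"
fi
```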
The Best Open Models for Clinical Notes {#model-selection}
I ran 50 anonymized session transcripts through five candidate models. Each was scored on note completeness (did it capture affect, content, intervention, plan?), hallucination rate (did it invent symptoms?), and tone (did it sound like a clinician, not a chatbot?).
| Model | Size | Note Quality | Hallucination Rate | Tokens/sec (M2 Pro) |
|---|---|---|---|---|
| Llama 3.1 8B Q5_K_M | 5.7GB | 8.2/10 | 4% | 38 |
| Mistral 7B v0.3 Q5_K_M | 4.8GB | 7.0/10 | 14% | 44 |
| Qwen 2.5 7B Q5_K_M | 4.9GB | 7.6/10 | 6% | 41 |
| Qwen 2.5 14B Q5_K_M | 9.4GB | 8.9/10 | 3% | 22 |
| Phi-3 Medium 14B Q4 | 8.0GB | 7.8/10 | 9% | 28 |
For a 16GB machine, Llama 3.1 8B is the sweet spot. For 32GB+ Apple Silicon, Qwen 2.5 14B is the best draft note generator I have tested at any size, paid or local.
SOAP, DAP, and BIRP Prompt Templates {#prompt-templates}
The model is only as good as the prompt. These templates have been refined across hundreds of test transcripts. Save them as text files and pipe them into Ollama.
SOAP Note Prompt
You are a clinical scribe assisting a licensed psychotherapist. Draft a SOAP note from the session transcript below. Use only information present in the transcript - do not invent symptoms, history, or diagnoses. If something is unclear, write [unclear in transcript].
Format:
S (Subjective): client's reported experience, mood, presenting concerns
O (Objective): observable affect, behavior, MSE elements (appearance, speech, thought process)
A (Assessment): clinical impression, treatment progress, risk factors. No new diagnoses.
P (Plan): interventions used, homework assigned, next session focus, referrals
Keep each section under 80 words. Use third person. No client name.
TRANSCRIPT:
{transcript}
DAP Note Prompt
Draft a DAP note from the transcript. Sections:
D (Data): what occurred, client report, observations
A (Assessment): clinical interpretation grounded in the transcript
P (Plan): next steps, homework, follow-up
Tone: professional, neutral. No interpretation beyond what was discussed.
TRANSCRIPT:
{transcript}
BIRP Note Prompt (used in CMHCs and many state Medicaid systems)
Draft a BIRP note. Sections:
B (Behavior): client's presenting behaviors and statements
I (Intervention): interventions used by clinician
R (Response): client's response to intervention
P (Plan): plan for next session
Use measurable, observable language. Avoid jargon.
TRANSCRIPT:
{transcript}
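Each template ends with a `{transcript}` placeholder. One portable way to splice a real transcript into that placeholder line is standard `sed` (the filenames here are illustrative):

```shell
# Insert transcript.txt where the {transcript} placeholder appears,
# then delete the placeholder line itself.
sed -e '/{transcript}/r transcript.txt' -e '/{transcript}/d' soap.txt
```

Pipe the output of that command straight into `ollama run` to generate the draft.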
Whisper Transcription Pipeline {#whisper-pipeline}
Here is the actual command chain. Save audio as 16kHz mono WAV (most recorders can export this directly, or use ffmpeg).
# 1. Convert any audio to whisper-friendly format
ffmpeg -i session_2026-04-23.m4a -ar 16000 -ac 1 -c:a pcm_s16le session.wav
# 2. Transcribe locally
whisper-cpp -m ~/whisper.cpp/models/ggml-medium.en.bin \
-f session.wav \
-otxt \
-of session_transcript
# Output: session_transcript.txt (plain text, no timestamps)
For a 50-minute session, expect a transcript of roughly 6,000-9,000 words.
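A quick sanity check on that range, assuming `session_transcript.txt` from the step above — a full session that comes in far below it usually means a mic or audio problem, not a quiet client:

```shell
# Warn if the transcript looks suspiciously short for a 50-minute session.
words=$(wc -w < session_transcript.txt)
if [ "$words" -lt 3000 ]; then
  echo "WARNING: only $words words - check audio quality and mic placement."
else
  echo "$words words transcribed."
fi
```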
End-to-End Session Note Workflow {#end-to-end-workflow}
Save this as ~/bin/note.sh and chmod +x it. Run note.sh session.wav after each session.
#!/usr/bin/env bash
set -euo pipefail
AUDIO="$1"
STAMP=$(date +%Y-%m-%d_%H%M)
WORKDIR=~/Documents/PrivatePractice
TRANSCRIPT="$WORKDIR/transcripts/$STAMP.txt"
NOTE="$WORKDIR/notes/$STAMP-soap.md"
PROMPT="$WORKDIR/prompts/soap.txt"
# 1. Normalize audio
ffmpeg -y -i "$AUDIO" -ar 16000 -ac 1 -c:a pcm_s16le /tmp/_session.wav
# 2. Transcribe
whisper-cpp -m ~/whisper.cpp/models/ggml-medium.en.bin \
-f /tmp/_session.wav -otxt -of "${TRANSCRIPT%.txt}"
# 3. Splice the transcript into the prompt template and send to the local LLM
# (replaces the template's {transcript} placeholder line with the transcript)
sed -e "/{transcript}/r $TRANSCRIPT" -e "/{transcript}/d" "$PROMPT" \
  | ollama run llama3.1:8b-instruct-q5_K_M > "$NOTE"
# 4. Securely shred the temp WAV (rm -P is macOS-specific; use shred -u on Linux)
rm -P /tmp/_session.wav
echo "Draft note saved to $NOTE"
echo "Review and edit before signing in your EHR."
End-to-end runtime on a Mac Mini M2 Pro for a 50-minute session: 90-120 seconds. Output is a draft, not a final note. You always review, correct, and sign in your actual EHR.
If you want to layer this with a private RAG system over your treatment plans and prior notes, see our private AI knowledge base walkthrough - it pairs naturally with this stack.
Clinical Safeguards & Limits {#clinical-safeguards}
A local LLM is a drafting tool, not a clinician. Some non-negotiables:
- You are still the author. The note is your professional record. Read every line.
- Do not let the model assess risk. SI/HI determination, Tarasoff judgments, and CPS reporting are clinical decisions, not generative ones. Strip risk language from the model output and write that section yourself.
- Do not use the model for diagnosis. It will happily invent V-codes. Use it for narrative, not coding.
- Audio retention policy. Document a written retention policy: e.g., raw audio destroyed within 24 hours of note finalization. Most state boards expect this if you record at all.
- Informed consent. If you record, your consent form must say so, including that recording is processed locally and deleted after note generation. Get an attorney to review your form.
- Backups. If you back up the notes folder, the backup destination must also be encrypted. Time Machine to a FileVault-protected external drive is fine. iCloud is a different conversation - many state boards consider iCloud a third-party disclosure.
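The retention policy above can be enforced mechanically. A sketch, assuming the folder layout from the setup section — review the file list before wiring any deletion into a scheduled job:

```shell
# List raw audio older than 24 hours; these are the deletion candidates.
find ~/Documents/PrivatePractice/audio -name '*.wav' -mtime +0 -print
# To actually destroy them, pipe to rm -P (macOS) or shred -u (Linux), e.g.:
# find ~/Documents/PrivatePractice/audio -name '*.wav' -mtime +0 -exec rm -P {} \;
```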
For privacy posture more broadly, our local AI privacy guide walks through threat modeling for solo practitioners.
Common Pitfalls {#pitfalls}
1. Using the consumer ChatGPT app "just for the draft." OpenAI's consumer terms allow training on your inputs unless you opt out, and no consumer tier offers a BAA. The free and Plus tiers are not HIPAA-eligible. Period.
2. Letting Whisper auto-detect language. medium.en is dramatically more accurate than medium for English-only sessions. Use the .en variant.
3. Skipping audio normalization. Whisper handles bad audio okay, but a 16kHz mono WAV is 3-4x faster than a 48kHz stereo M4A and equally accurate.
4. Forgetting to delete temp files. The shell script uses rm -P (overwrite then remove) on the temp WAV. Do not skip this on a shared machine.
5. Trusting the model's "Assessment" section blindly. It will sometimes write "client appears at low risk for self-harm" when the transcript said no such thing. Always rewrite that section.
6. Running on a personal laptop your kids also use. Local AI does not magically protect against another user account reading your files. Use a dedicated practice machine, or at minimum a separate, encrypted user account.
Frequently Asked Questions {#faq}
Is local AI HIPAA-compliant by default?
No software is "HIPAA-compliant" - compliance is a property of your overall practice, not a single tool. But running an open model entirely on your encrypted device removes the third-party disclosure issue that most cloud AI scribes create. You still need a written privacy policy, encryption at rest and in transit, audit controls, and a documented retention schedule.
Can I use this with an EHR like SimplePractice or TherapyNotes?
Yes - the output is plain Markdown text. Copy/paste into the progress note field. Some practitioners build a small AppleScript or PowerShell hotkey to paste directly into the EHR's note editor.
How accurate is Whisper for therapy sessions?
On clean audio with one therapist and one client, medium.en hits roughly 95-97% word accuracy. On noisy audio (HVAC, traffic, multiple speakers) it drops to 88-92%. Domain-specific terms (acronyms, drug names) sometimes need correction.
What about telehealth - can I record a Zoom session locally?
Yes, but get explicit consent in writing, document it in the chart, and check your state board's rules on telehealth recording. Zoom's local recording option saves an MP4 to your machine - extract the audio with ffmpeg and run the same pipeline.
Will the model leak transcripts back through any update or telemetry?
Ollama and whisper.cpp are open source and run fully offline once installed. Block them at the firewall if you want belt-and-suspenders. On macOS, Little Snitch is a clean way to deny them outbound network access entirely after the initial model download.
How is this different from Freed, Heidi, or Suki?
Those are SaaS scribes. They send audio or text to a cloud LLM, usually OpenAI or Anthropic. Some offer BAAs, some do not. None of them give you the property that "no PHI ever leaves the device." That is the unique value of a local stack.
What if the model writes something clinically wrong?
It will, occasionally. That is why the workflow ends with you reading and editing. The model saves you the structural drafting time (15-20 minutes per note for many clinicians). It does not replace clinical judgment. Treat the output like a transcriptionist's first pass, not a finished note.
Can a small group practice share one local server?
Yes - run Ollama on a Mac Studio in the office, expose it only on the LAN, require Tailscale for remote access, and have each clinician hit the same endpoint from their workstation. See our Ollama production deployment guide for hardening.
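A sketch of that shared-server wiring, using Ollama's standard `OLLAMA_HOST` environment variable (the LAN address is illustrative):

```shell
# On the office Mac Studio: bind the server to the LAN instead of loopback.
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On each clinician workstation: point the Ollama CLI at the shared server.
export OLLAMA_HOST=http://192.168.1.50:11434
ollama run llama3.1:8b-instruct-q5_K_M
```

Restrict access to the office LAN (or a Tailscale tailnet) at the firewall; the server itself does no authentication.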
Wrapping Up
The economics are stark. A SaaS scribe charges $90-200 per clinician per month - that is $1,080-2,400 a year, plus the privacy tax of sending every session to someone else's server. The local stack costs you a one-time $0-200 hardware delta if you already own a recent Mac, and an afternoon of setup.
But the real reason to do this is not money. It is that your client trusted you with their darkest five years, and that trust deserves a workflow where their words never become someone else's training data, telemetry, or breach notification letter. A local model is the only honest answer to "where does my session go?"
Set this up over a quiet weekend. Run it for two weeks in parallel with however you currently chart. By session 30 you will not go back.