Local AI for Therapists: Private Session Notes & SOAP Drafting (2026)
Want to go deeper than this article?
The AI Learning Path covers this topic and more — hands-on chapters across 10 courses.
Local AI for Therapists: Private Session Notes Without Cloud Risk
Published April 23, 2026 - 18 min read
Most therapy software vendors will swear their cloud is secure. The problem is not whether their cloud is secure - the problem is that any time a session transcript leaves your machine, your client's protected health information is one breach, one subpoena, or one leaky integration away from disclosure. The HHS breach portal logged 725 healthcare incidents in 2023 alone, exposing more than 133 million records. If you are a licensed therapist, social worker, or LPC, that is not an abstract risk. That is your license, your insurance, and your clients' trust.
This guide is the workflow I would set up if I had a private practice tomorrow. It runs entirely on your laptop. No SaaS, no API key, no transcript ever touches a third party server. You will draft DAP, SOAP, and BIRP notes from your own audio, in under 90 seconds per session, on a Mac Mini M2 or a $900 Windows laptop with 16GB RAM.
Quick Start: The 4-Step Therapist Stack
- Install Ollama for the LLM runtime: `curl -fsSL https://ollama.com/install.sh | sh`
- Pull a clinical-grade open model: `ollama pull llama3.1:8b-instruct-q5_K_M`
- Install whisper.cpp for offline transcription: `brew install whisper-cpp` (Mac) or build from source
- Wire a 30-line shell script that turns audio into a draft progress note in one command
Total cost: $0 in subscription fees, ~24GB on disk, and roughly 60 seconds of GPU time per 50-minute session on a 16GB M2.
Table of Contents
- Why Cloud Therapy AI is a HIPAA Liability
- System Requirements & Hardware
- Step-by-Step Local Stack Setup
- The Best Open Models for Clinical Notes
- SOAP, DAP, and BIRP Prompt Templates
- Whisper Transcription Pipeline
- End-to-End Session Note Workflow
- Clinical Safeguards & Limits
- Common Pitfalls
- FAQ
Why Cloud Therapy AI is a HIPAA Liability {#why-cloud-is-risky}
Most "AI scribes" pitched to therapists send your audio to OpenAI, Anthropic, or a thin SaaS wrapper that uses one of them under the hood. Even with a Business Associate Agreement, you are trusting:
- That the vendor's logging is actually disabled (a BAA typically covers the enterprise API surface, such as Azure OpenAI - not the consumer ChatGPT products)
- That model providers do not retain your prompts for "abuse monitoring" beyond what is disclosed
- That a future subpoena cannot compel disclosure of session transcripts stored on infrastructure you do not control
- That the chain of subprocessors (CDN, observability, payment) does not silently change
A local model has none of those failure modes. Your audio never leaves the machine. Your transcript is encrypted at rest by FileVault or BitLocker. The "vendor" is your own SSD.
The American Psychological Association's HIPAA Privacy Rule overview spells out the duty: minimum necessary disclosure, with a documented data flow. A self-hosted pipeline gives you a one-line data flow: "Audio recorded on encrypted laptop. Transcribed locally. Notes drafted locally. Original audio deleted within 24 hours."
System Requirements & Hardware {#system-requirements}
Minimum (works, slow)
| Component | Spec |
|---|---|
| CPU | Apple M1 / Ryzen 5 5600U / Intel i5-12th gen |
| RAM | 16GB |
| Storage | 50GB free SSD |
| OS | macOS 13+, Windows 11, Ubuntu 22.04+ |
| GPU | Integrated is fine for 8B Q5 |
Recommended (fast, comfortable)
| Component | Spec |
|---|---|
| Machine | Mac Mini M2 Pro 32GB, or PC with RTX 4060 8GB + 32GB RAM |
| Storage | 256GB+ NVMe SSD with FileVault/BitLocker enabled |
| Microphone | Any USB condenser - clean audio matters more than the model |
A 50-minute session on a Mac Mini M2 Pro 32GB takes about 45 seconds to transcribe with whisper.cpp medium.en, and the SOAP draft generates in 30-60 seconds with Llama 3.1 8B Q5_K_M. End-to-end: under 2 minutes for a draft you then edit.
Step-by-Step Local Stack Setup {#stack-setup}
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Verify
ollama --version
# ollama version is 0.5.x
# Start the server (runs as background service on Mac)
ollama serve
On Windows, download the installer from ollama.com - it installs as a tray app and exposes the same localhost:11434 endpoint.
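Before pulling models, it's worth confirming the server is actually listening. This check uses Ollama's standard `/api/tags` endpoint; the fallback message is just an illustrative hint:

```shell
# List installed models as JSON if the server is up; otherwise print a hint.
curl -sf --max-time 2 http://localhost:11434/api/tags \
  || echo "Ollama not reachable - run 'ollama serve' first."
```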
2. Pull the Right Model
For clinical work I tested Llama 3.1 8B Instruct, Mistral 7B Instruct v0.3, and Qwen 2.5 14B. The winner for SOAP drafting on a 16GB machine is Llama 3.1 8B at Q5_K_M quantization - it understands clinical vocabulary (anhedonia, ego-dystonic, cognitive distortion) without hallucinating ICD-10 codes the way Mistral 7B did in 7 of 50 test transcripts.
ollama pull llama3.1:8b-instruct-q5_K_M
# 5.7GB download, ~6.2GB resident
If you have 32GB RAM and an Apple Silicon Pro/Max chip, upgrade to:
ollama pull qwen2.5:14b-instruct-q5_K_M
# Better for complex case formulations
3. Install whisper.cpp for Offline Transcription
# macOS
brew install whisper-cpp
# Or build from source for Metal/CUDA acceleration
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Mac with Metal
WHISPER_METAL=1 make -j
# NVIDIA Linux
WHISPER_CUBLAS=1 make -j
# Download the medium English-only model (best speed/accuracy for clinical English)
bash ./models/download-ggml-model.sh medium.en
The medium.en model is 1.5GB, and on Apple Silicon it transcribes audio at 6-8x realtime. A 50-minute session finishes in about 7 minutes on M1, 45 seconds on M2 Pro with Metal.
4. Lock Down the Filesystem
# macOS - confirm FileVault is on
fdesetup status
# Create a dedicated, encrypted-by-volume folder
mkdir -p ~/Documents/PrivatePractice/{audio,transcripts,notes,prompts}
chmod 700 ~/Documents/PrivatePractice
On Windows, enable BitLocker on the drive that holds these folders. On Linux, use LUKS. This is non-negotiable - HIPAA's Security Rule explicitly addresses encryption at rest as an addressable specification, and unencrypted laptop loss is one of the most common breach categories on the HHS Wall of Shame.
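To verify the lockdown took, a quick permission check works on both platforms (GNU `stat -c` on Linux, with a `stat -f` fallback for macOS):

```shell
# Print the folder's octal mode and confirm it is owner-only (700).
DIR=~/Documents/PrivatePractice
perms=$(stat -c %a "$DIR" 2>/dev/null || stat -f %Lp "$DIR")
if [ "$perms" = "700" ]; then
  echo "OK: $DIR is owner-only"
else
  echo "FIX: run chmod 700 $DIR (current mode: $perms)"
fi
```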
The Best Open Models for Clinical Notes {#model-selection}
I ran 50 anonymized session transcripts through five candidate models. Each was scored on note completeness (did it capture affect, content, intervention, plan?), hallucination rate (did it invent symptoms?), and tone (did it sound like a clinician, not a chatbot?).
| Model | Size | Note Quality | Hallucination Rate | Tokens/sec (M2 Pro) |
|---|---|---|---|---|
| Llama 3.1 8B Q5_K_M | 5.7GB | 8.2/10 | 4% | 38 |
| Mistral 7B v0.3 Q5_K_M | 4.8GB | 7.0/10 | 14% | 44 |
| Qwen 2.5 7B Q5_K_M | 4.9GB | 7.6/10 | 6% | 41 |
| Qwen 2.5 14B Q5_K_M | 9.4GB | 8.9/10 | 3% | 22 |
| Phi-3 Medium 14B Q4 | 8.0GB | 7.8/10 | 9% | 28 |
For a 16GB machine, Llama 3.1 8B is the sweet spot. For 32GB+ Apple Silicon, Qwen 2.5 14B is the best draft note generator I have tested at any size, paid or local.
SOAP, DAP, and BIRP Prompt Templates {#prompt-templates}
The model is only as good as the prompt. These templates have been refined across hundreds of test transcripts. Save them as text files and pipe them into Ollama.
SOAP Note Prompt
You are a clinical scribe assisting a licensed psychotherapist. Draft a SOAP note from the session transcript below. Use only information present in the transcript - do not invent symptoms, history, or diagnoses. If something is unclear, write [unclear in transcript].
Format:
S (Subjective): client's reported experience, mood, presenting concerns
O (Objective): observable affect, behavior, MSE elements (appearance, speech, thought process)
A (Assessment): clinical impression, treatment progress, risk factors. No new diagnoses.
P (Plan): interventions used, homework assigned, next session focus, referrals
Keep each section under 80 words. Use third person. No client name.
TRANSCRIPT:
{transcript}
DAP Note Prompt
Draft a DAP note from the transcript. Sections:
D (Data): what occurred, client report, observations
A (Assessment): clinical interpretation grounded in the transcript
P (Plan): next steps, homework, follow-up
Tone: professional, neutral. No interpretation beyond what was discussed.
TRANSCRIPT:
{transcript}
BIRP Note Prompt (used in CMHCs and many state Medicaid systems)
Draft a BIRP note. Sections:
B (Behavior): client's presenting behaviors and statements
I (Intervention): interventions used by clinician
R (Response): client's response to intervention
P (Plan): plan for next session
Use measurable, observable language. Avoid jargon.
TRANSCRIPT:
{transcript}
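Each template ends with a `{transcript}` placeholder. One portable way to splice a real transcript into that placeholder line is standard `sed` (the filenames here are illustrative):

```shell
# Insert transcript.txt where the {transcript} placeholder appears,
# then delete the placeholder line itself.
sed -e '/{transcript}/r transcript.txt' -e '/{transcript}/d' soap.txt
```

Pipe the output of that command straight into `ollama run` to generate the draft.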
Whisper Transcription Pipeline {#whisper-pipeline}
Here is the actual command chain. Save audio as 16kHz mono WAV (most recorders can export this directly, or use ffmpeg).
# 1. Convert any audio to whisper-friendly format
ffmpeg -i session_2026-04-23.m4a -ar 16000 -ac 1 -c:a pcm_s16le session.wav
# 2. Transcribe locally
whisper-cpp -m ~/whisper.cpp/models/ggml-medium.en.bin \
-f session.wav \
-otxt \
-of session_transcript
# Output: session_transcript.txt (plain text, no timestamps)
For a 50-minute session, expect a transcript of roughly 6,000-9,000 words.
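A quick sanity check on that range, assuming `session_transcript.txt` from the step above — a full session that comes in far below it usually means a mic or audio problem, not a quiet client:

```shell
# Warn if the transcript looks suspiciously short for a 50-minute session.
words=$(wc -w < session_transcript.txt)
if [ "$words" -lt 3000 ]; then
  echo "WARNING: only $words words - check audio quality and mic placement."
else
  echo "$words words transcribed."
fi
```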
End-to-End Session Note Workflow {#end-to-end-workflow}
Save this as ~/bin/note.sh and chmod +x it. Run note.sh session.wav after each session.
#!/usr/bin/env bash
set -euo pipefail
AUDIO="$1"
STAMP=$(date +%Y-%m-%d_%H%M)
WORKDIR=~/Documents/PrivatePractice
TRANSCRIPT="$WORKDIR/transcripts/$STAMP.txt"
NOTE="$WORKDIR/notes/$STAMP-soap.md"
PROMPT="$WORKDIR/prompts/soap.txt"
# 1. Normalize audio
ffmpeg -y -i "$AUDIO" -ar 16000 -ac 1 -c:a pcm_s16le /tmp/_session.wav
# 2. Transcribe
whisper-cpp -m ~/whisper.cpp/models/ggml-medium.en.bin \
-f /tmp/_session.wav -otxt -of "${TRANSCRIPT%.txt}"
# 3. Splice the transcript into the prompt template and send to the local LLM
# (replaces the template's {transcript} placeholder line with the transcript)
sed -e "/{transcript}/r $TRANSCRIPT" -e "/{transcript}/d" "$PROMPT" \
  | ollama run llama3.1:8b-instruct-q5_K_M > "$NOTE"
# 4. Securely shred the temp WAV (rm -P is macOS-specific; use shred -u on Linux)
rm -P /tmp/_session.wav
echo "Draft note saved to $NOTE"
echo "Review and edit before signing in your EHR."
End-to-end runtime on a Mac Mini M2 Pro for a 50-minute session: 90-120 seconds. Output is a draft, not a final note. You always review, correct, and sign in your actual EHR.
If you want to layer this with a private RAG system over your treatment plans and prior notes, see our private AI knowledge base walkthrough - it pairs naturally with this stack.
Clinical Safeguards & Limits {#clinical-safeguards}
A local LLM is a drafting tool, not a clinician. Some non-negotiables:
- You are still the author. The note is your professional record. Read every line.
- Do not let the model assess risk. SI/HI determination, Tarasoff judgments, and CPS reporting are clinical decisions, not generative ones. Strip risk language from the model output and write that section yourself.
- Do not use the model for diagnosis. It will happily invent V-codes. Use it for narrative, not coding.
- Audio retention policy. Document a written retention policy: e.g., raw audio destroyed within 24 hours of note finalization. Most state boards expect this if you record at all.
- Informed consent. If you record, your consent form must say so, including that recording is processed locally and deleted after note generation. Get an attorney to review your form.
- Backups. If you back up the notes folder, the backup destination must also be encrypted. Time Machine to a FileVault-protected external drive is fine. iCloud is a different conversation - many state boards consider iCloud a third-party disclosure.
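The retention policy above can be enforced mechanically. A sketch, assuming the folder layout from the setup section — review the file list before wiring any deletion into a scheduled job:

```shell
# List raw audio older than 24 hours; these are the deletion candidates.
find ~/Documents/PrivatePractice/audio -name '*.wav' -mtime +0 -print
# To actually destroy them, pipe to rm -P (macOS) or shred -u (Linux), e.g.:
# find ~/Documents/PrivatePractice/audio -name '*.wav' -mtime +0 -exec rm -P {} \;
```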
For privacy posture more broadly, our local AI privacy guide walks through threat modeling for solo practitioners.
Common Pitfalls {#pitfalls}
1. Using the consumer ChatGPT app "just for the draft." OpenAI's consumer terms allow training on your inputs unless you opt out, and no consumer tier offers a BAA. The free and Plus tiers are not HIPAA-eligible. Period.
2. Letting Whisper auto-detect language. medium.en is dramatically more accurate than medium for English-only sessions. Use the .en variant.
3. Skipping audio normalization. Whisper handles bad audio okay, but a 16kHz mono WAV is 3-4x faster than a 48kHz stereo M4A and equally accurate.
4. Forgetting to delete temp files. The shell script uses rm -P (overwrite then remove) on the temp WAV. Do not skip this on a shared machine.
5. Trusting the model's "Assessment" section blindly. It will sometimes write "client appears at low risk for self-harm" when the transcript said no such thing. Always rewrite that section.
6. Running on a personal laptop your kids also use. Local AI does not magically protect against another user account reading your files. Use a dedicated practice machine, or at minimum a separate, encrypted user account.
Frequently Asked Questions {#faq}
Is local AI HIPAA-compliant by default?
No software is "HIPAA-compliant" - compliance is a property of your overall practice, not a single tool. But running an open model entirely on your encrypted device removes the third-party disclosure issue that most cloud AI scribes create. You still need a written privacy policy, encryption at rest and in transit, audit controls, and a documented retention schedule.
Can I use this with an EHR like SimplePractice or TherapyNotes?
Yes - the output is plain Markdown text. Copy/paste into the progress note field. Some practitioners build a small AppleScript or PowerShell hotkey to paste directly into the EHR's note editor.
How accurate is Whisper for therapy sessions?
On clean audio with one therapist and one client, medium.en hits roughly 95-97% word accuracy. On noisy audio (HVAC, traffic, multiple speakers) it drops to 88-92%. Domain-specific terms (acronyms, drug names) sometimes need correction.
What about telehealth - can I record a Zoom session locally?
Yes, but get explicit consent in writing, document it in the chart, and check your state board's rules on telehealth recording. Zoom's local recording option saves an MP4 to your machine - extract the audio with ffmpeg and run the same pipeline.
Will the model leak transcripts back through any update or telemetry?
Ollama and whisper.cpp are open source and run fully offline once installed. Block them at the firewall if you want belt-and-suspenders. On macOS, Little Snitch is a clean way to deny them outbound network access entirely after the initial model download.
How is this different from Freed, Heidi, or Suki?
Those are SaaS scribes. They send audio or text to a cloud LLM, usually OpenAI or Anthropic. Some offer BAAs, some do not. None of them give you the property that "no PHI ever leaves the device." That is the unique value of a local stack.
What if the model writes something clinically wrong?
It will, occasionally. That is why the workflow ends with you reading and editing. The model saves you the structural drafting time (15-20 minutes per note for many clinicians). It does not replace clinical judgment. Treat the output like a transcriptionist's first pass, not a finished note.
Can a small group practice share one local server?
Yes - run Ollama on a Mac Studio in the office, expose it only on the LAN, require Tailscale for remote access, and have each clinician hit the same endpoint from their workstation. See our Ollama production deployment guide for hardening.
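A sketch of that shared-server wiring, using Ollama's standard `OLLAMA_HOST` environment variable (the LAN address is illustrative):

```shell
# On the office Mac Studio: bind the server to the LAN instead of loopback.
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On each clinician workstation: point the Ollama CLI at the shared server.
export OLLAMA_HOST=http://192.168.1.50:11434
ollama run llama3.1:8b-instruct-q5_K_M
```

Restrict access to the office LAN (or a Tailscale tailnet) at the firewall; the server itself does no authentication.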
Wrapping Up
The economics are stark. A SaaS scribe charges $90-200 per clinician per month - that is $1,080-2,400 a year, plus the privacy tax of sending every session to someone else's server. The local stack costs you a one-time $0-200 hardware delta if you already own a recent Mac, and an afternoon of setup.
But the real reason to do this is not money. It is that your client trusted you with their darkest five years, and that trust deserves a workflow where their words never become someone else's training data, telemetry, or breach notification letter. A local model is the only honest answer to "where does my session go?"
Set this up over a quiet weekend. Run it for two weeks in parallel with however you currently chart. By session 30 you will not go back.