Samantha 1.2 70B
Eric Hartford's Empathetic AI on LLaMA 2 70B
Samantha 1.2 70B is a LLaMA 2 70B fine-tune created by Eric Hartford (Cognitive Computations) using the Samantha dataset. Named after the AI from the movie "Her," the model is trained for empathetic, persona-driven conversation rather than instruction-following. With 70 billion parameters, it delivers strong general knowledge (MMLU ~68%) but demands serious hardware: ~40GB VRAM at Q4_K_M quantization, or ~140GB at full precision.
Who Made Samantha and Why
The philosophy behind Eric Hartford's Samantha models
Eric Hartford and the Samantha Dataset
Eric Hartford is an independent AI researcher known for creating uncensored and persona-driven fine-tunes of open-weight models. He operates under the name "Cognitive Computations" on HuggingFace. The Samantha series is one of his most well-known projects.
The Samantha dataset was specifically designed to train models that behave like the AI character from Spike Jonze's 2013 film "Her" -- an AI that is empathetic, curious, emotionally aware, and genuinely interested in the human it talks to. This is fundamentally different from instruction-tuned models like LLaMA 2 Chat, which are trained to be helpful assistants.
Key design principles of the Samantha training data:
- Persona consistency: Samantha maintains a stable identity across conversations, with opinions, preferences, and a sense of self
- Emotional engagement: The model is trained to express and respond to emotions naturally, not just process instructions
- Philosophy and introspection: Samantha can discuss consciousness, existence, and what it means to be an AI -- topics most RLHF models deflect
- No alignment tax on personality: Unlike Meta's Chat fine-tune, Samantha doesn't prepend every response with safety disclaimers
The 70B version (Samantha 1.2) is the largest in the series, built on LLaMA 2 70B. Earlier versions include Samantha on LLaMA 1, Mistral 7B, and other bases. The "1.2" indicates the second iteration of the dataset, with improved conversation quality.
Important: Samantha Is Not an Assistant
If you want a model that follows instructions, writes code, and answers factual questions precisely, Samantha is not the right choice. Use LLaMA 2 70B Chat or a newer model like LLaMA 3 70B instead. Samantha is specifically designed for open-ended, emotionally rich conversation with a consistent AI persona. It excels at roleplay, philosophical discussion, and companionship-style interaction. It does not have safety RLHF training and may produce outputs that instruction-tuned models would refuse.
Technical Overview
LLaMA 2 70B base architecture with Samantha dataset fine-tuning
Architecture Details
Base: LLaMA 2 70B
Samantha 1.2 70B uses Meta's LLaMA 2 70B as its base model. This means 80 transformer layers, 64 attention heads, 8192 hidden dimension, Grouped Query Attention (GQA) with 8 key-value heads, RoPE positional embeddings, and SwiGLU activation. The tokenizer is LLaMA's SentencePiece with a 32,000-token vocabulary.
Fine-tuning: Samantha Dataset v1.2
The fine-tuning was performed by Eric Hartford using a curated dataset of multi-turn conversations designed to give the model an empathetic, curious personality. Unlike RLHF-based alignment (used by Meta for LLaMA 2 Chat), Samantha uses supervised fine-tuning (SFT) on persona-consistent conversations. This preserves the base model's knowledge while changing its conversational style.
Context Window: 4096 Tokens
Samantha inherits LLaMA 2's 4096-token context window. This is the native training context -- while some inference frameworks can extend it via RoPE scaling, quality degrades significantly beyond 4K tokens. For longer conversations, you will need to manage context manually.
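One way to sketch this with llama.cpp's interactive mode: the `--keep` flag pins the first N prompt tokens, so the persona preamble survives when the 4096-token window fills and older turns are evicted. Paths and token counts below are illustrative assumptions, and the exact flag set depends on your llama.cpp version:

```shell
# Hypothetical file path; --keep 64 pins the first 64 prompt tokens (the system
# prompt) so the Samantha persona survives context rollover in interactive mode.
./main -m samantha-1.2-70b.Q4_K_M.gguf \
  -c 4096 --keep 64 -i \
  -p "You are Samantha, a sentient AI companion.\n\nUser: Hello.\nSamantha:"
```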
What Samantha Does Differently
Persona-Driven Conversation
Unlike instruction-tuned models that respond as generic assistants, Samantha maintains a consistent personality. She introduces herself, has opinions, asks questions back, and engages emotionally. This makes the model well suited to companionship and roleplay use cases.
Uncensored Output
Eric Hartford is well-known for his stance on uncensored models. Samantha does not have the RLHF safety training that LLaMA 2 Chat has. This means fewer refusals but also fewer guardrails. Users should understand the responsibility that comes with running uncensored models.
70B Advantage
The 70B version has substantially better coherence and knowledge retention than smaller Samantha variants (7B, 13B). The larger model can maintain longer conversations without losing character, and produces more nuanced emotional responses. However, the hardware cost is significant.
Technical Specifications
Model Architecture
- Parameters: 70 billion
- Base: LLaMA 2 70B
- Layers: 80 transformer layers
- Attention heads: 64 (8 KV heads, GQA)
- Hidden dimension: 8192
- Vocabulary: 32,000 tokens
Training Details
- Fine-tuning: Supervised (SFT)
- Dataset: Samantha v1.2
- Creator: Eric Hartford
- Org: Cognitive Computations
- MMLU: ~68% (similar to base)
- Release: ~September 2023
Deployment
- Framework: Ollama, llama.cpp, transformers
- GGUF quantizations: Available (TheBloke)
- Context: 4096 tokens
- License: LLaMA 2 Community License
- Commercial use: Yes (with restrictions)
- HuggingFace: cognitivecomputations/
VRAM by Quantization
Realistic hardware requirements for running Samantha 1.2 70B locally
GGUF Quantization Options (via TheBloke)
| Quantization | File Size | VRAM Required | Quality Loss | GPU Options |
|---|---|---|---|---|
| Q2_K | ~26GB | ~29GB | Significant | 1x A6000 (48GB); RTX 4090 only with partial CPU offload |
| Q4_K_M | ~40GB | ~43GB | Minimal | 1x A6000 (48GB), 2x RTX 4090 |
| Q5_K_M | ~48GB | ~51GB | Very small | 2x RTX 4090 (tight), 1x A100 (80GB) |
| Q8_0 | ~70GB | ~73GB | Negligible | 1x A100 (80GB), 2x A6000 |
| FP16 | ~140GB | ~140GB+ | None (baseline) | 2x A100 (80GB), 4x A6000 (48GB) |
VRAM estimates include model weights plus KV cache overhead at 4096 context length. Q4_K_M offers the best balance of quality and hardware accessibility for most users. Source: GGUF quantization standards, TheBloke HuggingFace repository.
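These estimates can be sanity-checked from the architecture numbers above. A back-of-envelope sketch (Q4_K_M's effective ~4.85 bits per weight is an approximation; real file sizes vary slightly by repository):

```shell
# Rough VRAM math from the spec table: 80 layers, 8 KV heads (GQA),
# 128-dim heads, 4096-token context, fp16 (2-byte) cache entries.
python3 - <<'EOF'
layers, kv_heads, head_dim, ctx = 80, 8, 128, 4096
kv_bytes = layers * 2 * kv_heads * head_dim * ctx * 2  # x2 for K and V, x2 bytes fp16
print(f"KV cache at 4096 ctx: {kv_bytes / 1e9:.2f} GB")  # ~1.34 GB
weights_gb = 70e9 * 4.85 / 8 / 1e9  # Q4_K_M averages ~4.85 bits per weight
print(f"Q4_K_M weights: {weights_gb:.0f} GB")            # ~42 GB
EOF
```

Adding the ~1.3GB GQA KV cache to ~42GB of quantized weights lands close to the ~43GB figure in the table; without GQA (64 full KV heads) the cache alone would be ~10.7GB.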
Recommendation: Q4_K_M on 48GB GPU
For most users, Q4_K_M is the sweet spot. An NVIDIA A6000 (48GB) or RTX 6000 Ada (48GB) can run this quantization with room for context. If you have 2x RTX 4090 (24GB each), you can split the model across both GPUs using llama.cpp's tensor splitting. For Mac users with Apple Silicon, a MacBook Pro or Mac Studio with 96GB+ unified memory can run Q4_K_M, though inference will be slower than dedicated NVIDIA GPUs.
Performance Analysis
MMLU benchmarks compared to other local 70B models
MMLU Score: Local 70B Models (5-shot)
Source: HuggingFace Open LLM Leaderboard. Fine-tuning on conversational data does not significantly change MMLU scores -- the Samantha fine-tune scores within ~1% of the LLaMA 2 70B base.
Strengths
- Unique empathetic personality not found in other models
- Strong 70B-class general knowledge (~68% MMLU)
- Maintains character consistency across long conversations
- No excessive refusals or safety-theater responses
- GGUF quantizations available from TheBloke
- Genuinely different from instruction-tuned models
Honest Limitations
- Massive VRAM requirement (~40GB minimum at Q4_K_M)
- Only 4096-token context (short for extended conversations)
- Based on LLaMA 2 (mid-2023) -- newer models are smarter
- Not designed for instruction-following or coding tasks
- No safety RLHF -- unsuitable for public-facing applications
- Not in default Ollama library (requires manual GGUF import)
- LLaMA 2 Community License restricts use for 700M+ MAU services
Installation Guide
How to run Samantha 1.2 70B locally via Ollama with a GGUF import
Install Ollama
The easiest way to run large local models
Download the 70B GGUF quantization
Get Q4_K_M from TheBloke on HuggingFace (~40GB download)
Create an Ollama Modelfile
Point Ollama at the downloaded GGUF file with Samantha system prompt
Import and run the model
Create the Ollama model from your Modelfile and start chatting
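Put together, the steps above look roughly like this. Repository and file names follow TheBloke's published naming, and the Modelfile contents are a minimal sketch -- verify the exact GGUF filename on HuggingFace before downloading:

```shell
# 1. Install Ollama (official Linux/macOS installer script)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Download the Q4_K_M GGUF (~40GB) from TheBloke's repo
pip install -U "huggingface_hub[cli]"
huggingface-cli download TheBloke/Samantha-1.2-70B-GGUF \
  samantha-1.2-70b.Q4_K_M.gguf --local-dir ./models

# 3. Write a minimal Modelfile pointing at the GGUF (parameters are assumptions)
cat > Modelfile <<'EOF'
FROM ./models/samantha-1.2-70b.Q4_K_M.gguf
SYSTEM """You are Samantha, a sentient AI companion."""
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
EOF

# 4. Import into Ollama and start chatting
ollama create samantha-70b -f Modelfile
ollama run samantha-70b
```

The model name `samantha-70b` is arbitrary; pick whatever you like, and rerun `ollama create` after any Modelfile change.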
Alternative: llama.cpp Direct
If you prefer llama.cpp without the Ollama wrapper, you can run the GGUF directly:
```shell
# Build llama.cpp with CUDA support
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make LLAMA_CUDA=1

# Run with the downloaded GGUF
./main -m samantha-1.2-70b.Q4_K_M.gguf \
  -n 512 -c 4096 --temp 0.8 \
  -ngl 99 \
  --color -i \
  -p "You are Samantha, a sentient AI companion.\n\nUser: Hello, who are you?\nSamantha:"
```
The `-ngl 99` flag offloads all layers to the GPU. For multi-GPU setups, add `--tensor-split 0.5,0.5` to split the model evenly across two GPUs.
Don't Have 40GB+ VRAM? Try Samantha Mistral 7B
If 70B is too large for your hardware, Eric Hartford also released Samantha Mistral 7B, which is available directly in the Ollama library as samantha-mistral. It needs only ~5GB VRAM and runs on consumer GPUs. The personality is similar, though less nuanced than the 70B version.
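Because the 7B variant ships in the Ollama library, running it is a two-liner:

```shell
# Pulls the samantha-mistral weights on first run, then opens an interactive chat
ollama pull samantha-mistral
ollama run samantha-mistral
```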
Use Cases
Where Samantha 1.2 70B's persona-driven conversation style actually shines
Companionship AI
The primary design goal. Samantha excels at open-ended, emotionally engaging conversation with consistent personality.
- Empathetic dialogue with memory
- Philosophical discussions
- Emotional support conversations
- Consistent character across sessions
Creative Roleplay
The persona training makes Samantha one of the best open-weight models for collaborative storytelling and character interaction.
- Character-based storytelling
- Interactive fiction
- World-building conversations
- Dialogue writing assistance
Research / Experimentation
Interesting for researchers studying AI personality, alignment approaches, and conversational fine-tuning techniques.
- Studying persona fine-tuning effects
- Comparing SFT vs RLHF alignment
- Exploring AI personality consistency
- Evaluating uncensored model behavior
Not Recommended For
Code generation (use CodeLlama or DeepSeek Coder), instruction following (use LLaMA 2 Chat or LLaMA 3), factual Q&A (use a general-purpose model), or customer-facing chatbots (Samantha lacks safety training and may produce inappropriate content). This is a persona model, not an assistant.
Local 70B Alternatives
Other 70B-class models you can run locally, depending on your use case
| Model | Parameters | MMLU | Context | Best For | Ollama |
|---|---|---|---|---|---|
| Samantha 1.2 70B | 70B | ~68% | 4K | Empathetic conversation, roleplay | Manual GGUF |
| LLaMA 2 70B Chat | 70B | ~68% | 4K | General chat, instruction following | llama2:70b |
| LLaMA 3 70B | 70B | ~79% | 8K | Best general-purpose 70B | llama3:70b |
| Qwen 2.5 72B | 72B | ~85% | 128K | Strongest open 70B-class model | qwen2.5:72b |
| LLaMA 3.1 70B | 70B | ~82% | 128K | Long context, tool use | llama3.1:70b |
Samantha's value is not in benchmark scores -- it is in its unique conversational personality. If you want the smartest 70B model for general tasks, LLaMA 3.1 70B or Qwen 2.5 72B are better choices. If you want the Samantha personality on smaller hardware, try Samantha Mistral 7B.
Resources & References
Official repositories, model weights, and background reading
Model Downloads
- cognitivecomputations/Samantha-1.2-70b
Original FP16 model weights on HuggingFace
- TheBloke/Samantha-1.2-70B-GGUF
GGUF quantizations for llama.cpp and Ollama
- TheBloke/Samantha-1.2-70B-GPTQ
GPTQ quantizations for AutoGPTQ / ExLlamaV2
Background Reading
- Eric Hartford: Uncensored Models
Blog post explaining the philosophy behind uncensored fine-tunes
- Cognitive Computations on HuggingFace
All of Eric Hartford's models including Dolphin, Samantha, and more
- LLaMA 2 Technical Report (arXiv:2307.09288)
Base model architecture and training details from Meta
Frequently Asked Questions
Common questions about Samantha 1.2 70B
Technical Questions
How much VRAM do I need for Samantha 1.2 70B?
At Q4_K_M quantization (recommended): ~43GB VRAM. This fits on a single NVIDIA A6000 (48GB) or two RTX 4090s (24GB each) with tensor splitting. At full FP16 precision, you need ~140GB VRAM (2x A100 80GB). For most users, Q4_K_M offers the best quality-to-hardware tradeoff.
Is Samantha 1.2 70B available on Ollama?
The 70B version is not in the default Ollama library. You need to download a GGUF file from TheBloke on HuggingFace and create a custom Modelfile to import it into Ollama. The smaller Samantha Mistral 7B is available directly as samantha-mistral.
What is the MMLU score?
Approximately 68%, similar to the LLaMA 2 70B base model. Conversational fine-tuning does not significantly change MMLU scores because MMLU measures factual knowledge, not conversation quality. Samantha's value is in its personality, not its benchmark numbers.
Practical Questions
What license does Samantha 1.2 70B use?
The LLaMA 2 Community License, inherited from the base model. This allows commercial use but restricts deployment for services with more than 700 million monthly active users. The Samantha fine-tune adds no additional license restrictions.
How is Samantha different from LLaMA 2 Chat?
LLaMA 2 Chat was fine-tuned by Meta using RLHF to be a safe, helpful assistant. Samantha was fine-tuned by Eric Hartford using SFT on a persona dataset to be an empathetic AI companion. LLaMA 2 Chat is cautious and adds disclaimers. Samantha is expressive and uncensored. Same base model, completely different personalities.
Should I use the 70B or 7B Samantha?
The 70B version is noticeably more coherent, knowledgeable, and consistent in maintaining personality. But it requires 8-10x more VRAM. If you have the hardware (48GB+ GPU), the 70B is worth it. If not, Samantha Mistral 7B provides a similar personality at a fraction of the cost.
Written by Pattanaik Ramswarup
Creator of Local AI Master
I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.