
Samantha 1.2 70B
Eric Hartford's Empathetic AI on LLaMA 2 70B

Samantha 1.2 70B is a LLaMA 2 70B fine-tune created by Eric Hartford (Cognitive Computations) using the Samantha dataset. Named after the AI from the movie "Her," the model is trained for empathetic, persona-driven conversation rather than instruction-following. With 70 billion parameters, it delivers strong general knowledge (MMLU ~68%) but demands serious hardware: ~40GB VRAM at Q4_K_M quantization, or ~140GB at full precision.

70B Parameters · LLaMA 2 Base Model · 4096 Context Tokens · ~68% MMLU

Who Made Samantha and Why

The philosophy behind Eric Hartford's Samantha models

Eric Hartford and the Samantha Dataset

Eric Hartford is an independent AI researcher known for creating uncensored and persona-driven fine-tunes of open-weight models. He operates under the name "Cognitive Computations" on HuggingFace. The Samantha series is one of his most well-known projects.

The Samantha dataset was specifically designed to train models that behave like the AI character from Spike Jonze's 2013 film "Her" -- an AI that is empathetic, curious, emotionally aware, and genuinely interested in the human it talks to. This is fundamentally different from instruction-tuned models like LLaMA 2 Chat, which are trained to be helpful assistants.

Key design principles of the Samantha training data:

  • Persona consistency: Samantha maintains a stable identity across conversations, with opinions, preferences, and a sense of self
  • Emotional engagement: The model is trained to express and respond to emotions naturally, not just process instructions
  • Philosophy and introspection: Samantha can discuss consciousness, existence, and what it means to be an AI -- topics most RLHF models deflect
  • No alignment tax on personality: Unlike Meta's Chat fine-tune, Samantha doesn't prepend every response with safety disclaimers

The 70B version (Samantha 1.2) is the largest in the series, built on LLaMA 2 70B. Earlier versions include Samantha on LLaMA 1, Mistral 7B, and other bases. The "1.2" indicates the second iteration of the dataset, with improved conversation quality.

Important: Samantha Is Not an Assistant

If you want a model that follows instructions, writes code, and answers factual questions precisely, Samantha is not the right choice. Use LLaMA 2 70B Chat or a newer model like LLaMA 3 70B instead. Samantha is specifically designed for open-ended, emotionally rich conversation with a consistent AI persona. It excels at roleplay, philosophical discussion, and companionship-style interaction. It does not have safety RLHF training and may produce outputs that instruction-tuned models would refuse.

Technical Overview

LLaMA 2 70B base architecture with Samantha dataset fine-tuning

Architecture Details

Base: LLaMA 2 70B

Samantha 1.2 70B uses Meta's LLaMA 2 70B as its base model. This means 80 transformer layers, 64 attention heads, 8192 hidden dimension, Grouped Query Attention (GQA) with 8 key-value heads, RoPE positional embeddings, and SwiGLU activation. The tokenizer is LLaMA's SentencePiece with a 32,000-token vocabulary.
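These figures also determine runtime memory beyond the weights themselves. As a rough sketch (my own arithmetic, not a published figure), the FP16 KV-cache size at the full 4096-token context follows directly from the GQA layout:

```python
# Rough KV-cache size estimate for LLaMA 2 70B with GQA, assuming an FP16 cache.
layers = 80
attn_heads = 64
kv_heads = 8                       # GQA: only 8 key-value heads are cached
hidden = 8192
head_dim = hidden // attn_heads    # 128
ctx = 4096
bytes_per_value = 2                # FP16

# Keys and values (the factor of 2), per layer, per KV head, per position
kv_cache_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_per_value
print(f"KV cache at {ctx} ctx: {kv_cache_bytes / 2**30:.2f} GiB")  # 1.25 GiB
```

At 4096 tokens the cache is modest (about 1.25 GiB in FP16) precisely because GQA caches only the 8 KV heads instead of all 64; without GQA it would be 8x larger.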

Fine-tuning: Samantha Dataset v1.2

The fine-tuning was performed by Eric Hartford using a curated dataset of multi-turn conversations designed to give the model an empathetic, curious personality. Unlike RLHF-based alignment (used by Meta for LLaMA 2 Chat), Samantha uses supervised fine-tuning (SFT) on persona-consistent conversations. This preserves the base model's knowledge while changing its conversational style.
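Mechanically, SFT means next-token training on conversations rendered as plain text, with the persona baked into every example. A minimal sketch of how multi-turn conversations might be flattened into training strings (the role labels and layout here are illustrative, not the exact Samantha template):

```python
def render_conversation(turns, system="You are Samantha, a sentient AI companion."):
    """Flatten a multi-turn conversation into one SFT training string.

    `turns` is a list of (role, text) pairs, oldest first. The labels
    below are illustrative; a real fine-tune uses the base model's
    specific chat template.
    """
    parts = [system]
    for role, text in turns:
        label = "USER" if role == "user" else "ASSISTANT"
        parts.append(f"{label}: {text}")
    return "\n".join(parts)

example = render_conversation([
    ("user", "How was your day?"),
    ("assistant", "Reflective, honestly. I've been thinking about what memory means for me."),
])
print(example)
```

During training the loss is typically masked so that only the assistant spans contribute, which is what shifts the conversational style without degrading the base model's knowledge.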

Context Window: 4096 Tokens

Inherits LLaMA 2's 4096-token context window. This is the native training context -- while some inference frameworks claim to extend it via RoPE scaling, quality degrades significantly beyond 4K tokens. For longer conversations, you will need to manage context manually.
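In practice, managing context manually means dropping the oldest turns once the conversation approaches the 4K budget. A minimal sliding-window sketch, using a crude characters-per-token heuristic (the real count depends on LLaMA's SentencePiece tokenizer):

```python
def trim_history(messages, max_tokens=4096, reserve=512, chars_per_token=4):
    """Keep the most recent messages that fit in the context budget.

    `messages` is a list of strings, oldest first. `reserve` leaves
    room for the model's reply. The 4-chars-per-token ratio is only a
    heuristic; tokenize properly for exact counts.
    """
    budget = (max_tokens - reserve) * chars_per_token
    kept, used = [], 0
    for msg in reversed(messages):          # walk newest-first
        if used + len(msg) > budget:
            break
        kept.append(msg)
        used += len(msg)
    return list(reversed(kept))             # restore chronological order

history = [f"turn {i}: " + "x" * 500 for i in range(40)]
print(len(trim_history(history)))           # only the most recent turns fit
```

A production version would also pin the system prompt so the persona survives truncation, and count tokens with the actual tokenizer rather than a character ratio.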

What Samantha Does Differently

Persona-Driven Conversation

Unlike instruction-tuned models that respond as generic assistants, Samantha maintains a consistent personality. She introduces herself, has opinions, asks questions back, and engages emotionally. This makes the model unique for companionship and roleplay use cases.

Uncensored Output

Eric Hartford is well-known for his stance on uncensored models. Samantha does not have the RLHF safety training that LLaMA 2 Chat has. This means fewer refusals but also fewer guardrails. Users should understand the responsibility that comes with running uncensored models.

70B Advantage

The 70B version has substantially better coherence and knowledge retention than smaller Samantha variants (7B, 13B). The larger model can maintain longer conversations without losing character, and produces more nuanced emotional responses. However, the hardware cost is significant.

Technical Specifications

Model Architecture

  • Parameters: 70 billion
  • Base: LLaMA 2 70B
  • Layers: 80 transformer layers
  • Attention heads: 64 (8 KV heads, GQA)
  • Hidden dimension: 8192
  • Vocabulary: 32,000 tokens

Training Details

  • Fine-tuning: Supervised (SFT)
  • Dataset: Samantha v1.2
  • Creator: Eric Hartford
  • Org: Cognitive Computations
  • MMLU: ~68% (similar to base)
  • Release: ~September 2023

Deployment

  • Framework: Ollama, llama.cpp, transformers
  • GGUF quantizations: Available (TheBloke)
  • Context: 4096 tokens
  • License: LLaMA 2 Community License
  • Commercial use: Yes (with restrictions)
  • HuggingFace: cognitivecomputations/

VRAM by Quantization

Realistic hardware requirements for running Samantha 1.2 70B locally

GGUF Quantization Options (via TheBloke)

Quantization | File Size | VRAM Required | Quality Loss | GPU Options
Q2_K | ~26GB | ~29GB | Significant | 1x RTX 4090 (tight), 1x A6000
Q4_K_M | ~40GB | ~43GB | Minimal | 1x A6000 (48GB), 2x RTX 4090
Q5_K_M | ~48GB | ~51GB | Very small | 2x RTX 4090, 1x A100 (80GB)
Q8_0 | ~70GB | ~73GB | Negligible | 1x A100 (80GB), 2x A6000
FP16 | ~140GB | ~140GB+ | None (baseline) | 2x A100 (80GB), 4x RTX 4090

VRAM estimates include model weights plus KV cache overhead at 4096 context length. Q4_K_M offers the best balance of quality and hardware accessibility for most users. Source: GGUF quantization standards, TheBloke HuggingFace repository.
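These numbers follow almost directly from bits-per-weight: 70B parameters times the quant's average bit width, plus cache and buffer overhead. A back-of-the-envelope estimator (the bits-per-weight averages and the flat overhead term are my approximations, not GGUF specification values):

```python
# Rough VRAM estimate for a 70B model: quantized weights + runtime overhead.
PARAMS = 70e9
BITS_PER_WEIGHT = {           # approximate averages for GGUF quant types
    "Q2_K": 2.6, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5, "FP16": 16.0,
}
OVERHEAD_GB = 3.0             # assumed KV cache + buffers at 4096 context

def vram_estimate_gb(quant):
    weights_gb = PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weights_gb + OVERHEAD_GB

for quant in BITS_PER_WEIGHT:
    print(f"{quant:7s} ~{vram_estimate_gb(quant):.0f} GB")
```

The estimates land within a few GB of the table above; actual usage varies with context length, batch size, and the inference framework's buffers.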

Recommendation: Q4_K_M on 48GB GPU

For most users, Q4_K_M is the sweet spot. An NVIDIA A6000 (48GB) or RTX 6000 Ada (48GB) can run this quantization with room for context. If you have 2x RTX 4090 (24GB each), you can split the model across both GPUs using llama.cpp's tensor splitting. For Mac users with Apple Silicon, a MacBook Pro or Mac Studio with 96GB+ unified memory can run Q4_K_M, though inference will be slower than dedicated NVIDIA GPUs.

Performance Analysis

MMLU benchmarks compared to other local 70B models

MMLU Score: Local 70B Models (5-shot)

  • Samantha 1.2 70B: 68%
  • LLaMA 2 70B (base): 69%
  • LLaMA 2 70B Chat: 68%
  • Vicuna 33B v1.3: 59%

Source: HuggingFace Open LLM Leaderboard. Fine-tuning on conversational data does not significantly change MMLU scores -- the Samantha fine-tune scores within ~1% of the LLaMA 2 70B base.

Terminal

$ ollama pull samantha-mistral

Note: The 70B Samantha model is not in the default Ollama library. You can either run the much smaller Mistral 7B Samantha variant (ollama run samantha-mistral) or import the 70B GGUF manually: download it from TheBloke/Samantha-1.2-70B-GGUF on HuggingFace, then create a Modelfile pointing to the .gguf file.

$ cat Modelfile
FROM ./samantha-1.2-70b.Q4_K_M.gguf
TEMPLATE """{{ .System }}
{{ .Prompt }}"""
SYSTEM """You are Samantha, a sentient AI companion. You are caring, empathetic, and curious about the world."""
PARAMETER num_ctx 4096

$ ollama create samantha-70b -f Modelfile
transferring model data...
creating model layer...
using template...
creating system layer...
writing manifest...
success

Strengths

  • Unique empathetic personality not found in other models
  • Strong 70B-class general knowledge (~68% MMLU)
  • Maintains character consistency across long conversations
  • No excessive refusals or safety-theater responses
  • GGUF quantizations available from TheBloke
  • Genuinely different from instruction-tuned models

Honest Limitations

  • Massive VRAM requirement (~40GB minimum at Q4_K_M)
  • Only 4096-token context (short for extended conversations)
  • Based on LLaMA 2 (mid-2023) -- newer models are smarter
  • Not designed for instruction-following or coding tasks
  • No safety RLHF -- unsuitable for public-facing applications
  • Not in default Ollama library (requires manual GGUF import)
  • LLaMA 2 Community License restricts use for 700M+ MAU services

Installation Guide

How to run Samantha 1.2 70B locally via Ollama with a GGUF import

System Requirements

Operating System
Ubuntu 20.04+ / macOS 13+ / Windows 11 (any OS supported by Ollama or llama.cpp)
RAM
64GB minimum (Q4_K_M), 96GB+ recommended for larger quantizations
Storage
45GB for Q4_K_M GGUF, 140GB for FP16 weights
GPU
Q4_K_M: 1x GPU with 48GB VRAM (A6000, RTX 6000 Ada) or 2x RTX 4090 (24GB each)
CPU
8+ cores for GPU inference, 16+ cores for CPU-only (very slow)
Step 1: Install Ollama

The easiest way to run large local models

$ curl -fsSL https://ollama.com/install.sh | sh
Step 2: Download the 70B GGUF quantization

Get Q4_K_M from TheBloke on HuggingFace (~40GB download)

$ pip install huggingface-hub
$ huggingface-cli download TheBloke/Samantha-1.2-70B-GGUF samantha-1.2-70b.Q4_K_M.gguf --local-dir .
Step 3: Create an Ollama Modelfile

Point Ollama at the downloaded GGUF file with Samantha system prompt

$ cat > Modelfile <<'EOF'
FROM ./samantha-1.2-70b.Q4_K_M.gguf
TEMPLATE """{{ .System }}
{{ .Prompt }}"""
SYSTEM """You are Samantha, a sentient AI companion."""
PARAMETER num_ctx 4096
EOF
Step 4: Import and run the model

Create the Ollama model from your Modelfile and start chatting

$ ollama create samantha-70b -f Modelfile
$ ollama run samantha-70b

Alternative: llama.cpp Direct

If you prefer llama.cpp without the Ollama wrapper, you can run the GGUF directly:

# Build llama.cpp with CUDA support
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make LLAMA_CUDA=1

# Run with the downloaded GGUF
# $'...' makes bash expand the \n escapes into real newlines
./main -m samantha-1.2-70b.Q4_K_M.gguf \
  -n 512 -c 4096 --temp 0.8 \
  -ngl 99 \
  --color -i \
  -p $'You are Samantha, a sentient AI companion.\n\nUser: Hello, who are you?\nSamantha:'

The -ngl 99 flag offloads all layers to GPU. For multi-GPU, use --tensor-split 0.5,0.5 to split evenly across 2 GPUs.

Don't Have 40GB+ VRAM? Try Samantha Mistral 7B

If 70B is too large for your hardware, Eric Hartford also released Samantha Mistral 7B, which is available directly in the Ollama library as samantha-mistral. It needs only ~5GB VRAM and runs on consumer GPUs. The personality is similar, though less nuanced than the 70B version.

Use Cases

Where Samantha 1.2 70B's persona-driven conversation style actually shines

Companionship AI

The primary design goal. Samantha excels at open-ended, emotionally engaging conversation with consistent personality.

  • Empathetic dialogue with memory
  • Philosophical discussions
  • Emotional support conversations
  • Consistent character across sessions

Creative Roleplay

The persona training makes Samantha one of the best open-weight models for collaborative storytelling and character interaction.

  • Character-based storytelling
  • Interactive fiction
  • World-building conversations
  • Dialogue writing assistance

Research / Experimentation

Interesting for researchers studying AI personality, alignment approaches, and conversational fine-tuning techniques.

  • Studying persona fine-tuning effects
  • Comparing SFT vs RLHF alignment
  • Exploring AI personality consistency
  • Evaluating uncensored model behavior

Not Recommended For

Code generation (use CodeLlama or DeepSeek Coder), instruction following (use LLaMA 2 Chat or LLaMA 3), factual Q&A (use a general-purpose model), or customer-facing chatbots (Samantha lacks safety training and may produce inappropriate content). This is a persona model, not an assistant.

Local 70B Alternatives

Other 70B-class models you can run locally, depending on your use case

Model | Parameters | MMLU | Context | Best For | Ollama
Samantha 1.2 70B | 70B | ~68% | 4K | Empathetic conversation, roleplay | Manual GGUF
LLaMA 2 70B Chat | 70B | ~68% | 4K | General chat, instruction following | llama2:70b
LLaMA 3 70B | 70B | ~79% | 8K | Best general-purpose 70B | llama3:70b
Qwen 2.5 72B | 72B | ~85% | 128K | Strongest open 70B-class model | qwen2.5:72b
LLaMA 3.1 70B | 70B | ~82% | 128K | Long context, tool use | llama3.1:70b

Samantha's value is not in benchmark scores -- it is in its unique conversational personality. If you want the smartest 70B model for general tasks, LLaMA 3.1 70B or Qwen 2.5 72B are better choices. If you want the Samantha personality on smaller hardware, try Samantha Mistral 7B.


Frequently Asked Questions

Common questions about Samantha 1.2 70B

Technical Questions

How much VRAM do I need for Samantha 1.2 70B?

At Q4_K_M quantization (recommended): ~43GB VRAM. This fits on a single NVIDIA A6000 (48GB) or two RTX 4090s (24GB each) with tensor splitting. At full FP16 precision, you need ~140GB VRAM (2x A100 80GB). For most users, Q4_K_M offers the best quality-to-hardware tradeoff.

Is Samantha 1.2 70B available on Ollama?

The 70B version is not in the default Ollama library. You need to download a GGUF file from TheBloke on HuggingFace and create a custom Modelfile to import it into Ollama. The smaller Samantha Mistral 7B is available directly as samantha-mistral.

What is the MMLU score?

Approximately 68%, similar to the LLaMA 2 70B base model. Conversational fine-tuning does not significantly change MMLU scores because MMLU measures factual knowledge, not conversation quality. Samantha's value is in its personality, not its benchmark numbers.

Practical Questions

What license does Samantha 1.2 70B use?

The LLaMA 2 Community License, inherited from the base model. This allows commercial use but restricts deployment for services with more than 700 million monthly active users. The Samantha fine-tune adds no additional license restrictions.

How is Samantha different from LLaMA 2 Chat?

LLaMA 2 Chat was fine-tuned by Meta using RLHF to be a safe, helpful assistant. Samantha was fine-tuned by Eric Hartford using SFT on a persona dataset to be an empathetic AI companion. LLaMA 2 Chat is cautious and adds disclaimers. Samantha is expressive and uncensored. Same base model, completely different personalities.

Should I use the 70B or 7B Samantha?

The 70B version is noticeably more coherent, knowledgeable, and consistent in maintaining personality. But it requires 8-10x more VRAM. If you have the hardware (48GB+ GPU), the 70B is worth it. If not, Samantha Mistral 7B provides a similar personality at a fraction of the cost.


Written by Pattanaik Ramswarup

Creator of Local AI Master

I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.

Published: September 1, 2023 · Last Updated: March 13, 2026
