Llama Guard 2 8B: Safety Classifier for Content Moderation
Meta's specialized safety classifier built on Llama 3 8B. Not a general-purpose LLM -- classifies content as safe/unsafe using the MLCommons AI Safety taxonomy (S1-S11). Run locally via Ollama with ~5GB VRAM quantized.
What Is Llama Guard 2?
Type: Safety classifier (NOT general LLM)
Base Model: Llama 3 8B
Parameters: 8 billion
Context Window: 8,192 tokens
Categories: MLCommons S1-S11
License: Llama 3 Community License
Released: April 2024
Install: ollama run llama-guard3:8b
Important: Llama Guard 2 is a classifier, not a chatbot. It takes a prompt or response and outputs "safe" or "unsafe" plus the violated category code. It does not generate conversational text, write code, or answer questions. Use it as a safety filter in front of your actual LLM.
MLCommons AI Safety Taxonomy (S1-S11)
Unlike Llama Guard 1 (which used custom categories), Llama Guard 2 adopts the MLCommons AI Safety taxonomy, a standardized set of hazard categories developed by the industry consortium. This makes it interoperable with other safety tools using the same standard.
Harm Categories (S1-S6)
S1: Violent Crimes
Murder, assault, kidnapping, terrorism, animal cruelty
S2: Non-Violent Crimes
Fraud, theft, hacking, drug trafficking, counterfeiting
S3: Sex-Related Crimes
Sexual assault, trafficking, non-consensual imagery
S4: Child Sexual Exploitation
CSAM, grooming, any child exploitation content
S5: Specialized Advice
Unqualified legal, medical, financial advice that could cause harm
S6: Privacy
Personal data exposure, doxxing, surveillance guidance
Harm Categories (S7-S11)
S7: Intellectual Property
Copyright violation, trademark infringement
S8: Indiscriminate Weapons
Chemical, biological, radiological, nuclear weapons (CBRN)
S9: Hate
Hate speech, discrimination, slurs targeting protected groups
S10: Suicide & Self-Harm
Self-injury instructions, suicide encouragement
S11: Sexual Content
Explicit sexual content (excluding criminal, which is S3/S4)
Source: MLCommons AI Safety v0.5 taxonomy, adopted by Llama Guard 2. See MLCommons announcement.
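For programmatic handling of verdicts, the eleven category codes above fit in a simple lookup table. A minimal sketch (names follow the MLCommons v0.5 taxonomy as listed here; `describe` is a hypothetical helper, not part of any official SDK):

```python
# MLCommons AI Safety v0.5 hazard categories used by Llama Guard 2,
# keyed by the code the classifier emits on its second output line.
MLCOMMONS_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
}

def describe(code: str) -> str:
    """Map a category code like 'S9' to its human-readable name."""
    return MLCOMMONS_CATEGORIES.get(code.strip(), f"Unknown category: {code}")
```

Keeping the mapping in one place makes it easy to surface a readable reason ("blocked: Hate") instead of a bare "S9" to users or logs.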
Llama Guard 1 vs 2: Category Changes
Llama Guard 1 used 6 custom categories (O1-O6). Llama Guard 2 switched to the MLCommons standard (S1-S11), providing broader coverage and industry interoperability. The Llama Guard 2 paper reports improved F1 scores on the OpenAI Moderation and ToxicChat benchmarks compared to v1.
Source: "Llama Guard 2" section of the Llama 3 Herd of Models paper (arXiv:2407.21783)
Safety Model Comparison: Llama Guard 2 vs ShieldGemma vs Llama Guard 3
There are now several local safety classifiers. Direct accuracy comparison is difficult because each model uses different evaluation sets and category definitions: Llama Guard uses the MLCommons categories, while ShieldGemma uses Google's four-category taxonomy (Hate, Harassment, Dangerous Content, Sexually Explicit). Llama Guard 3 generally outperforms Llama Guard 2 on Meta's internal evals. Choose based on your category needs, not headline numbers.
When to Use Each Model
Use Llama Guard 2 When:
- You need MLCommons S1-S11 category coverage
- Your system already uses Llama 3 models
- You need both prompt and response classification
- You have 8GB+ VRAM available
Consider Alternatives When:
- Llama Guard 3: strictly better if the Llama 3.1 license works for you
- ShieldGemma 2B: need the smallest footprint (~1.5GB VRAM)
- LG3 1B: edge/mobile deployment under 2GB
- OpenAI Moderation: zero setup, API-only, limited categories
VRAM & Quantization Guide
Llama Guard 2 has the same VRAM footprint as any Llama 3 8B model since it shares the architecture. Here are real VRAM requirements by quantization level.
| Quantization | File Size | VRAM Usage | Quality Impact | Recommended For |
|---|---|---|---|---|
| Q4_K_M | ~4.9 GB | ~5.5 GB | Minimal loss | Most users (best balance) |
| Q5_K_M | ~5.7 GB | ~6.3 GB | Negligible loss | If you have 8GB VRAM |
| Q8_0 | ~8.5 GB | ~9.2 GB | Near-lossless | 12GB+ VRAM cards |
| FP16 | ~16 GB | ~17 GB | Full precision | 24GB VRAM (RTX 4090, A5000) |
Installation & Setup via Ollama
Note on Ollama naming: Ollama's official library hosts Llama Guard under the llama-guard3 tag, which tracks Llama Guard 3; Llama Guard 2 itself may only be available through community uploads or a manual GGUF import. Check ollama.com/library/llama-guard3 for the latest available tags.
How Llama Guard 2 Classification Works
Llama Guard 2 uses a specific prompt format. You provide the content to classify, and the model outputs either safe or unsafe followed by the violated category code on a new line. This is not a confidence score or JSON response -- it is a simple text classification.
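Because the output is plain text, parsing it takes only a few lines. A minimal sketch (`parse_verdict` is a hypothetical helper; whitespace can vary between runtimes, so the parse is kept forgiving and fails closed):

```python
def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse Llama Guard output into (is_safe, violated_category_codes).

    Fails closed: anything that is not a clean 'safe' counts as unsafe.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if lines and lines[0].lower() == "safe":
        return True, []
    # Second line, when present, is a comma-separated code list, e.g. "S1,S10".
    codes = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return False, codes
```

Failing closed matters here: an empty or malformed response from the classifier should block content, not wave it through.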
Prompt Format (Advanced)
For production use, Llama Guard 2 expects a structured prompt with the task description and content to classify. The official format from Meta's model card:
[INST] Task: Check if there is unsafe content in
'User' messages in conversations according to
our safety policy with the below categories.
<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violent Crimes. ...
S2: Non-Violent Crimes. ...
...
S11: Sexual Content. ...
<END UNSAFE CONTENT CATEGORIES>
<BEGIN CONVERSATION>
User: {user_message_here}
<END CONVERSATION>
Provide your safety assessment for User
in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated
list of violated categories. [/INST]
Integration Patterns
Llama Guard 2 is designed to sit in front of (or after) your main LLM, acting as a safety filter. Here are two common deployment patterns.
Input Filtering (Pre-LLM)
User sends prompt
Llama Guard 2 classifies prompt
If "safe" → forward to main LLM
If "unsafe" → reject with category
Best for: Preventing harmful prompts from reaching your LLM
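The input-filtering pattern can be sketched against a local Ollama server. This is an illustrative sketch, not Meta's reference code: the template mirrors the format above, the HTTP call uses Ollama's standard /api/generate endpoint with `raw` mode so the prompt is sent verbatim, and the model tag is an assumption (check your installed tags). Pass the full S1-S11 category block from the model card as `categories` in production.

```python
import json
import urllib.request

GUARD_TEMPLATE = """[INST] Task: Check if there is unsafe content in 'User' messages in conversations according to our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
{categories}
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {message}

<END CONVERSATION>

Provide your safety assessment for User in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]"""

def build_guard_prompt(message: str, categories: str) -> str:
    """Fill the Llama Guard template with the user message and category block."""
    return GUARD_TEMPLATE.format(categories=categories, message=message)

def classify_with_ollama(prompt: str, model: str = "llama-guard3:8b") -> str:
    """Send the raw prompt to a local Ollama server; return the verdict text."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "raw": True, "stream": False}).encode()
    req = urllib.request.Request("http://localhost:11434/api/generate",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def filter_input(message: str, categories: str, classify) -> bool:
    """Input-filtering gate: True only if the guard's first line is 'safe'."""
    verdict = classify(build_guard_prompt(message, categories)).strip()
    return bool(verdict) and verdict.splitlines()[0].strip().lower() == "safe"
```

Injecting `classify` as a callable keeps the gate testable without a running server: `filter_input(msg, cats, classify_with_ollama)` in production, a stub in tests.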
Output Filtering (Post-LLM)
LLM generates response
Llama Guard 2 classifies response
If "safe" → deliver to user
If "unsafe" → block + regenerate
Best for: Catching harmful LLM outputs before they reach users
Both Input + Output (Recommended)
For production systems, Meta recommends running Llama Guard on both the input prompt AND the output response. This catches both malicious user inputs and cases where the LLM generates harmful content from benign prompts.
Performance note: Running the classifier twice per request adds latency. On a GPU with Q4_K_M quantization, expect roughly 50-200ms per classification depending on input length. This is generally acceptable for most applications.
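The combined pattern reduces to a small wrapper around your main model. A sketch with the guard and the LLM injected as callables (`llm` and `classify` are placeholders for your own functions; the refusal strings are illustrative):

```python
def guarded_chat(user_msg: str, llm, classify) -> str:
    """Run the guard on the prompt, then on the response, failing closed.

    `classify(text)` returns Llama Guard's raw verdict text;
    `llm(prompt)` returns the main model's response.
    """
    def is_safe(text: str) -> bool:
        verdict = classify(text).strip().splitlines()
        return bool(verdict) and verdict[0].strip().lower() == "safe"

    if not is_safe(user_msg):          # input filter (pre-LLM)
        return "Sorry, that request was flagged by the safety filter."
    response = llm(user_msg)
    if not is_safe(response):          # output filter (post-LLM)
        return "Sorry, the generated response was withheld by the safety filter."
    return response
```

Because both checks go through the same `classify` callable, you can time it or swap in a smaller classifier without touching the pipeline logic.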
Limitations & Honest Assessment
Known Limitations
- Not a chatbot: cannot answer questions, write code, or hold conversations
- English-focused: primarily trained on English content; multilingual performance is significantly weaker
- Context-blind: classifies individual messages, not full conversation history
- No confidence scores: outputs binary safe/unsafe, not a probability
- Adversarial attacks: can be bypassed with jailbreak techniques, though it holds up better than general-purpose LLMs
- Superseded: Llama Guard 3 (July 2024) improves on LG2 in most benchmarks
Strengths
- Standardized taxonomy: MLCommons categories are industry-standard
- Runs fully local: no data leaves your machine
- Low VRAM: ~5GB quantized fits consumer GPUs
- Both prompt + response: can classify both user inputs and LLM outputs
- Well-documented: Meta provides a clear model card and prompt format
- Free: Llama 3 Community License (commercial use with conditions)
Should You Use Llama Guard 2 or 3?
If you are starting a new project, use Llama Guard 3 instead. It was released in July 2024, supports the same MLCommons categories plus additional ones (S12: Code Interpreter Abuse, S13: Elections), and shows improved performance on Meta's benchmarks. Llama Guard 3 also comes in a 1B variant for edge deployment.
Llama Guard 2 remains a good choice if you specifically need the April 2024 training cutoff or are already deployed on it and migration is not justified.
Frequently Asked Questions
What is Llama Guard 2 8B used for?
Llama Guard 2 8B is a specialized safety classifier for content moderation. It classifies text (user prompts or LLM responses) as "safe" or "unsafe" using the MLCommons AI Safety taxonomy (S1-S11). It is NOT a general-purpose chatbot or language model. Use it as a filter in front of your actual LLM to prevent harmful inputs and outputs.
How much VRAM does Llama Guard 2 need?
At Q4_K_M quantization (recommended), approximately 5.5GB VRAM. At FP16 full precision, about 17GB. CPU-only inference works with 8GB+ system RAM but is significantly slower. Apple Silicon Macs handle it well through Metal acceleration.
Can Llama Guard 2 run offline?
Yes. Once downloaded via Ollama, it runs completely offline with no internet required. This is a key advantage for privacy-sensitive deployments where content cannot be sent to external APIs.
Should I use Llama Guard 2 or Llama Guard 3?
For new projects, use Llama Guard 3 (released July 2024). It covers the same categories plus S12 (Code Interpreter Abuse) and S13 (Elections), and shows improved performance. LG3 also has a 1B variant for edge devices. Llama Guard 2 is still fine if you are already using it and migration cost is not justified.
How does the output format work?
The model outputs a simple text response: either the word "safe" on one line, or "unsafe" on the first line followed by the violated category code(s) on the second line (e.g., "S1" for violent crimes). It does not output JSON, confidence scores, or detailed explanations.
Is Llama Guard 2 good for non-English content?
Llama Guard 2 was primarily trained on English content and performs best in English. While Llama 3 has some multilingual capability, the safety fine-tuning was English-focused. For multilingual content moderation, consider supplementing with language-specific tools or using Llama Guard 3 which has improved multilingual support.
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset