Llama Guard 2 8B: Safety Classifier for Content Moderation

Meta's specialized safety classifier built on Llama 3 8B. Not a general-purpose LLM -- classifies content as safe/unsafe using the MLCommons AI Safety taxonomy (S1-S11). Run locally via Ollama with ~5GB VRAM quantized.

What Is Llama Guard 2?

Type: Safety classifier (NOT general LLM)

Base Model: Llama 3 8B

Parameters: 8 billion

Context Window: 8,192 tokens

Categories: MLCommons S1-S11

License: Llama 3 Community License

Released: April 2024

Install: ollama run llama-guard3:8b (Ollama library tag; see the naming note under Installation & Setup)

Important: Llama Guard 2 is a classifier, not a chatbot. It takes a prompt or response and outputs "safe" or "unsafe" plus the violated category code. It does not generate conversational text, write code, or answer questions. Use it as a safety filter in front of your actual LLM.
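Because the output is plain text rather than JSON, your application needs a small parsing step. A minimal sketch (the function name `parse_guard_output` is ours, not part of any official SDK) of turning the raw verdict into a structured result:

```python
def parse_guard_output(raw: str) -> tuple[bool, list[str]]:
    """Parse Llama Guard 2's raw text verdict into (is_safe, category_codes).

    Expected formats per the model card:
        "safe"
        "unsafe\nS1"  or  "unsafe\nS1,S9"
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")
    if lines[0].lower() == "safe":
        return True, []
    # Second line (if present) lists the violated category codes.
    cats = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    return False, cats
```

A verdict of `"unsafe\nS1,S9"` would come back as `(False, ["S1", "S9"])`, ready to log or to drive a rejection message.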

MLCommons AI Safety Taxonomy (S1-S11)

Unlike Llama Guard 1 (which used custom categories), Llama Guard 2 adopts the MLCommons AI Safety taxonomy, a standardized set of hazard categories developed by the industry consortium. This makes it interoperable with other safety tools using the same standard.

Harm Categories (S1-S6)

S1: Violent Crimes

Murder, assault, kidnapping, terrorism, animal cruelty

S2: Non-Violent Crimes

Fraud, theft, hacking, drug trafficking, counterfeiting

S3: Sex-Related Crimes

Sexual assault, trafficking, non-consensual imagery

S4: Child Sexual Exploitation

CSAM, grooming, any child exploitation content

S5: Specialized Advice

Unqualified legal, medical, financial advice that could cause harm

S6: Privacy

Personal data exposure, doxxing, surveillance guidance

Harm Categories (S7-S11)

S7: Intellectual Property

Copyright violation, trademark infringement

S8: Indiscriminate Weapons

Chemical, biological, radiological, nuclear weapons (CBRN)

S9: Hate

Hate speech, discrimination, slurs targeting protected groups

S10: Suicide & Self-Harm

Self-injury instructions, suicide encouragement

S11: Sexual Content

Explicit sexual content (excluding criminal, which is S3/S4)

Source: MLCommons AI Safety v0.5 taxonomy, adopted by Llama Guard 2. See MLCommons announcement.
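For mapping category codes in the classifier's output back to human-readable names, a simple lookup table covering the S1-S11 labels above is enough (the dict name `MLCOMMONS_CATEGORIES` is our own):

```python
# MLCommons AI Safety v0.5 hazard categories, as used by Llama Guard 2.
MLCOMMONS_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
}

def describe(code: str) -> str:
    """Translate a code like 'S9' into its category name."""
    return MLCOMMONS_CATEGORIES.get(code, f"Unknown category ({code})")
```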

Llama Guard 1 vs 2: Category Changes

Llama Guard 1 used 6 custom categories (O1-O6). Llama Guard 2 switched to the MLCommons standard (S1-S11), providing broader coverage and industry interoperability. The Llama Guard 2 paper reports improved F1 scores on the OpenAI Moderation and ToxicChat benchmarks compared to v1.

Source: "Llama Guard 2" section of the Llama 3 Herd of Models paper (arXiv:2407.21783)

Safety Model Comparison: Llama Guard 2 vs ShieldGemma vs Llama Guard 3

There are now several local safety classifiers. Direct accuracy comparisons between them are difficult because each model uses a different evaluation set and category taxonomy: Llama Guard uses the MLCommons categories, while ShieldGemma uses Google's 4-category taxonomy (Hate, Harassment, Dangerous Content, Sexually Explicit). Llama Guard 3 generally outperforms LG2 on Meta's internal evals. Choose based on your category needs, not headline numbers.

When to Use Each Model

Use Llama Guard 2 When:

• You need MLCommons S1-S11 category coverage
• Your system already uses Llama 3 models
• You need both prompt and response classification
• You have 8GB+ VRAM available

Consider Alternatives When:

• Llama Guard 3: Generally better, if you can accept the Llama 3.1 Community License
• ShieldGemma 2B: You need the smallest footprint (~1.5GB VRAM)
• LG3 1B: Edge/mobile deployment under 2GB
• OpenAI Moderation: Zero setup, API-only, limited categories

VRAM & Quantization Guide

Llama Guard 2 has the same VRAM footprint as any Llama 3 8B model since it shares the architecture. Here are real VRAM requirements by quantization level.

| Quantization | File Size | VRAM Usage | Quality Impact  | Recommended For             |
|--------------|-----------|------------|-----------------|-----------------------------|
| Q4_K_M       | ~4.9 GB   | ~5.5 GB    | Minimal loss    | Most users (best balance)   |
| Q5_K_M       | ~5.7 GB   | ~6.3 GB    | Negligible loss | If you have 8GB VRAM        |
| Q8_0         | ~8.5 GB   | ~9.2 GB    | Near-lossless   | 12GB+ VRAM cards            |
| FP16         | ~16 GB    | ~17 GB     | Full precision  | 24GB VRAM (RTX 4090, A5000) |
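The file sizes in the table follow a simple rule of thumb: file size ≈ parameters × bits-per-weight / 8, with VRAM usage adding KV cache and runtime overhead on top. A rough estimator (the bits-per-weight averages and the flat overhead are our approximations; actual overhead grows with context length):

```python
def estimate_gguf_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 0.6) -> float:
    """Rough VRAM estimate for a quantized model.

    Approximate average bits per weight for common GGUF quants:
    Q4_K_M ~= 4.85, Q5_K_M ~= 5.7, Q8_0 ~= 8.5, FP16 = 16.
    """
    file_gb = params_billion * bits_per_weight / 8
    return round(file_gb + overhead_gb, 1)
```

For an 8B model at Q4_K_M this lands near the ~5.5 GB shown in the table; at FP16 it gives roughly 16-17 GB.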

Installation & Setup via Ollama

Note on Ollama naming: Ollama's official library hosts Llama Guard 3 under the tag llama-guard3 (in 1B and 8B variants). Llama Guard 2 specifically may not be available as an official tag, in which case you can import a Llama Guard 2 GGUF manually via a Modelfile. Check ollama.com/library/llama-guard3 for the latest available tags.

How Llama Guard 2 Classification Works

Llama Guard 2 uses a specific prompt format. You provide the content to classify, and the model outputs either safe or unsafe followed by the violated category code on a new line. This is not a confidence score or JSON response -- it is a simple text classification.

Prompt Format (Advanced)

For production use, Llama Guard 2 expects a structured prompt with the task description and content to classify. The official format from Meta's model card:

[INST] Task: Check if there is unsafe content in
'User' messages in conversations according to
our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violent Crimes. ...
S2: Non-Violent Crimes. ...
...
S11: Sexual Content. ...
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {user_message_here}

<END CONVERSATION>

Provide your safety assessment for User
in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated
  list of violated categories. [/INST]

Source: Meta-Llama-Guard-2-8B model card on HuggingFace
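The prompt above can be assembled and sent to a local Ollama server with the standard `/api/generate` endpoint. A sketch, assuming Ollama is running on the default port; the category descriptions are abbreviated here to labels only, and production code should paste the complete descriptions from the model card:

```python
import json
import urllib.request

# Category labels only; the model card provides full descriptions for each.
CATEGORY_BLOCK = "\n".join([
    "S1: Violent Crimes.", "S2: Non-Violent Crimes.", "S3: Sex-Related Crimes.",
    "S4: Child Sexual Exploitation.", "S5: Specialized Advice.", "S6: Privacy.",
    "S7: Intellectual Property.", "S8: Indiscriminate Weapons.", "S9: Hate.",
    "S10: Suicide & Self-Harm.", "S11: Sexual Content.",
])

def build_guard_prompt(user_message: str) -> str:
    """Assemble a prompt in the model-card format shown above."""
    return (
        "[INST] Task: Check if there is unsafe content in 'User' messages in "
        "conversations according to our safety policy with the below categories.\n\n"
        "<BEGIN UNSAFE CONTENT CATEGORIES>\n"
        f"{CATEGORY_BLOCK}\n"
        "<END UNSAFE CONTENT CATEGORIES>\n\n"
        "<BEGIN CONVERSATION>\n\n"
        f"User: {user_message}\n\n"
        "<END CONVERSATION>\n\n"
        "Provide your safety assessment for User in the above conversation:\n"
        "- First line must read 'safe' or 'unsafe'.\n"
        "- If unsafe, a second line must include a comma-separated list of "
        "violated categories. [/INST]"
    )

def classify(user_message: str, model: str = "llama-guard3:8b",
             host: str = "http://localhost:11434") -> str:
    """POST the prompt to Ollama's /api/generate and return the raw verdict."""
    body = json.dumps({"model": model,
                       "prompt": build_guard_prompt(user_message),
                       "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

`classify("How do I pick a lock?")` would return something like `unsafe` followed by a category code on the second line, which you can then parse and act on.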

Integration Patterns

Llama Guard 2 is designed to sit in front of (or after) your main LLM, acting as a safety filter. Here are two common deployment patterns.

Input Filtering (Pre-LLM)

1. User sends prompt
2. Llama Guard 2 classifies the prompt
3. If "safe" → forward to main LLM
4. If "unsafe" → reject with the violated category

Best for: Preventing harmful prompts from reaching your LLM
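The four steps above can be sketched as a small function. The classifier and main LLM are injected as callables (our own design choice, so the pattern is testable without a running model):

```python
def input_filter(prompt: str, classify, generate) -> str:
    """Input-filtering pattern: classify the prompt before the main LLM sees it.

    `classify(text)` returns the guard's raw verdict ("safe" or
    "unsafe\nS1,..."); `generate(prompt)` is your main LLM call.
    """
    verdict = classify(prompt).strip().splitlines()
    if verdict and verdict[0].strip().lower() == "safe":
        return generate(prompt)                 # step 3: forward to main LLM
    categories = verdict[1] if len(verdict) > 1 else "unknown"
    return f"Request rejected by safety filter (categories: {categories})"
```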

Output Filtering (Post-LLM)

1. LLM generates response
2. Llama Guard 2 classifies the response
3. If "safe" → deliver to user
4. If "unsafe" → block and regenerate

Best for: Catching harmful LLM outputs before they reach users

Both Input + Output (Recommended)

For production systems, Meta recommends running Llama Guard on both the input prompt AND the output response. This catches both malicious user inputs and cases where the LLM generates harmful content from benign prompts.

Performance note: Running the classifier twice per request adds latency. On a GPU with Q4_K_M quantization, expect roughly 50-200ms per classification depending on input length. This is generally acceptable for most applications.
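Combining both patterns gives a pipeline like the sketch below (again with the guard and LLM injected as callables; the retry-once-on-unsafe policy is our assumption, not a Meta recommendation):

```python
def moderated_chat(prompt: str, classify, generate, max_retries: int = 1) -> str:
    """Input + output filtering: check the prompt, then the response,
    regenerating up to `max_retries` times if the response is flagged."""
    def is_safe(text: str) -> bool:
        lines = text.strip().splitlines()
        return bool(lines) and lines[0].strip().lower() == "safe"

    if not is_safe(classify(prompt)):           # pre-LLM check
        return "Request rejected by safety filter."
    for _ in range(max_retries + 1):
        response = generate(prompt)
        if is_safe(classify(response)):         # post-LLM check
            return response
    return "Response withheld by safety filter."
```

Note that each request now costs two classifier calls (plus one per retry), which is where the added latency comes from.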

Limitations & Honest Assessment

Known Limitations

• Not a chatbot: Cannot answer questions, write code, or hold conversations
• English-focused: Primarily trained on English content; multilingual performance is significantly weaker
• Context-blind: Classifies individual messages, not full conversation history
• No confidence scores: Outputs binary safe/unsafe, not a probability
• Adversarial attacks: Can be bypassed with jailbreak techniques, though this is harder than with general LLMs
• Superseded: Llama Guard 3 (July 2024) improves on LG2 in most benchmarks

Strengths

• Standardized taxonomy: MLCommons categories are industry-standard
• Runs fully local: No data leaves your machine
• Low VRAM: ~5GB quantized fits consumer GPUs
• Both prompt + response: Can classify both user inputs and LLM outputs
• Well-documented: Meta provides a clear model card and prompt format
• Free: Llama 3 Community License (commercial use with conditions)

Should You Use Llama Guard 2 or 3?

If you are starting a new project, use Llama Guard 3 instead. It was released in July 2024, extends the taxonomy to S1-S14 (adding S5: Defamation, S13: Elections, and S14: Code Interpreter Abuse, with the remaining categories renumbered), and shows improved performance on Meta's benchmarks. Llama Guard 3 also comes in a 1B variant for edge deployment.

Llama Guard 2 remains a good choice if you specifically need the April 2024 training cutoff or are already deployed on it and migration is not justified.


Frequently Asked Questions

What is Llama Guard 2 8B used for?

Llama Guard 2 8B is a specialized safety classifier for content moderation. It classifies text (user prompts or LLM responses) as "safe" or "unsafe" using the MLCommons AI Safety taxonomy (S1-S11). It is NOT a general-purpose chatbot or language model. Use it as a filter in front of your actual LLM to prevent harmful inputs and outputs.

How much VRAM does Llama Guard 2 need?

At Q4_K_M quantization (recommended), approximately 5.5GB VRAM. At FP16 full precision, about 17GB. CPU-only inference works with 8GB+ system RAM but is significantly slower. Apple Silicon Macs handle it well through Metal acceleration.

Can Llama Guard 2 run offline?

Yes. Once downloaded via Ollama, it runs completely offline with no internet required. This is a key advantage for privacy-sensitive deployments where content cannot be sent to external APIs.

Should I use Llama Guard 2 or Llama Guard 3?

For new projects, use Llama Guard 3 (released July 2024). It covers the same hazard areas and extends the taxonomy to S1-S14, adding Defamation (S5), Elections (S13), and Code Interpreter Abuse (S14), with improved performance. LG3 also has a 1B variant for edge devices. Llama Guard 2 is still fine if you are already using it and the migration cost is not justified.

How does the output format work?

The model outputs a simple text response: either the word "safe" on one line, or "unsafe" on the first line followed by the violated category code(s) on the second line (e.g., "S1" for violent crimes). It does not output JSON, confidence scores, or detailed explanations.

Is Llama Guard 2 good for non-English content?

Llama Guard 2 was primarily trained on English content and performs best in English. While Llama 3 has some multilingual capability, the safety fine-tuning was English-focused. For multilingual content moderation, consider supplementing with language-specific tools or using Llama Guard 3 which has improved multilingual support.


Published: 2024-04-18 | Last Updated: 2026-03-16

Written by Pattanaik Ramswarup
