★ Reading this for free? Get 17 structured AI courses + per-chapter AI tutor — the first chapter of every course free, no card.Start free in 30 seconds

WizardVicuna 30B:
The Community Merge That Combined Two Training Approaches

WizardVicuna 30B is a community-created merge model from May 2023 that combined WizardLM's Evol-Instruct training with Vicuna's ShareGPT conversation data, both applied to the original LLaMA 1 30B base. Created by community contributors including Eric Hartford (ehartford), it demonstrated how merging complementary fine-tunes could yield a model greater than its parts. While surpassed by modern models, it remains a milestone in the history of open-source LLM experimentation.

30B
Parameters
2K
Context Window
~59%
MMLU Score
Non-Commercial
LLaMA 1 License

What Is WizardVicuna 30B?

A community experiment that combined two fine-tuning approaches on Meta's original LLaMA 1

Origin Story

Community-Created, Not Official

WizardVicuna 30B is not an official release from Meta, Microsoft, or any major AI lab. It was created by community contributors (notably Eric Hartford / ehartford) in May 2023 during the explosion of LLaMA 1 fine-tuning. The model is a merge of two independently fine-tuned versions of LLaMA 1 30B, combining their complementary strengths into a single model.

The base model is Meta's original LLaMA 1 30B (released February 2023), not LLaMA 2. This is important because LLaMA 1 had a 2,048-token context window and was released under a non-commercial research license, which means WizardVicuna 30B inherits these same constraints.

Two Training Approaches Combined

The "WizardVicuna" name tells you exactly what was merged:

WizardLM (Evol-Instruct)

Used "Evol-Instruct" -- a method where an LLM (GPT-4) iteratively rewrites instructions to make them more complex and diverse. This produced a fine-tune especially strong at following complex, multi-step instructions.

Vicuna (ShareGPT)

Fine-tuned on approximately 70K conversations shared by users from ChatGPT (via ShareGPT.com). This produced a model with natural, conversational tone and strong multi-turn dialogue capabilities.

The merge combined WizardLM's structured instruction-following with Vicuna's conversational fluency. The result was a model that could handle both precise technical tasks and natural conversation -- a combination that neither parent fine-tune achieved alone.

Technical Specifications

Base Model

  • * Parameters: 30 billion
  • * Base: LLaMA 1 30B (Meta, Feb 2023)
  • * Architecture: Decoder-only Transformer
  • * Context Window: 2,048 tokens
  • * Vocabulary: 32,000 tokens (SentencePiece)

Training Details

  • * Method: Model merge (weight averaging)
  • * Source 1: WizardLM (Evol-Instruct data)
  • * Source 2: Vicuna (ShareGPT conversations)
  • * No RLHF was used
  • * Released: May 2023

Deployment

  • * License: LLaMA 1 (Non-Commercial only)
  • * Format: GGUF available (community quants)
  • * Ollama: wizard-vicuna
  • * Q4_K_M VRAM: ~20GB
  • * Creator: ehartford / community

The Merge: How WizardVicuna Was Created

Understanding model merging and why it was a breakthrough technique in 2023

Model Merging Explained

What Is Model Merging?

Model merging takes two (or more) fine-tuned models that share the same base architecture and combines their weights -- typically through averaging, SLERP (Spherical Linear Interpolation), or TIES (Trim, Elect Sign & Merge). The key insight is that different fine-tunes learn complementary features, and merging can combine those features without additional training compute.

WizardVicuna 30B used a straightforward weight-averaging approach. Since both WizardLM 30B and Vicuna 30B were fine-tuned from the same LLaMA 1 30B base, their weight spaces were compatible. The resulting merged model inherited instruction-following from WizardLM and conversational naturalness from Vicuna.

Why This Worked

The two parent models were fine-tuned on fundamentally different data distributions: WizardLM emphasized structured, complex instructions (synthetic Evol-Instruct data), while Vicuna emphasized natural, multi-turn conversations (real ShareGPT user dialogues). Because these capabilities occupied largely non-overlapping regions of the weight space, the merge preserved both without significant interference -- a property sometimes called "task arithmetic."

WizardVicuna 30B Merge Components

Base Model:LLaMA 1 30B (Meta, Feb 2023)
Component 1:WizardLM 30B - Evol-Instruct fine-tune
Component 2:Vicuna 30B - ShareGPT conversation fine-tune
Merge Method:Weight averaging
Release Date:May 2023
Creator:ehartford (Eric Hartford) / community

Real Benchmark Scores

Actual benchmark results for LLaMA 1 30B-class fine-tunes from the Open LLM Leaderboard

MMLU Score: WizardVicuna 30B vs. Comparable Local Models

WizardVicuna 30B58.5 MMLU accuracy (%)
58.5
WizardLM 30B57.2 MMLU accuracy (%)
57.2
Vicuna 33B59.2 MMLU accuracy (%)
59.2
Guanaco 33B57.6 MMLU accuracy (%)
57.6
Llama 2 13B Chat53.9 MMLU accuracy (%)
53.9

Benchmark Context

These scores are approximate and sourced from the HuggingFace Open LLM Leaderboard (v1) for LLaMA 1 30B-class fine-tunes. The exact WizardVicuna merge variant tested may vary. For comparison, GPT-3.5 scored ~70% on MMLU and GPT-4 scored ~86%. Modern open models like Llama 3.1 70B score ~79% MMLU -- significantly higher than any LLaMA 1 fine-tune.

Performance Metrics

MMLU
59
HellaSwag
82
ARC-Challenge
62
TruthfulQA
50
Winogrande
76
59
MMLU Score (Approximate)
Fair

Strengths (in 2023 context)

  • * Combined instruction-following and conversational ability
  • * Better than either parent model alone on mixed tasks
  • * Natural, human-like conversational tone from Vicuna data
  • * Strong at complex instructions from Evol-Instruct training
  • * Pioneered model merging as a viable technique
  • * Free to run locally (non-commercial use)
  • * Available through Ollama for easy deployment

Limitations

  • * 2,048-token context window (LLaMA 1 limitation)
  • * Non-commercial license from LLaMA 1
  • * ~59% MMLU -- below modern 7B models
  • * 20GB+ VRAM even at Q4 quantization
  • * No RLHF or safety alignment training
  • * Surpassed by LLaMA 2, LLaMA 3, Mistral, etc.
  • * Knowledge cutoff limited to LLaMA 1 training data (pre-2023)

VRAM Requirements by Quantization

How much GPU memory you need for each quantization level of WizardVicuna 30B

Memory Usage Over Time

60GB
45GB
30GB
15GB
0GB
Q2_KQ4_K_MQ5_K_MQ8_0FP16

VRAM by Quantization Level

QuantizationVRAM RequiredQuality LossCompatible GPUs
Q2_K~14GBSignificantRTX 4080 16GB (tight), RTX 3090/4090
Q4_K_M~20GBMinimalRTX 3090 (24GB), RTX 4090 (24GB)
Q5_K_M~22GBVery minimalRTX 3090/4090 (24GB, tight)
Q8_0~32GBNegligibleA6000 (48GB), 2x RTX 3090
FP16~60GBNoneA100 (80GB), 3x RTX 3090

Installation Guide

Run WizardVicuna 30B locally via Ollama

System Requirements

Operating System
Windows 10/11, macOS 12+, Ubuntu 20.04+
RAM
32GB minimum (Q4_K_M), 64GB recommended for larger quants
Storage
20GB (Q4_K_M) to 60GB (FP16)
GPU
Q4_K_M: 20GB VRAM (RTX 3090/4090), FP16: 60GB VRAM (A100/2x RTX 3090)
CPU
8+ cores for CPU-only fallback (very slow at 30B)
1

Install Ollama

Download and install the Ollama runtime for local model management

$ curl -fsSL https://ollama.com/install.sh | sh
2

Check Available VRAM

Verify your GPU has enough VRAM for the quantization level you need

$ nvidia-smi --query-gpu=memory.total,memory.free --format=csv
3

Pull WizardVicuna

Download the default quantized version (Q4_K_M, ~18GB download)

$ ollama run wizard-vicuna
4

Test the Model

Verify the model loads and responds correctly

$ ollama run wizard-vicuna "Explain the difference between WizardLM and Vicuna in two sentences"
Terminal
$ollama run wizard-vicuna
pulling manifest pulling 8f42bbb47552... 100% |████████████████████████████| 18 GB pulling 4fa551d4f938... 100% |████████████████████████████| 12 KB pulling 8ab4849b038c... 100% |████████████████████████████| 254 B verifying sha256 digest writing manifest removing any unused layers success
$ollama run wizard-vicuna "What is quantum entanglement?"
Quantum entanglement is a phenomenon in quantum mechanics where two or more particles become interconnected such that the quantum state of one particle instantly influences the state of the other, regardless of the distance separating them. This correlation persists even when the particles are separated by large distances, which Einstein famously called "spooky action at a distance."
$_

License Warning

WizardVicuna 30B inherits the LLaMA 1 non-commercial license from its base model. This means it can only be used for research and personal experimentation -- not for commercial products or services. If you need a commercially-licensed 30B+ model, consider Llama 2 70B, Llama 3.1 70B, or Mistral/Mixtral models instead.

Why WizardVicuna Mattered

The historical importance of community model merging

Pioneering Model Merging

WizardVicuna 30B was among the first widely-discussed model merges, demonstrating that you could combine independently fine-tuned models to get something better than either parent. This idea -- that fine-tuning produces modular, composable changes to weight space -- became foundational to later work on model merging techniques like TIES-Merging, DARE, and the entire mergekit ecosystem.

Before WizardVicuna, the standard approach was to fine-tune a base model on a single curated dataset. The success of this merge showed that the open-source community could iterate faster by combining specialist models rather than training from scratch each time. This insight directly led to the explosion of merged models on the HuggingFace Open LLM Leaderboard throughout 2023-2024.

Community-Driven Innovation

WizardVicuna exemplified the open-source AI community's ability to innovate without massive compute budgets. Model merging requires zero additional GPU time -- it's purely a weight-space operation. This democratized model improvement, allowing individual researchers and hobbyists to create competitive models on consumer hardware.

The model was part of a broader wave of LLaMA 1 experimentation that included Alpaca, Vicuna, WizardLM, Guanaco, and many others. Together, these projects proved that fine-tuning and merging could unlock capabilities that the base model couldn't achieve, setting the stage for the explosion of open-source AI development that continues today.

Legacy and Influence

The techniques pioneered by WizardVicuna and similar early merges directly influenced later developments: tools like mergekit (which automates model merging), the SLERP and TIES merge methods, and the entire category of "frankenmerge" models on HuggingFace. By 2024, model merging had become a standard technique in the open-source AI toolkit, with merged models regularly topping the Open LLM Leaderboard.

2026 Assessment and Local AI Alternatives

Honest evaluation: WizardVicuna 30B in today's landscape

Honest 2026 Assessment

WizardVicuna 30B was a historically important community experiment that proved model merging could combine complementary training approaches. However, in 2026, it has been thoroughly surpassed:

  • * Its ~59% MMLU is below what modern 7B-8B models achieve (Llama 3.1 8B: ~65% MMLU)
  • * The 2,048-token context window is extremely limiting compared to 128K+ in modern models
  • * Non-commercial license makes it impractical when Llama 3.1 offers Apache 2.0
  • * 20GB+ VRAM for a model outperformed by 8B models needing 6GB VRAM
  • * No safety alignment (no RLHF, no DPO) -- modern models are far safer

Recommendation: Use WizardVicuna 30B only for historical interest or research into early model merging techniques. For any practical task, choose a modern alternative below.

Local AI Alternatives (2026)

ModelMMLUContextVRAM (Q4)LicenseWhy Choose
Llama 3.1 8B~65%128K~6GBLlama 3.1 (Commercial)Better quality, 64x more context, 1/3 the VRAM
Mistral 7B~62%32K~5GBApache 2.0Similar quality at 1/4 the size
Llama 3.1 70B~79%128K~40GBLlama 3.1 (Commercial)Far better quality if you have the VRAM
Qwen 2.5 32B~78%128K~20GBApache 2.0Same VRAM budget, vastly better quality
Gemma 2 27B~75%8K~17GBGemma License (Commercial)Similar size, much better benchmarks

LLaMA 1 Era Comparison (Historical Context)

ModelSize (Q4)VRAMSpeedMMLULicense
WizardVicuna 30B~18GB (Q4)20GB VRAM~15 tok/s59%Free (Non-Commercial)
WizardLM 30B~18GB (Q4)20GB VRAM~15 tok/s57%Free (Non-Commercial)
Vicuna 33B~19GB (Q4)22GB VRAM~14 tok/s59%Free (Non-Commercial)
Guanaco 33B~19GB (Q4)22GB VRAM~14 tok/s58%Free (Non-Commercial)
Llama 2 13B Chat~7.4GB (Q4)10GB VRAM~35 tok/s54%Free (Commercial OK)

Build Real AI on Your Machine

RAG, agents, NLP, vision, and MLOps - chapters across 17 courses that take you from reading about AI to building AI.

Was this helpful?

🎯
AI Learning Path

Go from reading about AI to building with AI

10 structured courses. Hands-on projects. Runs on your machine. Start free.

PR

Written by Pattanaik Ramswarup

Creator of Local AI Master

I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.

✓ Local AI Curriculum✓ Hands-On Projects✓ Open Source Contributor
📅 Published: May 15, 2023🔄 Last Updated: March 13, 2026✓ Manually Reviewed

Related Guides

Continue your local AI journey with these comprehensive guides

WizardVicuna 30B: Model Merge Architecture

Diagram showing how WizardVicuna 30B was created by merging WizardLM (Evol-Instruct) and Vicuna (ShareGPT) fine-tunes of the LLaMA 1 30B base model

👤
You
💻
Your ComputerAI Processing
👤
🌐
🏢
Cloud AI: You → Internet → Company Servers
More on AI Models Directory
See the full AI Models Directory guide.
📚
Free · no account required

Grab the AI Starter Kit — career roadmap, cheat sheet, setup guide

No spam. Unsubscribe with one click.

🎯
AI Learning Path

Go from reading about AI to building with AI

10 structured courses. Hands-on projects. Runs on your machine. Start free.

Free Tools & Calculators