What is WizardVicuna 30B and who created it?

WizardVicuna 30B is a community-created merge model from May 2023, combining WizardLM's Evol-Instruct fine-tune with Vicuna's ShareGPT conversation fine-tune, both based on Meta's original LLaMA 1 30B. It was created by community contributors including Eric Hartford (ehartford). It is NOT an official release from Meta, Microsoft, or any major AI lab.

What are the real benchmark scores for WizardVicuna 30B?

WizardVicuna 30B scores approximately 58-60% on MMLU, ~82% on HellaSwag, and ~62% on ARC-Challenge. These are approximate scores from the Open LLM Leaderboard for LLaMA 1 30B-class fine-tunes. For comparison, modern models like Llama 3.1 8B score ~65% MMLU while being much smaller.

How much VRAM does WizardVicuna 30B need?

VRAM depends on quantization: Q2_K needs ~14GB, Q4_K_M needs ~20GB, Q5_K_M needs ~22GB, Q8_0 needs ~32GB, and FP16 (full precision) needs ~60GB. The Q4_K_M quantization is the best balance of quality and VRAM, fitting on an RTX 3090 or 4090 (24GB).

Can I use WizardVicuna 30B for commercial applications?

No. WizardVicuna 30B is based on Meta's LLaMA 1, which was released under a non-commercial research license. This means you cannot use it for commercial products or services. If you need a commercially-licensed alternative, consider Llama 3.1 (Llama 3.1 Community License), Mistral 7B (Apache 2.0), or Qwen 2.5 (Apache 2.0).

What is the context window of WizardVicuna 30B?

WizardVicuna 30B has a 2,048-token context window, inherited from the LLaMA 1 base model. This is very limiting by 2026 standards: modern models like Llama 3.1 offer 128K tokens, and Gemini models offer 1M+ tokens. Because the window covers the prompt and the generated reply together, the model can handle roughly 1,500 words in total at a time.
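To make the 2,048-token budget concrete, here is a minimal Python sketch that estimates whether a prompt plus a reserved reply length fits in the window. The 0.75 words-per-token ratio is an assumed rule of thumb implied by the "2K tokens is about 1,500 words" figure, not the output of the real LLaMA tokenizer, and the function names are hypothetical.

```python
import math

CONTEXT_TOKENS = 2048    # LLaMA 1 context window, as stated above
WORDS_PER_TOKEN = 0.75   # assumed rule of thumb for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_context(prompt: str, max_new_tokens: int, context: int = CONTEXT_TOKENS) -> bool:
    """True if the prompt plus the reserved reply budget fit in the window."""
    return estimate_tokens(prompt) + max_new_tokens <= context


prompt = "word " * 1500  # a 1,500-word prompt
print(estimate_tokens(prompt))
print(fits_context(prompt, max_new_tokens=0))
print(fits_context(prompt, max_new_tokens=256))
```

For anything beyond a quick sanity check, counting tokens with the model's actual tokenizer would be more reliable than this word-based estimate.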

Is WizardVicuna 30B still worth using in 2026?

For practical use, no. Modern 7B-8B models like Llama 3.1 8B and Mistral 7B outperform WizardVicuna 30B on benchmarks while requiring far less VRAM, offering larger context windows, and having commercial-friendly licenses. WizardVicuna 30B is primarily of historical interest as a pioneering example of model merging techniques.

How is WizardVicuna different from WizardLM or Vicuna individually?

WizardVicuna merges both models' strengths. WizardLM was fine-tuned with Evol-Instruct (synthetic instructions that an LLM, ChatGPT in the original WizardLM work, rewrites to be progressively more complex), making it strong at following detailed instructions. Vicuna was fine-tuned on ShareGPT conversations, giving it a natural conversational tone. The merge combines instruction-following precision with conversational fluency.

What was the significance of model merging in 2023?

WizardVicuna was one of the first widely-recognized model merges, proving that combining independently fine-tuned models could yield better results than either parent. This required zero additional training compute — just weight averaging. This insight led to tools like mergekit, techniques like SLERP and TIES merging, and the explosion of merged models on the HuggingFace Open LLM Leaderboard.


WizardVicuna 30B: The Community Merge That Combined Two Training Approaches

WizardVicuna 30B is a community-created merge model from May 2023 that combined WizardLM's Evol-Instruct training with Vicuna's ShareGPT conversation data, both applied to the original LLaMA 1 30B base. Created by community contributors including Eric Hartford (ehartford), it demonstrated how merging complementary fine-tunes could yield a model greater than its parts. While surpassed by modern models, it remains a milestone in the history of open-source LLM experimentation.

* Parameters: 30B
* Context Window: 2,048 tokens
* MMLU Score: ~59%
* License: LLaMA 1 (Non-Commercial)

What Is WizardVicuna 30B?

A community experiment that combined two fine-tuning approaches on Meta's original LLaMA 1

Origin Story

Community-Created, Not Official

WizardVicuna 30B is not an official release from Meta, Microsoft, or any major AI lab. It was created by community contributors (notably Eric Hartford / ehartford) in May 2023 during the explosion of LLaMA 1 fine-tuning. The model is a merge of two independently fine-tuned versions of LLaMA 1 30B, combining their complementary strengths into a single model.

The base model is Meta's original LLaMA 1 30B (released February 2023), not LLaMA 2. This is important because LLaMA 1 had a 2,048-token context window and was released under a non-commercial research license, which means WizardVicuna 30B inherits these same constraints.

Two Training Approaches Combined

The "WizardVicuna" name tells you exactly what was merged:

WizardLM (Evol-Instruct)

Used "Evol-Instruct" -- a method where an LLM (ChatGPT in the original WizardLM work) iteratively rewrites instructions to make them more complex and diverse. This produced a fine-tune especially strong at following complex, multi-step instructions.

Vicuna (ShareGPT)

Fine-tuned on approximately 70K user-shared ChatGPT conversations collected from ShareGPT.com. This produced a model with a natural, conversational tone and strong multi-turn dialogue capabilities.

The merge combined WizardLM's structured instruction-following with Vicuna's conversational fluency. The result was a model that could handle both precise technical tasks and natural conversation -- a combination that neither parent fine-tune achieved alone.

Technical Specifications

Base Model

* Parameters: 30 billion
* Base: LLaMA 1 30B (Meta, Feb 2023)
* Architecture: Decoder-only Transformer
* Context Window: 2,048 tokens
* Vocabulary: 32,000 tokens (SentencePiece)

Training Details

* Method: Model merge (weight averaging)
* Source 1: WizardLM (Evol-Instruct data)
* Source 2: Vicuna (ShareGPT conversations)
* No RLHF was used
* Released: May 2023

Deployment

* License: LLaMA 1 (Non-Commercial only)
* Format: GGUF available (community quants)
* Ollama: wizard-vicuna
* Q4_K_M VRAM: ~20GB
* Creator: ehartford / community

The Merge: How WizardVicuna Was Created

Understanding model merging and why it was a breakthrough technique in 2023

Model Merging Explained

What Is Model Merging?

Model merging takes two (or more) fine-tuned models that share the same base architecture and combines their weights -- typically through averaging, SLERP (Spherical Linear Interpolation), or TIES (Trim, Elect Sign & Merge). The key insight is that different fine-tunes learn complementary features, and merging can combine those features without additional training compute.
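As an illustration of one of the methods named above, here is a minimal pure-Python sketch of SLERP applied to two flattened weight vectors. It is a generic textbook formulation, not the code of any particular merging tool; the function name and the `eps` threshold are assumptions made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t=0 returns v0, t=1 returns v1. Falls back to plain linear
    interpolation when the vectors are (anti)parallel or zero-length.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    if sin_omega < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

In practice a merge would apply this per tensor across the whole checkpoint; the toy two-element vectors only show the interpolation itself.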

WizardVicuna 30B used a straightforward weight-averaging approach. Since both WizardLM 30B and Vicuna 30B were fine-tuned from the same LLaMA 1 30B base, their weight spaces were compatible. The resulting merged model inherited instruction-following from WizardLM and conversational naturalness from Vicuna.
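The weight-averaging idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the actual procedure used to build WizardVicuna: checkpoints are modeled as plain dictionaries mapping hypothetical parameter names to flat lists of floats, and `alpha` is an assumed mixing weight (0.5 gives a simple average).

```python
def average_weights(state_a, state_b, alpha=0.5):
    """Blend two checkpoints that share parameter names and shapes.

    Returns alpha * state_a + (1 - alpha) * state_b, parameter by parameter.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")

    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged


# Toy stand-ins for two fine-tunes of the same base (values are made up).
wizard = {"layer0.weight": [0.2, -0.4], "layer0.bias": [0.1]}
vicuna = {"layer0.weight": [0.6, 0.0], "layer0.bias": [0.3]}
print(average_weights(wizard, vicuna))
```

The key-and-shape checks reflect the compatibility requirement described above: averaging only makes sense when both models descend from the same base architecture.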

Why This Worked

The two parent models were fine-tuned on fundamentally different data distributions: WizardLM emphasized structured, complex instructions (synthetic Evol-Instruct data), while Vicuna emphasized natural, multi-turn conversations (real ShareGPT user dialogues). The working intuition is that these capabilities changed largely non-overlapping parts of the weights, so the merge preserved both without significant interference -- the same idea behind what the research literature calls "task arithmetic."
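The link between plain averaging and task arithmetic can be written out in a short derivation. The symbols are introduced here for illustration: $\theta_{\text{base}}$ is the LLaMA 1 30B base, $\theta_W$ and $\theta_V$ are the WizardLM and Vicuna fine-tunes, and $\tau_W$, $\tau_V$ are their "task vectors" (the change each fine-tune made to the base).

```latex
\begin{aligned}
\tau_W &= \theta_W - \theta_{\text{base}}, \qquad
\tau_V = \theta_V - \theta_{\text{base}} \\
\theta_{\text{merged}}
  &= \tfrac{1}{2}\left(\theta_W + \theta_V\right) \\
  &= \tfrac{1}{2}\left(\theta_{\text{base}} + \tau_W\right)
   + \tfrac{1}{2}\left(\theta_{\text{base}} + \tau_V\right) \\
  &= \theta_{\text{base}} + \tfrac{1}{2}\,\tau_W + \tfrac{1}{2}\,\tau_V
\end{aligned}
```

Read this way, a 50/50 average adds each fine-tune's change at half strength. If the two task vectors mostly touch different weights, both sets of changes survive in diluted form, which is the non-interference argument made above.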

WizardVicuna 30B Merge Components

* Base Model: LLaMA 1 30B (Meta, Feb 2023)
* Component 1: WizardLM 30B (Evol-Instruct fine-tune)
* Component 2: Vicuna 30B (ShareGPT conversation fine-tune)
* Merge Method: Weight averaging
* Release Date: May 2023
* Creator: ehartford (Eric Hartford) / community

Real Benchmark Scores

Actual benchmark results for LLaMA 1 30B-class fine-tunes from the Open LLM Leaderboard

MMLU Score: WizardVicuna 30B vs. Comparable Local Models

Model	MMLU accuracy (%)
WizardVicuna 30B	58.5
WizardLM 30B	57.2
Vicuna 33B	59.2
Guanaco 33B	57.6
Llama 2 13B Chat	53.9

Benchmark Context

These scores are approximate and sourced from the HuggingFace Open LLM Leaderboard (v1) for LLaMA 1 30B-class fine-tunes. The exact WizardVicuna merge variant tested may vary. For comparison, GPT-3.5 scored ~70% on MMLU and GPT-4 scored ~86%. Modern open models like Llama 3.1 70B score ~79% MMLU -- significantly higher than any LLaMA 1 fine-tune.

Performance Metrics

[Chart: approximate scores on MMLU, HellaSwag, ARC-Challenge, TruthfulQA, and Winogrande; overall MMLU rating: Fair]

Strengths (in 2023 context)

* Combined instruction-following and conversational ability
* Better than either parent model alone on mixed tasks
* Natural, human-like conversational tone from Vicuna data
* Strong at complex instructions from Evol-Instruct training
* Pioneered model merging as a viable technique
* Free to run locally (non-commercial use)
* Available through Ollama for easy deployment

Limitations

* 2,048-token context window (LLaMA 1 limitation)
* Non-commercial license from LLaMA 1
* ~59% MMLU -- below modern 7B models
* 20GB+ VRAM even at Q4 quantization
* No RLHF or safety alignment training
* Surpassed by LLaMA 2, LLaMA 3, Mistral, etc.
* Knowledge cutoff limited to LLaMA 1 training data (pre-2023)

VRAM Requirements by Quantization

How much GPU memory you need for each quantization level of WizardVicuna 30B

[Chart: VRAM usage by quantization level (Q2_K, Q4_K_M, Q5_K_M, Q8_0, FP16) on a 0-60GB axis]

VRAM by Quantization Level

Quantization	VRAM Required	Quality Loss	Compatible GPUs
Q2_K	~14GB	Significant	RTX 4080 16GB (tight), RTX 3090/4090
Q4_K_M	~20GB	Minimal	RTX 3090 (24GB), RTX 4090 (24GB)
Q5_K_M	~22GB	Very minimal	RTX 3090/4090 (24GB, tight)
Q8_0	~32GB	Negligible	A6000 (48GB), 2x RTX 3090
FP16	~60GB	None	A100 (80GB), 3x RTX 3090
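The table can be turned into a small lookup that picks the highest-quality quantization fitting a given GPU. This is a sketch: the VRAM figures are the approximate values from the table above, while the function name and the 2GB default headroom (room for the KV cache and other processes) are assumptions for illustration.

```python
# Approximate VRAM needs from the table above, ordered best quality first.
QUANT_VRAM_GB = [
    ("FP16", 60),
    ("Q8_0", 32),
    ("Q5_K_M", 22),
    ("Q4_K_M", 20),
    ("Q2_K", 14),
]


def pick_quant(vram_gb, headroom_gb=2.0):
    """Return the highest-quality quantization that fits, or None.

    headroom_gb is an assumed safety margin, not a measured value.
    """
    budget = vram_gb - headroom_gb
    for name, needed_gb in QUANT_VRAM_GB:
        if needed_gb <= budget:
            return name
    return None


for vram in (12, 16, 24, 48, 80):
    print(vram, "GB ->", pick_quant(vram))
```

With the default margin a 24GB card resolves to Q5_K_M, which the table marks as tight; passing a larger `headroom_gb` (for example 4) selects Q4_K_M instead.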

Installation Guide

Run WizardVicuna 30B locally via Ollama

System Requirements

* Operating System: Windows 10/11, macOS 12+, Ubuntu 20.04+
* RAM: 32GB minimum (Q4_K_M), 64GB recommended for larger quants
* Storage: 20GB (Q4_K_M) to 60GB (FP16)
* GPU: 20GB VRAM for Q4_K_M (RTX 3090/4090); 60GB VRAM for FP16 (A100 80GB or 3x RTX 3090)
* CPU: 8+ cores for CPU-only fallback (very slow at 30B)

Install Ollama

Download and install the Ollama runtime for local model management

$ curl -fsSL https://ollama.com/install.sh | sh

Check Available VRAM

Verify your GPU has enough VRAM for the quantization level you need

$ nvidia-smi --query-gpu=memory.total,memory.free --format=csv
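If you want to script this check, the CSV output of the command above can be parsed in Python. The sketch below assumes the usual output shape (one header line, then one `<total> MiB, <free> MiB` row per GPU); the sample string is illustrative, not captured from a real machine, and the function names are hypothetical.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Return a list of (total_gib, free_gib) tuples, one per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines()[1:]:  # skip the CSV header
        total, free = (field.strip() for field in line.split(","))
        # Each field looks like "24576 MiB"; keep the number, convert to GiB.
        gpus.append(tuple(int(value.split()[0]) / 1024 for value in (total, free)))
    return gpus


def query_gpu_memory():
    """Run nvidia-smi with the flags shown above and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.free", "--format=csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


# Illustrative sample of the expected format (not real output).
sample = "memory.total [MiB], memory.free [MiB]\n24576 MiB, 23552 MiB\n"
print(parse_gpu_memory(sample))
```

Compare the free figure, not the total, against the quantization you plan to load, since other processes may already be holding VRAM.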

Pull WizardVicuna

Download the default quantized version (Q4_K_M, ~18GB download)

$ ollama pull wizard-vicuna

Test the Model

Verify the model loads and responds correctly

$ ollama run wizard-vicuna "Explain the difference between WizardLM and Vicuna in two sentences"

Terminal

$ ollama run wizard-vicuna

pulling manifest
pulling 8f42bbb47552... 100% |████████████████████████████| 18 GB
pulling 4fa551d4f938... 100% |████████████████████████████| 12 KB
pulling 8ab4849b038c... 100% |████████████████████████████| 254 B
verifying sha256 digest
writing manifest
removing any unused layers
success

$ ollama run wizard-vicuna "What is quantum entanglement?"

Quantum entanglement is a phenomenon in quantum mechanics where two or more particles become interconnected such that the quantum state of one particle instantly influences the state of the other, regardless of the distance separating them. This correlation persists even when the particles are separated by large distances, which Einstein famously called "spooky action at a distance."
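Once the model responds in the terminal, you can also call it from a script through Ollama's local HTTP API. The sketch below uses only the Python standard library and assumes the Ollama server is running on its default port (11434) with the `wizard-vicuna` model already pulled; the helper names are hypothetical, and setting `num_ctx` to 2,048 simply mirrors the model's context limit.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(prompt, model="wizard-vicuna", num_ctx=2048):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not chunks
        "options": {"num_ctx": num_ctx},  # stay within the 2,048-token window
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Usage, with the Ollama server running:
#   print(generate("What is quantum entanglement?"))
```

The generous timeout is deliberate: a 30B model can take a while to load and answer, especially if part of it spills to CPU.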

License Warning

WizardVicuna 30B inherits the LLaMA 1 non-commercial license from its base model. This means it can only be used for research and personal experimentation -- not for commercial products or services. If you need a commercially-licensed 30B+ model, consider Llama 2 70B, Llama 3.1 70B, or Mistral/Mixtral models instead.

Why WizardVicuna Mattered

The historical importance of community model merging

Pioneering Model Merging

WizardVicuna 30B was among the first widely-discussed model merges, demonstrating that you could combine independently fine-tuned models to get something better than either parent. This idea -- that fine-tuning produces modular, composable changes to weight space -- became foundational to later work on model merging techniques like TIES-Merging, DARE, and the entire mergekit ecosystem.

Before WizardVicuna, the standard approach was to fine-tune a base model on a single curated dataset. The success of this merge showed that the open-source community could iterate faster by combining specialist models rather than training from scratch each time. This insight directly led to the explosion of merged models on the HuggingFace Open LLM Leaderboard throughout 2023-2024.

Community-Driven Innovation

WizardVicuna exemplified the open-source AI community's ability to innovate without massive compute budgets. Model merging requires zero additional GPU time -- it's purely a weight-space operation. This democratized model improvement, allowing individual researchers and hobbyists to create competitive models on consumer hardware.

The model was part of a broader wave of LLaMA 1 experimentation that included Alpaca, Vicuna, WizardLM, Guanaco, and many others. Together, these projects proved that fine-tuning and merging could unlock capabilities that the base model couldn't achieve, setting the stage for the explosion of open-source AI development that continues today.

Legacy and Influence

The techniques pioneered by WizardVicuna and similar early merges directly influenced later developments: tools like mergekit (which automates model merging), the SLERP and TIES merge methods, and the entire category of "frankenmerge" models on HuggingFace. By 2024, model merging had become a standard technique in the open-source AI toolkit, with merged models regularly topping the Open LLM Leaderboard.

2026 Assessment and Local AI Alternatives

Honest evaluation: WizardVicuna 30B in today's landscape

Honest 2026 Assessment

WizardVicuna 30B was a historically important community experiment that proved model merging could combine complementary training approaches. However, in 2026, it has been thoroughly surpassed:

* Its ~59% MMLU is below what modern 7B-8B models achieve (Llama 3.1 8B: ~65% MMLU)
* The 2,048-token context window is extremely limiting compared to 128K+ in modern models
* Non-commercial license makes it impractical when Llama 3.1 permits commercial use under its Community License and Mistral 7B and Qwen 2.5 are Apache 2.0
* 20GB+ VRAM for a model outperformed by 8B models needing 6GB VRAM
* No safety alignment (no RLHF, no DPO) -- modern models are far safer

Recommendation: Use WizardVicuna 30B only for historical interest or research into early model merging techniques. For any practical task, choose a modern alternative below.

Local AI Alternatives (2026)

Model	MMLU	Context	VRAM (Q4)	License	Why Choose
Llama 3.1 8B	~65%	128K	~6GB	Llama 3.1 (Commercial)	Better quality, 64x more context, 1/3 the VRAM
Mistral 7B	~62%	32K	~5GB	Apache 2.0	Similar quality at 1/4 the size
Llama 3.1 70B	~79%	128K	~40GB	Llama 3.1 (Commercial)	Far better quality if you have the VRAM
Qwen 2.5 32B	~78%	128K	~20GB	Apache 2.0	Same VRAM budget, vastly better quality
Gemma 2 27B	~75%	8K	~17GB	Gemma License (Commercial)	Similar size, much better benchmarks

LLaMA 1 Era Comparison (Historical Context)

Model	Download Size (Q4)	VRAM (Q4)	Speed	MMLU	License
WizardVicuna 30B	~18GB	~20GB	~15 tok/s	59%	Free (Non-Commercial)
WizardLM 30B	~18GB	~20GB	~15 tok/s	57%	Free (Non-Commercial)
Vicuna 33B	~19GB	~22GB	~14 tok/s	59%	Free (Non-Commercial)
Guanaco 33B	~19GB	~22GB	~14 tok/s	58%	Free (Non-Commercial)
Llama 2 13B Chat	~7.4GB	~10GB	~35 tok/s	54%	Free (Commercial OK)



Written by the Local AI Master Team


Published: May 15, 2023 | Last Updated: March 13, 2026 | Manually Reviewed


[Figure: WizardVicuna 30B model merge architecture. Diagram showing how WizardVicuna 30B was created by merging WizardLM (Evol-Instruct) and Vicuna (ShareGPT) fine-tunes of the LLaMA 1 30B base model.]
