WizardVicuna 30B: The Community Merge That Combined Two Training Approaches
WizardVicuna 30B is a community-created merge model from May 2023 that combined WizardLM's Evol-Instruct training with Vicuna's ShareGPT conversation data, both applied to the original LLaMA 1 30B base. Created by community contributors including Eric Hartford (ehartford), it demonstrated how merging complementary fine-tunes could yield a model greater than the sum of its parts. Although modern models have surpassed it, it remains a milestone in the history of open-source LLM experimentation.
What Is WizardVicuna 30B?
A community experiment that combined two fine-tuning approaches on Meta's original LLaMA 1
Origin Story
Community-Created, Not Official
WizardVicuna 30B is not an official release from Meta, Microsoft, or any major AI lab. It was created by community contributors (notably Eric Hartford / ehartford) in May 2023 during the explosion of LLaMA 1 fine-tuning. The model is a merge of two independently fine-tuned versions of LLaMA 1 30B, combining their complementary strengths into a single model.
The base model is Meta's original LLaMA 1 30B (released February 2023), not LLaMA 2. This is important because LLaMA 1 had a 2,048-token context window and was released under a non-commercial research license, which means WizardVicuna 30B inherits these same constraints.
Two Training Approaches Combined
The "WizardVicuna" name tells you exactly what was merged:
WizardLM (Evol-Instruct)
Used "Evol-Instruct" -- a method where an LLM (ChatGPT, in the original WizardLM work) iteratively rewrites instructions to make them more complex and diverse. This produced a fine-tune especially strong at following complex, multi-step instructions.
Vicuna (ShareGPT)
Fine-tuned on approximately 70K conversations shared by users from ChatGPT (via ShareGPT.com). This produced a model with natural, conversational tone and strong multi-turn dialogue capabilities.
The merge combined WizardLM's structured instruction-following with Vicuna's conversational fluency. The result was a model that could handle both precise technical tasks and natural conversation -- a combination that neither parent fine-tune achieved alone.
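The Evol-Instruct idea described above, an LLM repeatedly rewriting instructions into harder variants, can be sketched as a simple loop. This is a minimal illustration, not the WizardLM implementation: the prompt templates in `EVOLVE_PROMPTS` are hypothetical, and `fake_rewrite` is a stand-in for a real LLM call so the example runs offline.

```python
from typing import Callable, List

# Hypothetical rewrite prompts (assumption); the real Evol-Instruct prompts
# are not given in this article.
EVOLVE_PROMPTS = [
    "Rewrite the instruction so it needs one more reasoning step: {instruction}",
    "Add a concrete constraint to the instruction: {instruction}",
]


def evolve_instructions(
    seeds: List[str], rewrite: Callable[[str], str], rounds: int = 2
) -> List[str]:
    """Return the seeds plus every evolved variant produced over `rounds` rounds."""
    pool = list(seeds)
    frontier = list(seeds)
    for round_index in range(rounds):
        template = EVOLVE_PROMPTS[round_index % len(EVOLVE_PROMPTS)]
        next_frontier = []
        for instruction in frontier:
            evolved = rewrite(template.format(instruction=instruction)).strip()
            # Drop failed rewrites: empty output or an unchanged instruction.
            if evolved and evolved != instruction:
                next_frontier.append(evolved)
        pool.extend(next_frontier)
        frontier = next_frontier
    return pool


def fake_rewrite(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): appends a fixed extra requirement."""
    instruction = prompt.split(": ", 1)[1]
    return instruction + " Explain each step."


for line in evolve_instructions(["Sort a list."], fake_rewrite):
    print(line)
```

In a real pipeline, `rewrite` would call a hosted or local model, and the filter for failed rewrites would likely be stricter than the equality check used here.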
Technical Specifications
Base Model
- Parameters: 30 billion
- Base: LLaMA 1 30B (Meta, Feb 2023)
- Architecture: Decoder-only Transformer
- Context Window: 2,048 tokens
- Vocabulary: 32,000 tokens (SentencePiece)
Training Details
- Method: Model merge (weight averaging)
- Source 1: WizardLM (Evol-Instruct data)
- Source 2: Vicuna (ShareGPT conversations)
- No RLHF was used
- Released: May 2023
Deployment
- License: LLaMA 1 (Non-Commercial only)
- Format: GGUF available (community quants)
- Ollama: wizard-vicuna
- Q4_K_M VRAM: ~20GB
- Creator: ehartford / community
The Merge: How WizardVicuna Was Created
Understanding model merging and why it was a breakthrough technique in 2023
Model Merging Explained
What Is Model Merging?
Model merging takes two (or more) fine-tuned models that share the same base architecture and combines their weights -- typically through averaging, SLERP (Spherical Linear Interpolation), or TIES (Trim, Elect Sign & Merge). The key insight is that different fine-tunes learn complementary features, and merging can combine those features without additional training compute.
WizardVicuna 30B used a straightforward weight-averaging approach. Since both WizardLM 30B and Vicuna 30B were fine-tuned from the same LLaMA 1 30B base, their weight spaces were compatible. The resulting merged model inherited instruction-following from WizardLM and conversational naturalness from Vicuna.
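As a rough illustration of the two simplest methods named above, the sketch below implements weight averaging and SLERP on toy checkpoints. It assumes a checkpoint can be modeled as a dictionary of flat float lists; real merges operate on full tensors, and nothing here reproduces the actual WizardVicuna merge script.

```python
import math
from typing import Dict, List

# Toy stand-in for a checkpoint (assumption): tensor name -> flat list of weights.
Weights = Dict[str, List[float]]


def average_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Linear interpolation: alpha * a + (1 - alpha) * b for every tensor."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical tensor names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


def slerp(v0: List[float], v1: List[float], t: float) -> List[float]:
    """Spherical linear interpolation between two flat tensors (t=0 gives v0)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        return [(1 - t) * x + t * y for x, y in zip(v0, v1)]
    cos_omega = sum(x * y for x, y in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel or anti-parallel: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * x + s1 * y for x, y in zip(v0, v1)]


print(average_merge({"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}))
print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

Averaging only makes sense when both checkpoints share tensor names and shapes, which is why the function rejects mismatches instead of guessing.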
Why This Worked
The two parent models were fine-tuned on fundamentally different data distributions: WizardLM emphasized structured, complex instructions (synthetic Evol-Instruct data), while Vicuna emphasized natural, multi-turn conversations (real ShareGPT user dialogues). The usual explanation is that these capabilities correspond to largely non-overlapping changes to the base weights, so the merge preserved both without significant interference. Later research formalized this view as "task arithmetic," which treats each fine-tune as a vector added to the base weights.
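To make the weight-space picture concrete, here is a minimal sketch of merging through task vectors, meaning the difference between a fine-tune and its base. The checkpoints are toy dictionaries with made-up values, and the function names are illustrative rather than taken from any library.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint (assumption): tensor name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """What fine-tuning changed: finetuned - base, tensor by tensor."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_task_vectors(base: Weights, vectors: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled sum of all task vectors back onto the base weights."""
    merged = {}
    for name, base_values in base.items():
        merged[name] = [
            b + scale * sum(vec[name][i] for vec in vectors)
            for i, b in enumerate(base_values)
        ]
    return merged


# Made-up toy values: each fine-tune moves a different coordinate of the base.
base = {"w": [1.0, 1.0]}
wizard = {"w": [1.5, 1.0]}
vicuna = {"w": [1.0, 0.5]}

vectors = [task_vector(wizard, base), task_vector(vicuna, base)]
print(apply_task_vectors(base, vectors, scale=1.0))  # both changes at full strength
print(apply_task_vectors(base, vectors, scale=0.5))  # same as averaging the two fine-tunes
```

With two fine-tunes of one base, a scale of 0.5 is algebraically identical to plain averaging, since (a + b) / 2 = base + 0.5 * ((a - base) + (b - base)). When the two task vectors touch different coordinates, as in the toy values, neither change cancels the other.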
[Figure: WizardVicuna 30B merge components]
Real Benchmark Scores
Actual benchmark results for LLaMA 1 30B-class fine-tunes from the Open LLM Leaderboard
[Figure: MMLU score, WizardVicuna 30B vs. comparable local models]
Benchmark Context
These scores are approximate and sourced from the HuggingFace Open LLM Leaderboard (v1) for LLaMA 1 30B-class fine-tunes. The exact WizardVicuna merge variant tested may vary. For comparison, GPT-3.5 scored ~70% on MMLU and GPT-4 scored ~86%. Modern open models like Llama 3.1 70B score ~79% MMLU -- significantly higher than any LLaMA 1 fine-tune.
Performance Metrics
Strengths (in 2023 context)
- Combined instruction-following and conversational ability
- Better than either parent model alone on mixed tasks
- Natural, human-like conversational tone from Vicuna data
- Strong at complex instructions from Evol-Instruct training
- Pioneered model merging as a viable technique
- Free to run locally (non-commercial use)
- Available through Ollama for easy deployment
Limitations
- 2,048-token context window (LLaMA 1 limitation)
- Non-commercial license from LLaMA 1
- ~59% MMLU -- below modern 7B models
- 20GB+ VRAM even at Q4 quantization
- No RLHF or safety alignment training
- Surpassed by LLaMA 2, LLaMA 3, Mistral, etc.
- Knowledge cutoff limited to LLaMA 1 training data (pre-2023)
VRAM Requirements by Quantization
How much GPU memory you need for each quantization level of WizardVicuna 30B
VRAM by Quantization Level
| Quantization | VRAM Required | Quality Loss | Compatible GPUs |
|---|---|---|---|
| Q2_K | ~14GB | Significant | RTX 4080 16GB (tight), RTX 3090/4090 |
| Q4_K_M | ~20GB | Minimal | RTX 3090 (24GB), RTX 4090 (24GB) |
| Q5_K_M | ~22GB | Very minimal | RTX 3090/4090 (24GB, tight) |
| Q8_0 | ~32GB | Negligible | A6000 (48GB), 2x RTX 3090 |
| FP16 | ~60GB | None | A100 (80GB), 3x RTX 3090 |
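The table can be turned into a quick planning helper. The sketch below hard-codes the approximate VRAM figures from the table and picks the highest-quality quantization that fits a given card. It also shows the weights-only size formula (parameters times bits per weight, divided by 8). The `headroom_gb` parameter is an illustrative assumption for context cache and runtime overhead, not a measured value.

```python
from typing import Optional

# Approximate VRAM needs copied from the table above, lowest quality first.
VRAM_TABLE_GB = [
    ("Q2_K", 14),
    ("Q4_K_M", 20),
    ("Q5_K_M", 22),
    ("Q8_0", 32),
    ("FP16", 60),
]


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes); excludes cache and overhead."""
    return n_params * bits_per_weight / 8 / 1e9


def best_quant_for(vram_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Highest-quality quantization from the table that fits, or None."""
    fitting = [name for name, need in VRAM_TABLE_GB if need + headroom_gb <= vram_gb]
    return fitting[-1] if fitting else None


print(weight_size_gb(30e9, 16))          # FP16 weights for a 30B model
print(best_quant_for(24))                # a 24GB card with no extra headroom
print(best_quant_for(24, headroom_gb=3)) # the same card, reserving 3GB
```

The FP16 row follows directly from the formula: 30 billion parameters at 16 bits each is 60GB of weights. Quantized formats store slightly more than their nominal bit width per weight, so their rows are better read as estimates.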
Installation Guide
Run WizardVicuna 30B locally via Ollama
System Requirements
Install Ollama
Download and install the Ollama runtime for local model management
Check Available VRAM
Verify your GPU has enough VRAM for the quantization level you need
Pull WizardVicuna
Download the default quantized version (Q4_K_M, ~18GB download)
Test the Model
Verify the model loads and responds correctly
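Once the model is pulled, the final test step can also be scripted against Ollama's local HTTP API. The sketch below assumes Ollama is running on its default port (11434) and uses the `wizard-vicuna` tag listed in the specifications; check which parameter size that tag resolves to on your install. The helper names are illustrative.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "wizard-vicuna") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "wizard-vicuna", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running Ollama)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `ask("Explain model merging in two sentences.")` should return a short answer if the model loaded correctly. A connection error usually means the Ollama service is not running. The generous timeout allows for the first request, which includes loading the weights into memory.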
License Warning
WizardVicuna 30B inherits the LLaMA 1 non-commercial license from its base model. This means it can only be used for research and personal experimentation -- not for commercial products or services. If you need a commercially-licensed 30B+ model, consider Llama 2 70B, Llama 3.1 70B, or Mistral/Mixtral models instead.
Why WizardVicuna Mattered
The historical importance of community model merging
Pioneering Model Merging
WizardVicuna 30B was among the first widely-discussed model merges, demonstrating that you could combine independently fine-tuned models to get something better than either parent. This idea -- that fine-tuning produces modular, composable changes to weight space -- became foundational to later work on model merging techniques like TIES-Merging, DARE, and the entire mergekit ecosystem.
Before WizardVicuna, the standard approach was to fine-tune a base model on a single curated dataset. The success of this merge showed that the open-source community could iterate faster by combining specialist models rather than training from scratch each time. This insight helped set the stage for the explosion of merged models on the HuggingFace Open LLM Leaderboard throughout 2023-2024.
Community-Driven Innovation
WizardVicuna exemplified the open-source AI community's ability to innovate without massive compute budgets. Model merging requires zero additional GPU time -- it's purely a weight-space operation. This democratized model improvement, allowing individual researchers and hobbyists to create competitive models on consumer hardware.
The model was part of a broader wave of LLaMA 1 experimentation that included Alpaca, Vicuna, WizardLM, Guanaco, and many others. Together, these projects proved that fine-tuning and merging could unlock capabilities that the base model couldn't achieve, setting the stage for the explosion of open-source AI development that continues today.
Legacy and Influence
The ideas explored by WizardVicuna and similar early merges fed into later developments: tools like mergekit (which automates model merging), wider use of merge methods such as SLERP and TIES, and the entire category of "frankenmerge" models on HuggingFace. By 2024, model merging had become a standard technique in the open-source AI toolkit, with merged models regularly topping the Open LLM Leaderboard.
2026 Assessment and Local AI Alternatives
Honest evaluation: WizardVicuna 30B in today's landscape
Honest 2026 Assessment
WizardVicuna 30B was a historically important community experiment that proved model merging could combine complementary training approaches. However, in 2026, it has been thoroughly surpassed:
- Its ~59% MMLU is below what modern 7B-8B models achieve (Llama 3.1 8B: ~65% MMLU)
- The 2,048-token context window is extremely limiting compared to 128K+ in modern models
- The non-commercial license makes it impractical when Llama 3.1 permits commercial use under its community license and models such as Mistral 7B and Qwen 2.5 32B are Apache 2.0
- It needs 20GB+ VRAM yet is outperformed by 8B models that need about 6GB
- No safety alignment (no RLHF, no DPO) -- modern models are far safer
Recommendation: Use WizardVicuna 30B only for historical interest or research into early model merging techniques. For any practical task, choose a modern alternative below.
Local AI Alternatives (2026)
| Model | MMLU | Context | VRAM (Q4) | License | Why Choose |
|---|---|---|---|---|---|
| Llama 3.1 8B | ~65% | 128K | ~6GB | Llama 3.1 (Commercial) | Better quality, 64x more context, 1/3 the VRAM |
| Mistral 7B | ~62% | 32K | ~5GB | Apache 2.0 | Similar quality at 1/4 the size |
| Llama 3.1 70B | ~79% | 128K | ~40GB | Llama 3.1 (Commercial) | Far better quality if you have the VRAM |
| Qwen 2.5 32B | ~78% | 128K | ~20GB | Apache 2.0 | Same VRAM budget, vastly better quality |
| Gemma 2 27B | ~75% | 8K | ~17GB | Gemma License (Commercial) | Similar size, much better benchmarks |
LLaMA 1 Era Comparison (Historical Context)
| Model | Download Size (Q4) | VRAM (Q4) | Speed | MMLU | License |
|---|---|---|---|---|---|
| WizardVicuna 30B | ~18GB | ~20GB | ~15 tok/s | ~59% | Free (Non-Commercial) |
| WizardLM 30B | ~18GB | ~20GB | ~15 tok/s | ~57% | Free (Non-Commercial) |
| Vicuna 33B | ~19GB | ~22GB | ~14 tok/s | ~59% | Free (Non-Commercial) |
| Guanaco 33B | ~19GB | ~22GB | ~14 tok/s | ~58% | Free (Non-Commercial) |
| Llama 2 13B Chat | ~7.4GB | ~10GB | ~35 tok/s | ~54% | Free (Commercial OK) |
Written by Pattanaik Ramswarup
Creator of Local AI Master
I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
WizardVicuna 30B: Model Merge Architecture
Diagram showing how WizardVicuna 30B was created by merging WizardLM (Evol-Instruct) and Vicuna (ShareGPT) fine-tunes of the LLaMA 1 30B base model