Vicuna-33B: LMSys ShareGPT Model
Updated: March 16, 2026
LLaMA 1 33B fine-tuned on ~70K ShareGPT conversations by LMSys. A historically significant model that pioneered LLM-as-judge evaluation and the Chatbot Arena.
What Is Vicuna-33B?
Vicuna-33B v1.3 is a 33-billion parameter language model created by LMSys (Large Model Systems Organization) — a collaboration between UC Berkeley, CMU, Stanford, and UC San Diego. Released in June 2023, it was fine-tuned from Meta's LLaMA 1 33B on approximately 70,000 user conversations collected from ShareGPT.
Key Facts
License Warning: Many sites incorrectly claim Vicuna-33B is “Apache 2.0.” While the Vicuna delta weights are Apache 2.0, the model requires LLaMA 1 base weights which are under Meta's non-commercial research license. Commercial use is not permitted. Vicuna v1.5 (7B/13B only) uses LLaMA 2 with a more permissive license, but no v1.5 33B exists.
Historical Significance
Vicuna is one of the most historically important open-source LLMs. It contributed two innovations that became industry standards:
1. ShareGPT Training Data
Vicuna was one of the first models to demonstrate that fine-tuning on real user conversations (collected from ShareGPT, where users shared their ChatGPT conversations) produced models that felt significantly more natural and helpful than those trained on synthetic instruction data. This approach influenced the entire field.
2. LLM-as-Judge Evaluation
The Vicuna team pioneered using GPT-4 as an automated evaluator, comparing model outputs head-to-head and having GPT-4 score them. This “LLM-as-judge” approach became the standard evaluation methodology across the industry, formalized in the accompanying paper “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena” (arXiv:2306.05685).
Chatbot Arena Legacy
LMSys also created the Chatbot Arena — a crowdsourced platform where users compare model outputs in blind A/B tests. Vicuna was a founding model on this platform. The Arena's Elo-based rankings remain one of the most trusted LLM evaluation benchmarks in 2026.
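Arena rankings are computed from those pairwise votes. The Arena later moved to a statistically sturdier Bradley-Terry fit, but the per-vote intuition is the classic Elo update, sketched here:

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Update two ratings after one blind A/B vote (winner beat loser)."""
    expected = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))  # P(winner wins)
    delta = k * (1.0 - expected)  # an upset (low-rated winner) moves ratings more
    return r_winner + delta, r_loser - delta

# Two models start equal; one vote moves them 16 points apart each way.
a, b = elo_update(1000.0, 1000.0)
print(a, b)  # 1016.0 984.0
```

Aggregated over many thousands of votes, these updates converge toward a stable leaderboard ordering.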
Real Benchmarks
MMLU Comparison
Approximate scores from the Open LLM Leaderboard. Note how modern 7B models now outperform Vicuna 33B.
(MMLU bar chart omitted; approximate figures are discussed in the note below.)
Benchmarks evaluated:
- MMLU: multi-task language understanding
- HellaSwag: commonsense reasoning
- ARC-Challenge: science question answering
- TruthfulQA: factual accuracy
- WinoGrande: commonsense coreference resolution
- HumanEval: code generation (Vicuna is not a code-focused model)
Note: Exact benchmark numbers vary by evaluation setup. Scores shown are approximate from the Open LLM Leaderboard and community evaluations. Vicuna's original evaluation used GPT-4-as-judge (not standardized benchmarks), reporting ~90% of ChatGPT quality in conversational tasks.
VRAM & Quantization Guide
Vicuna-33B is a large model. Quantization is essential for consumer hardware. GGUF files are available from community contributors on HuggingFace.
| Quantization | File Size | VRAM Needed | RAM (CPU) | Quality | GPU Compatibility |
|---|---|---|---|---|---|
| Q2_K | ~13GB | ~14GB | ~16GB | Noticeable degradation | RTX 4070 Ti (16GB) |
| Q3_K_M | ~16GB | ~17GB | ~20GB | Acceptable | RTX 3090/4090 (24GB) |
| Q4_K_M | ~20GB | ~21GB | ~24GB | Good (recommended) | RTX 3090/4090 (24GB) |
| Q5_K_M | ~24GB | ~25GB | ~28GB | Very good | 2x RTX 3090 or A6000 |
| Q8_0 | ~35GB | ~36GB | ~40GB | Near-lossless | A6000 (48GB) |
| FP16 | ~66GB | ~68GB | ~72GB | Full precision | A100 80GB / 2x A6000 |
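The file sizes above follow directly from parameter count times bits per weight. A back-of-envelope sketch; the effective bits-per-weight figures are approximations (K-quants mix several bit widths across tensors, so e.g. “2-bit” Q2_K lands above 3 effective bits):

```python
PARAMS = 33e9  # Vicuna-33B parameter count (roughly; the base model is ~32.5B)

# Approximate effective bits per weight for common GGUF quant types
BITS_PER_WEIGHT = {
    "Q2_K": 3.2,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def file_size_gb(quant: str) -> float:
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q:7s} ~{file_size_gb(q):.0f} GB")
```

VRAM needed is roughly the file size plus one or two GB for the KV cache and activations, which is why each table row adds a small margin.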
Installation (llama.cpp)
Vicuna-33B is NOT on Ollama
Ollama offers Vicuna in 7B and 13B sizes only. The 33B model is not in the Ollama registry; use llama.cpp or text-generation-webui instead.
Using llama.cpp
Note: Vicuna uses a specific chat template (v1.1). The prompt starts with the system message “A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.” followed by alternating “USER: [message]” and “ASSISTANT:” turns; getting this wrong significantly degrades output quality.
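The template is easy to get right programmatically. A small helper that assembles the v1.1 format; the `</s>` end-of-turn token after assistant replies follows the FastChat conversation template, but treat the exact separators as an assumption and verify against the model card:

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def vicuna_prompt(turns: list[tuple[str, str]]) -> str:
    """Build a Vicuna v1.1-style prompt from (role, text) turns."""
    parts = [SYSTEM]
    for role, text in turns:
        if role == "user":
            parts.append(f"USER: {text}")
        else:
            parts.append(f"ASSISTANT: {text}</s>")  # close each finished assistant turn
    parts.append("ASSISTANT:")  # left open for the model to complete
    return " ".join(parts)

prompt = vicuna_prompt([("user", "Explain quantization in one sentence.")])
# Pass the result to llama.cpp, e.g.:  llama-cli -m vicuna-33b.Q4_K_M.gguf -p "$PROMPT"
```

The same string works for text-generation-webui's notebook mode or any raw-completion endpoint.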
Using Python (Transformers)
Requires ~20GB VRAM with 4-bit quantization. Full FP16 requires ~66GB VRAM (A100 or multi-GPU setup).
Vicuna-33B vs Modern Alternatives
| Model | Size | RAM Required | Speed | Quality | License |
|---|---|---|---|---|---|
| Vicuna-33B v1.3 | 33B | 20-66GB | Slow | 59% | Non-commercial |
| Llama 3.1 8B | 8B | 5-16GB | Fast | 68% | Meta Community |
| Mistral 7B v0.3 | 7B | 4-14GB | Fast | 63% | Apache 2.0 |
| Qwen 2.5 7B | 7B | 4-14GB | Fast | 74% | Apache 2.0 |
| Vicuna-13B v1.5 | 13B | 8-26GB | Medium | 55% | Llama 2 Community |
Quality = approximate MMLU score. Modern 7B models outperform Vicuna-33B while using 3-5x fewer resources.
Should You Use Vicuna-33B in 2026?
Reasons NOT to Use It
- Outperformed by smaller models: Llama 3.1 8B beats it on MMLU while using 4x fewer resources
- Non-commercial license: LLaMA 1 base restricts commercial use
- Small context window: Only 2048 tokens vs 8K-128K in modern models
- Not on Ollama: No easy one-command setup; requires llama.cpp knowledge
- No safety training: No RLHF or constitutional AI alignment
- Weak at code: ~15-20% HumanEval vs 60%+ for modern code models
Reasons You Might Still Want It
- Research / Historical study: Understanding the evolution of open-source LLMs
- Benchmark comparison: As a baseline when evaluating newer models
- Uncensored output: Less content filtering than modern models (for research)
- ShareGPT conversation style: Natural conversational feel due to training data source
Recommendation
For new projects in 2026, use Qwen 2.5 7B, Llama 3.1 8B, or Mistral 7B instead. They're faster and more capable, have permissive licenses and larger context windows, and run on Ollama with a single command. Vicuna's legacy lies in the innovations it brought to the field, not in its ongoing competitiveness.
Frequently Asked Questions
Is Vicuna-33B available on Ollama?
No. Ollama offers Vicuna in 7B and 13B sizes only. Vicuna-33B is not in the Ollama library because it's based on LLaMA 1 (which has a non-commercial license) and has been superseded by newer models. To run Vicuna-33B locally, use llama.cpp with a GGUF conversion from HuggingFace.
Can I use Vicuna-33B commercially?
No. Vicuna-33B is fine-tuned from LLaMA 1 33B, which was released under Meta's original LLaMA license that restricts commercial use. The Vicuna delta weights are Apache 2.0, but you need the LLaMA 1 base weights (non-commercial) to use the model. For commercial use, consider LLaMA 3 models or Mistral, which have permissive licenses.
What made Vicuna important historically?
Vicuna was one of the first open models to approach GPT-3.5 quality in conversations. Created by LMSys (UC Berkeley, CMU, Stanford, UCSD), it pioneered two important concepts: (1) fine-tuning on real user conversations from ShareGPT, and (2) using GPT-4 as an automated judge for evaluation — the 'LLM-as-judge' approach that became standard in the field.
How much VRAM does Vicuna-33B need?
At full FP16 precision: ~66GB VRAM. With Q4_K_M quantization: ~20GB VRAM (fits on RTX 3090/4090). With Q3_K_M: ~16GB. With Q2_K: ~13GB (fits on RTX 4070 Ti). CPU-only inference is possible but very slow.
What's better than Vicuna-33B today?
In 2026, almost every modern 7B-14B model outperforms Vicuna-33B while using far fewer resources. Llama 3.1 8B (MMLU ~68%), Mistral 7B v0.3 (MMLU ~63%), and Qwen 2.5 7B (MMLU ~74%) are all better choices. These models also have permissive licenses, larger context windows, and Ollama support.
Sources
- LMSys. “Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality.” Blog Post (March 2023)
- Zheng, L., et al. (2023). “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.” arXiv:2306.05685
- HuggingFace. “lmsys/vicuna-33b-v1.3.” Model Card
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset