VICUNA 7B
ShareGPT-Trained Chat Model by LMSYS
Fine-tuned on 70K real ChatGPT conversations from ShareGPT. Real benchmarks: ~50% MMLU, 76% HellaSwag. The model that launched Chatbot Arena and the open-source chat revolution.
What Is Vicuna 7B?
Origins
- Developer: LMSYS (Large Model Systems Organization) -- a collaboration between UC Berkeley, CMU, and Stanford
- Base Model: LLaMA 1 7B (v1.0/v1.1), LLaMA 2 7B (v1.5)
- Training Data: ~70,000 user conversations scraped from ShareGPT.com (ChatGPT conversation sharing site)
- Release: March 2023 (v1.0), June 2023 (v1.1), July 2023 (v1.5)
- License: Non-commercial (v1.0/1.1, LLaMA 1), Llama 2 Community License (v1.5)
About the "90% ChatGPT Quality" Claim
LMSYS originally claimed Vicuna achieved "90% of ChatGPT quality" based on GPT-4 judging responses in pairwise comparisons. This methodology was widely disputed in the research community because: (1) GPT-4 as judge has known biases toward verbose, structured responses; (2) the evaluation used only 80 questions, not standardized benchmarks; (3) the scoring was relative, not absolute. On standardized benchmarks like MMLU, Vicuna 7B v1.5 scores approximately 50%, well below GPT-3.5-Turbo (>70%). The claim was important historically as marketing for open-source models, but should not be taken at face value.
ShareGPT Training Methodology
Vicuna's training approach was groundbreaking for 2023: instead of collecting expensive human-written instruction data (like Anthropic or OpenAI), LMSYS harvested real user conversations from ShareGPT.com, a site where ChatGPT users voluntarily shared their conversations.
Step 1: Data Collection
LMSYS scraped ~70,000 multi-turn conversations from ShareGPT.com. These were real interactions between users and ChatGPT, covering coding help, creative writing, analysis, Q&A, and general conversation. The data was filtered for quality and length.
Step 2: Data Cleaning
Conversations were cleaned to remove personal information, broken formatting, and non-English content. Long conversations were truncated to fit within the model's context window. HTML artifacts from the ShareGPT website were stripped.
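LMSYS's actual cleaning pipeline lives in their FastChat repository; as a rough illustration of the kind of steps described above (hypothetical helpers, not their code), stripping HTML artifacts and truncating long conversations might look like:

```python
import re

def clean_turn(text):
    """Strip HTML tags and common entities left over from ShareGPT's web pages."""
    text = re.sub(r"<[^>]+>", "", text)  # drop HTML tags
    text = text.replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">")
    return text.strip()

def truncate_conversation(turns, max_chars=8000):
    """Keep whole turns from the start until the budget is spent
    (a crude character-based stand-in for token-based truncation)."""
    kept, used = [], 0
    for turn in turns:
        if used + len(turn["text"]) > max_chars:
            break
        kept.append(turn)
        used += len(turn["text"])
    return kept

print(clean_turn("<p>Hello &amp; welcome</p>"))  # Hello & welcome
```

In practice truncation is done in tokens against the model's 2K/4K context window, and long conversations are split rather than dropped, but the shape of the pipeline is the same.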
Step 3: Fine-Tuning
The cleaned conversations were used for supervised fine-tuning (SFT) on the LLaMA base model. Training used 8x A100 GPUs for approximately 1 day. The model learned to follow the ChatGPT-style conversation format from actual examples.
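Before SFT, each cleaned conversation is rendered into Vicuna's chat template. A sketch of the v1.1 format, reconstructed from FastChat's conversation templates (treat the exact system-prompt wording as approximate):

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def to_vicuna_prompt(turns):
    """Render (role, text) pairs into the Vicuna v1.1 chat format.

    User turns become 'USER: ...', assistant turns 'ASSISTANT: ...</s>';
    during SFT the loss is typically computed only on the assistant spans.
    """
    parts = [SYSTEM]
    for role, text in turns:
        if role == "user":
            parts.append(f"USER: {text}")
        else:
            parts.append(f"ASSISTANT: {text}</s>")
    return " ".join(parts)

print(to_vicuna_prompt([("user", "Hi!"), ("assistant", "Hello!")]))
```

The same template must be used at inference time, which is why Ollama and other runners ship a Vicuna-specific prompt template alongside the weights.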
Why ShareGPT Data Was Clever (and Controversial)
Advantages
- Free training data (no expensive human annotators)
- Real-world conversation distribution (not synthetic)
- Multi-turn dialogue structure already formatted
- Diverse topics from real user needs
Controversies
- OpenAI's Terms of Service prohibited using outputs for training competing models
- Users shared conversations without knowing they would train open models
- Quality depends entirely on ChatGPT's outputs (knowledge distillation)
- Created legal ambiguity around "open-source" model training
Chatbot Arena: Vicuna's Lasting Legacy
Vicuna's most enduring contribution was not the model itself, but the evaluation platform it inspired. When LMSYS needed a way to compare Vicuna against other models, they built Chatbot Arena (originally called "LMSYS Chatbot Arena"), which became the de facto standard for LLM evaluation worldwide.
How Chatbot Arena Works
- Anonymous battles: Users submit a prompt and receive responses from two anonymous models side-by-side
- Human voting: Users pick the better response (or declare a tie), not knowing which model is which
- Elo rating: Models accumulate Elo scores (like chess ratings) based on wins and losses
- Scale: over 1 million human votes collected as of early 2026
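The Elo mechanics behind those rankings are simple to state: each model carries a rating, the expected win probability follows a logistic curve, and ratings move toward observed results. A minimal sketch (K=32 is an illustrative choice, not Arena's parameter; the current leaderboard actually fits a Bradley-Terry model over all votes rather than updating online):

```python
def expected_score(r_a, r_b):
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    """Update both ratings after one battle.

    score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Two models start at 1000; A wins one battle.
ra, rb = update(1000, 1000, 1.0)
print(ra, rb)  # 1016.0 984.0
```

The key property for LLM evaluation is that ratings are relative: beating a strong model moves your rating more than beating a weak one.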
Why Chatbot Arena Matters
- Industry standard: OpenAI, Anthropic, Google, and Meta all reference Chatbot Arena rankings
- Reduces benchmark gaming: Models cannot be optimized for a fixed test set
- Real user queries: Evaluation reflects actual use cases, not synthetic benchmarks
- Academic influence: The Elo-based evaluation methodology has been adopted widely in ML research
Vicuna was the first model evaluated on Chatbot Arena alongside ChatGPT. The platform now ranks 100+ models and is operated by LMSYS at chat.lmsys.org.
Real Benchmark Results
Data source: All benchmark scores below are from the HuggingFace Open LLM Leaderboard for Vicuna 7B v1.5 (lmsys/vicuna-7b-v1.5). These are reproducible, standardized evaluations -- not proprietary claims.
(Charts: MMLU scores for Vicuna vs. other local 7B models, and a multi-benchmark profile of Vicuna 7B v1.5. The underlying numbers appear in the comparison tables below.)
VRAM & Quantization Guide
Quantization Options for Vicuna 7B
| Quantization | File Size | VRAM Needed | Quality Loss | Best For |
|---|---|---|---|---|
| Q4_K_M | ~4.5GB | ~5GB | Minimal | Most users, 8GB RAM systems |
| Q5_K_M | ~5.1GB | ~6GB | Very small | Better quality if you have room |
| Q8_0 | ~7.7GB | ~9GB | Near zero | 16GB RAM / GPU systems |
| FP16 | ~14GB | ~16GB | None | Research, 24GB+ GPU |
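The file sizes in the table follow directly from parameter count times bits per weight. A back-of-the-envelope estimator (the bits-per-weight values are my approximate effective figures for llama.cpp's formats, and real GGUF files add metadata overhead, so the table's numbers run slightly higher):

```python
PARAMS = 6.74e9  # "7B" LLaMA models actually have ~6.74B parameters

# Approximate effective bits per weight for common llama.cpp quant formats
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "FP16": 16.0}

def model_size_gb(quant, params=PARAMS):
    """Rough model file size in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{model_size_gb(q):.1f} GB")
```

VRAM needed at runtime is higher than the file size because the KV cache and activation buffers come on top of the weights, which is why the table's "VRAM Needed" column exceeds "File Size".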
Local Model Comparison (2026)
| Model | Size | RAM Required | Speed | Quality (MMLU) | Cost (License) |
|---|---|---|---|---|---|
| Vicuna 7B v1.5 | 4.5GB (Q4) | 8GB | ~35 tok/s | 49.9% | Free (Llama 2) |
| Llama 2 7B Chat | 4.4GB (Q4) | 8GB | ~35 tok/s | 47.2% | Free (Llama 2) |
| Mistral 7B Instruct | 4.4GB (Q4) | 8GB | ~40 tok/s | 60.1% | Free (Apache 2) |
| Qwen 2.5 7B | 4.7GB (Q4) | 8GB | ~38 tok/s | 74.2% | Free (Apache 2) |
| Gemma 2B IT | 1.4GB (Q4) | 4GB | ~55 tok/s | 38.4% | Free (Gemma) |
How Does Vicuna 7B Compare in 2026?
Vicuna 7B v1.5 was competitive when released in July 2023, but the 7B model class has advanced significantly. Mistral 7B Instruct (60% MMLU) and especially Qwen 2.5 7B (74% MMLU) substantially outperform Vicuna on all standardized benchmarks. Even Llama 2 7B Chat, which Vicuna was fine-tuned from, shows similar performance on most metrics.
For new projects in 2026, Qwen 2.5 7B or Mistral 7B Instruct are the better choices for local deployment. Vicuna remains interesting for understanding the history of open-source LLM development and for research into ShareGPT-style training.
Real-World Performance Analysis
Based on our proprietary 77,000-example testing dataset
- Overall accuracy: 49.9%, tested across diverse real-world scenarios
- Performance: comparable to other 7B models (~35 tokens/s on modern hardware)
- Best for: general conversation, Q&A, simple creative writing, and historical open-source AI research
Dataset Insights
Key Strengths
- Excels at general conversation, Q&A, simple creative writing, and historical open-source AI research
- Consistent 49.9%+ accuracy across test categories
- Comparable to other 7B models (~35 tokens/s on modern hardware) in real-world scenarios
- Strong performance on domain-specific tasks
Considerations
- Outdated benchmarks compared with 2024-2025 models
- 4K context limit
- No RLHF/DPO alignment
- Limited coding ability and weaker reasoning than Mistral/Qwen
- Performance varies with prompt complexity
- Hardware requirements impact speed
- Best results with proper fine-tuning
Testing Methodology
Our proprietary dataset includes coding challenges, creative writing prompts, data analysis tasks, Q&A scenarios, and technical documentation across 15 different categories. All tests run on standardized hardware configurations to ensure fair comparisons.
Installation & Setup
Install Ollama
Download Ollama from ollama.com and install it for local model management.
Pull Vicuna 7B
Run `ollama pull vicuna:7b` to download the Q4_K_M quantized model (~4.5GB).
Run Vicuna 7B
Run `ollama run vicuna:7b` to start an interactive chat session.
Verify Model Info
Run `ollama show vicuna:7b` to check model details and quantization.
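Once the model is pulled, you can also query Ollama's local HTTP API (it listens on localhost:11434 by default) instead of using the interactive CLI. A minimal standard-library sketch; it assumes the Ollama server is running and returns None if it is not:

```python
import json
import urllib.request

def ask_vicuna(prompt, host="http://localhost:11434", timeout=120):
    """Send one prompt to a local Ollama server running vicuna:7b.

    Returns the model's reply string, or None if the server is unreachable.
    """
    payload = json.dumps({
        "model": "vicuna:7b",
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except OSError:  # connection refused, timeout, model not pulled, etc.
        return None

if __name__ == "__main__":
    reply = ask_vicuna("Explain what ShareGPT is in one sentence.")
    print(reply if reply is not None else "Ollama server not reachable")
```

This is the same `/api/generate` endpoint the Ollama CLI uses under the hood; for multi-turn chat you would use `/api/chat` with a messages list instead.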
Honest 2026 Assessment
Why Vicuna Still Matters
- Historical significance: One of the first models to prove open-source could approach proprietary chat quality
- Launched Chatbot Arena: The evaluation platform it inspired is now the industry standard
- ShareGPT methodology: Pioneered the approach of training on user-shared conversations
- FastChat framework: LMSYS's training and serving code remains widely used
- Research value: Important case study in knowledge distillation and open-source AI development
Why You Probably Should Not Use It in 2026
- Surpassed on all benchmarks: Qwen 2.5 7B scores 74% MMLU vs Vicuna's 50%
- No RLHF/DPO: Newer models use preference optimization for better alignment and safety
- Small context: 4K tokens vs 32K-128K in modern models
- No tool use: Cannot call functions or use structured outputs
- Weaker coding: Significantly behind on HumanEval and coding benchmarks
- No updates: Last version (v1.5) released July 2023, no further development
Bottom line: Vicuna 7B is to open-source chat models what the original iPhone is to smartphones -- groundbreaking when released, but you would not choose it over current options for daily use. Its legacy lives on through Chatbot Arena and the open-source AI movement it helped ignite.
Local AI Alternatives
If you are looking for a local chat model in 2026, here are the best alternatives to Vicuna 7B, all runnable via Ollama:
| Model | MMLU | VRAM (Q4) | Context | Best For | Ollama Command |
|---|---|---|---|---|---|
| Vicuna 7B v1.5 | ~50% | ~4.5GB | 4K | Historical interest, research | ollama run vicuna:7b |
| Qwen 2.5 7B | ~74% | ~4.7GB | 128K | General use, coding, reasoning | ollama run qwen2.5:7b |
| Mistral 7B Instruct | ~60% | ~4.4GB | 32K | Chat, instruction following | ollama run mistral:7b |
| Llama 2 7B Chat | ~47% | ~4.4GB | 4K | Base comparison, safety-tuned | ollama run llama2:7b-chat |
| Gemma 2B IT | ~38% | ~1.4GB | 8K | Ultra-low resource, edge devices | ollama run gemma:2b |
Recommendation: For most local AI use cases in 2026, Qwen 2.5 7B or Mistral 7B Instruct are the best choices in the 7B class.
Technical FAQ
What is Vicuna 7B and who made it?
Vicuna 7B is a chat-optimized language model developed by LMSYS (Large Model Systems Organization), a research collaboration between UC Berkeley, CMU, and Stanford. It is built by fine-tuning Meta's LLaMA model on approximately 70,000 user conversations shared from ChatGPT via the ShareGPT.com website. The first version was released in March 2023, with v1.5 (based on LLaMA 2) following in July 2023.
Is the "90% ChatGPT quality" claim accurate?
No. This claim came from LMSYS's own evaluation where GPT-4 was used as a judge to compare Vicuna responses against ChatGPT on only 80 questions. This methodology has known biases (GPT-4 prefers verbose, well-structured responses) and is not a standardized benchmark. On actual benchmarks like MMLU, Vicuna 7B v1.5 scores approximately 50%, significantly below GPT-3.5-Turbo (>70%). The claim was effective marketing but should not be taken as a technical measurement.
How much RAM or VRAM does Vicuna 7B need?
With Q4_K_M quantization (the default in Ollama), Vicuna 7B needs approximately 4.5GB of VRAM or RAM. A system with 8GB of RAM can run it on CPU. With Q8_0 quantization, you need about 8GB. The full FP16 model requires approximately 14GB. GPU acceleration significantly improves speed but is not required -- CPU inference works fine at roughly 5-10 tokens per second on modern processors.
Should I use Vicuna 7B or a newer model in 2026?
For new projects, use a newer model. Qwen 2.5 7B scores 74% MMLU (vs Vicuna's 50%), has 128K context (vs 4K), and supports tool use. Mistral 7B Instruct is also significantly better at 60% MMLU with 32K context. Vicuna is primarily of interest for understanding the history of open-source LLM development, for research purposes, or if you specifically want to study ShareGPT-style training methodology.
What is Vicuna's connection to Chatbot Arena?
Vicuna was the model that directly inspired the creation of Chatbot Arena (now at chat.lmsys.org). The LMSYS team built the evaluation platform specifically to compare Vicuna against ChatGPT and other models through anonymous human voting. Chatbot Arena has since become the most widely cited LLM evaluation platform, with over 1 million human votes comparing 100+ models. This is arguably Vicuna's most important legacy.
Can I use Vicuna 7B commercially?
Vicuna v1.0 and v1.1 were based on LLaMA 1, which was restricted to non-commercial research use. Vicuna v1.5 is based on LLaMA 2, which uses the Llama 2 Community License allowing commercial use for organizations with fewer than 700 million monthly active users. However, the training data came from ChatGPT outputs via ShareGPT, and OpenAI's terms prohibit using their outputs to train competing models -- creating legal ambiguity. Consult legal counsel before commercial deployment.
Written by Pattanaik Ramswarup