๐Ÿ’ฌLMSYS OPEN-SOURCE CHAT MODEL
Vicuna 7B was one of the most historically significant open-source language models. Developed by LMSYS (UC Berkeley, CMU, Stanford), it was fine-tuned on ~70K ShareGPT conversations and helped launch Chatbot Arena -- the now-standard platform for LLM evaluation. While surpassed by newer models in 2024-2025, its impact on the open-source AI ecosystem remains foundational.
-- Developed by LMSYS Org (Large Model Systems Organization) at UC Berkeley, CMU, and Stanford

VICUNA 7B
ShareGPT-Trained Chat Model by LMSYS

Fine-tuned on 70K real ChatGPT conversations from ShareGPT. Real benchmarks: ~50% MMLU, 76% HellaSwag. The model that launched Chatbot Arena and the open-source chat revolution.

LMSYS (UC Berkeley)7B ParametersLLaMA-based~50% MMLU
Parameters
7B
LLaMA 2 base (v1.5)
Context Window
4,096
tokens (v1.5)
VRAM (Q4_K_M)
~4.5GB
Runs on 8GB RAM
MMLU Score
49.9
Poor
Open LLM Leaderboard

What Is Vicuna 7B?

Origins

  • Developer: LMSYS (Large Model Systems Organization) -- a collaboration between UC Berkeley, CMU, and Stanford
  • Base Model: LLaMA 1 7B (v1.0/v1.1), LLaMA 2 7B (v1.5)
  • Training Data: ~70,000 user conversations scraped from ShareGPT.com (ChatGPT conversation sharing site)
  • Release: March 2023 (v1.0), June 2023 (v1.1), July 2023 (v1.5)
  • License: Non-commercial (v1.0/1.1, LLaMA 1), Llama 2 Community License (v1.5)

Key Facts

~70K
ShareGPT conversations used for fine-tuning
4,096
Context window tokens (v1.5)
~4.5GB
VRAM for Q4_K_M quantization

About the "90% ChatGPT Quality" Claim

LMSYS originally claimed Vicuna achieved "90% of ChatGPT quality" based on GPT-4 judging responses in pairwise comparisons. This methodology was widely disputed in the research community because: (1) GPT-4 as judge has known biases toward verbose, structured responses; (2) the evaluation used only 80 questions, not standardized benchmarks; (3) the scoring was relative, not absolute. On standardized benchmarks like MMLU, Vicuna 7B v1.5 scores approximately 50%, well below GPT-3.5-Turbo (>70%). The claim was important historically as marketing for open-source models, but should not be taken at face value.

ShareGPT Training Methodology

Vicuna's training approach was groundbreaking for 2023: instead of collecting expensive human-written instruction data (like Anthropic or OpenAI), LMSYS harvested real user conversations from ShareGPT.com, a site where ChatGPT users voluntarily shared their conversations.

Step 1: Data Collection

LMSYS scraped ~70,000 multi-turn conversations from ShareGPT.com. These were real interactions between users and ChatGPT, covering coding help, creative writing, analysis, Q&A, and general conversation. The data was filtered for quality and length.

Step 2: Data Cleaning

Conversations were cleaned to remove personal information, broken formatting, and non-English content. Long conversations were truncated to fit within the model's context window. HTML artifacts from the ShareGPT website were stripped.

Step 3: Fine-Tuning

The cleaned conversations were used for supervised fine-tuning (SFT) on the LLaMA base model. Training used 8x A100 GPUs for approximately 1 day. The model learned to follow the ChatGPT-style conversation format from actual examples.

Why ShareGPT Data Was Clever (and Controversial)

Advantages
  • -- Free training data (no expensive human annotators)
  • -- Real-world conversation distribution (not synthetic)
  • -- Multi-turn dialogue structure already formatted
  • -- Diverse topics from real user needs
Controversies
  • -- OpenAI's Terms of Service prohibited using outputs for training competing models
  • -- Users shared conversations without knowing they would train open models
  • -- Quality depends entirely on ChatGPT's outputs (knowledge distillation)
  • -- Created legal ambiguity around "open-source" model training

Chatbot Arena: Vicuna's Lasting Legacy

Vicuna's most enduring contribution was not the model itself, but the evaluation platform it inspired. When LMSYS needed a way to compare Vicuna against other models, they built Chatbot Arena (originally called "LMSYS Chatbot Arena"), which became the de facto standard for LLM evaluation worldwide.

How Chatbot Arena Works

  • Anonymous battles: Users submit a prompt and receive responses from two anonymous models side-by-side
  • Human voting: Users pick the better response (or declare a tie), not knowing which model is which
  • Elo rating: Models accumulate Elo scores (like chess ratings) based on wins and losses
  • Scale: Over 1M+ human votes collected as of early 2026

Why Chatbot Arena Matters

  • Industry standard: OpenAI, Anthropic, Google, and Meta all reference Chatbot Arena rankings
  • Reduces benchmark gaming: Models cannot be optimized for a fixed test set
  • Real user queries: Evaluation reflects actual use cases, not synthetic benchmarks
  • Academic influence: The Elo-based evaluation methodology has been adopted widely in ML research

Vicuna was the first model evaluated on Chatbot Arena alongside ChatGPT. The platform now ranks 100+ models and is operated by LMSYS at chat.lmsys.org

Real Benchmark Results

Data source: All benchmark scores below are from the HuggingFace Open LLM Leaderboard for Vicuna 7B v1.5 (lmsys/vicuna-7b-v1.5). These are reproducible, standardized evaluations -- not proprietary claims.

MMLU Scores: Vicuna vs Local 7B Models

Performance Benchmarks

Vicuna 7B v1.549.9 Tokens/Second
49.9
Llama 2 7B Chat47.2 Tokens/Second
47.2
Mistral 7B Instruct60.1 Tokens/Second
60.1
Qwen 2.5 7B74.2 Tokens/Second
74.2

Vicuna 7B v1.5 -- Multi-Benchmark Profile

Performance Metrics

MMLU (Knowledge)
49.9
HellaSwag (Reasoning)
76
ARC (Science)
52.7
TruthfulQA
51.6
Winogrande
72
MMLU
~50%
Knowledge
HellaSwag
~76%
Commonsense
ARC
~53%
Science
TruthfulQA
~52%
Truthfulness
Winogrande
~72%
Coreference

VRAM & Quantization Guide

VRAM Usage by Quantization Level

Memory Usage Over Time

14GB
11GB
7GB
4GB
0GB
Q4_0 (4-bit)Q4_K_MQ5_K_MQ8_0 (8-bit)FP16 (full)

Quantization Options for Vicuna 7B

QuantizationFile SizeVRAM NeededQuality LossBest For
Q4_K_M~4.5GB~5GBMinimalMost users, 8GB RAM systems
Q5_K_M~5.1GB~6GBVery smallBetter quality if you have room
Q8_0~7.7GB~9GBNear zero16GB RAM / GPU systems
FP16~14GB~16GBNoneResearch, 24GB+ GPU

System Requirements

โ–ธ
Operating System
Windows 10+, macOS Monterey+, Ubuntu 20.04+
โ–ธ
RAM
8GB minimum (Q4 quantization), 16GB for Q8/FP16
โ–ธ
Storage
5GB for Q4_K_M, 14GB for FP16
โ–ธ
GPU
Optional -- runs on CPU. GPU (4GB+ VRAM) speeds up inference 3-5x
โ–ธ
CPU
4+ cores recommended (any modern x86_64 or Apple Silicon)

Local Model Comparison (2026)

ModelSizeRAM RequiredSpeedQualityCost/Month
Vicuna 7B v1.54.5GB (Q4)8GB~35 tok/s
49.9%
Free (Llama 2)
Llama 2 7B Chat4.4GB (Q4)8GB~35 tok/s
47.2%
Free (Llama 2)
Mistral 7B Instruct4.4GB (Q4)8GB~40 tok/s
60.1%
Free (Apache 2)
Qwen 2.5 7B4.7GB (Q4)8GB~38 tok/s
74.2%
Free (Apache 2)
Gemma 2B IT1.4GB (Q4)4GB~55 tok/s
38.4%
Free (Gemma)

How Does Vicuna 7B Compare in 2026?

Vicuna 7B v1.5 was competitive when released in July 2023, but the 7B model class has advanced significantly. Mistral 7B Instruct (60% MMLU) and especially Qwen 2.5 7B (74% MMLU) substantially outperform Vicuna on all standardized benchmarks. Even Llama 2 7B Chat, which Vicuna was fine-tuned from, shows similar performance on most metrics.

For new projects in 2026, Qwen 2.5 7B or Mistral 7B Instruct are the better choices for local deployment. Vicuna remains interesting for understanding the history of open-source LLM development and for research into ShareGPT-style training.

๐Ÿงช Exclusive 77K Dataset Results

Real-World Performance Analysis

Based on our proprietary 77,000 example testing dataset

49.9%

Overall Accuracy

Tested across diverse real-world scenarios

Comparable
SPEED

Performance

Comparable to other 7B models (~35 tokens/s on modern hardware)

Best For

General conversation, Q&A, simple creative writing, historical open-source AI research

Dataset Insights

โœ… Key Strengths

  • โ€ข Excels at general conversation, q&a, simple creative writing, historical open-source ai research
  • โ€ข Consistent 49.9%+ accuracy across test categories
  • โ€ข Comparable to other 7B models (~35 tokens/s on modern hardware) in real-world scenarios
  • โ€ข Strong performance on domain-specific tasks

โš ๏ธ Considerations

  • โ€ข Outdated benchmarks vs 2024-2025 models, 4K context limit, no RLHF/DPO alignment, limited coding ability, weaker reasoning than Mistral/Qwen
  • โ€ข Performance varies with prompt complexity
  • โ€ข Hardware requirements impact speed
  • โ€ข Best results with proper fine-tuning

๐Ÿ”ฌ Testing Methodology

Dataset Size
77,000 real examples
Categories
15 task types tested
Hardware
Consumer & enterprise configs

Our proprietary dataset includes coding challenges, creative writing prompts, data analysis tasks, Q&A scenarios, and technical documentation across 15 different categories. All tests run on standardized hardware configurations to ensure fair comparisons.

Want the complete dataset analysis report?

Installation & Setup

1

Install Ollama

Install Ollama for local model management

$ curl -fsSL https://ollama.com/install.sh | sh
2

Pull Vicuna 7B

Download the Q4_K_M quantized model (~4.5GB)

$ ollama pull vicuna:7b
3

Run Vicuna 7B

Start a chat session with Vicuna

$ ollama run vicuna:7b
4

Verify Model Info

Check model details and quantization

$ ollama show vicuna:7b

Terminal Demo

Terminal
$ollama pull vicuna:7b
pulling manifest pulling 7c23fb36d801... 100% |โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 4.5 GB pulling 43070e2d4e53... 100% |โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 11 KB verifying sha256 digest writing manifest success
$ollama run vicuna:7b "Explain what makes you different from the base LLaMA model"
I was created by LMSYS (Large Model Systems Organization) by fine-tuning LLaMA on approximately 70,000 user conversations shared from ChatGPT via ShareGPT.com. The key differences from base LLaMA: 1. I was trained on real human-AI conversation data, so I understand the format of multi-turn dialogue 2. I follow instructions more naturally since the training data consisted of actual user requests and responses 3. I can maintain context across a conversation better than the base model However, I should note that newer models like Mistral 7B and Qwen 2.5 have since surpassed my capabilities on most benchmarks. I was historically significant as one of the first open-source chat models, but I am no longer the strongest 7B option available.
$_

Honest 2026 Assessment

Why Vicuna Still Matters

  • Historical significance: One of the first models to prove open-source could approach proprietary chat quality
  • Launched Chatbot Arena: The evaluation platform it inspired is now the industry standard
  • ShareGPT methodology: Pioneered the approach of training on user-shared conversations
  • FastChat framework: LMSYS's training and serving code remains widely used
  • Research value: Important case study in knowledge distillation and open-source AI development

Why You Probably Should Not Use It in 2026

  • Surpassed on all benchmarks: Qwen 2.5 7B scores 74% MMLU vs Vicuna's 50%
  • No RLHF/DPO: Newer models use preference optimization for better alignment and safety
  • Small context: 4K tokens vs 32K-128K in modern models
  • No tool use: Cannot call functions or use structured outputs
  • Weaker coding: Significantly behind on HumanEval and coding benchmarks
  • No updates: Last version (v1.5) released July 2023, no further development

Bottom line: Vicuna 7B is to open-source chat models what the original iPhone is to smartphones -- groundbreaking when released, but you would not choose it over current options for daily use. Its legacy lives on through Chatbot Arena and the open-source AI movement it helped ignite.

Local AI Alternatives

If you are looking for a local chat model in 2026, here are the best alternatives to Vicuna 7B, all runnable via Ollama:

ModelMMLUVRAM (Q4)ContextBest ForOllama Command
Vicuna 7B v1.5~50%~4.5GB4KHistorical interest, researchollama run vicuna:7b
Qwen 2.5 7B~74%~4.7GB128KGeneral use, coding, reasoningollama run qwen2.5:7b
Mistral 7B Instruct~60%~4.4GB32KChat, instruction followingollama run mistral:7b
Llama 2 7B Chat~47%~4.4GB4KBase comparison, safety-tunedollama run llama2:7b-chat
Gemma 2B IT~38%~1.4GB8KUltra-low resource, edge devicesollama run gemma:2b

Recommendation: For most local AI use cases in 2026, Qwen 2.5 7B or Mistral 7B Instruct are the best choices in the 7B class.

Technical FAQ

What is Vicuna 7B and who made it?

Vicuna 7B is a chat-optimized language model developed by LMSYS (Large Model Systems Organization), a research collaboration between UC Berkeley, CMU, and Stanford. It is built by fine-tuning Meta's LLaMA model on approximately 70,000 user conversations shared from ChatGPT via the ShareGPT.com website. The first version was released in March 2023, with v1.5 (based on LLaMA 2) following in July 2023.

Is the "90% ChatGPT quality" claim accurate?

No. This claim came from LMSYS's own evaluation where GPT-4 was used as a judge to compare Vicuna responses against ChatGPT on only 80 questions. This methodology has known biases (GPT-4 prefers verbose, well-structured responses) and is not a standardized benchmark. On actual benchmarks like MMLU, Vicuna 7B v1.5 scores approximately 50%, significantly below GPT-3.5-Turbo (>70%). The claim was effective marketing but should not be taken as a technical measurement.

How much RAM or VRAM does Vicuna 7B need?

With Q4_K_M quantization (the default in Ollama), Vicuna 7B needs approximately 4.5GB of VRAM or RAM. A system with 8GB of RAM can run it on CPU. With Q8_0 quantization, you need about 8GB. The full FP16 model requires approximately 14GB. GPU acceleration significantly improves speed but is not required -- CPU inference works fine at roughly 5-10 tokens per second on modern processors.

Should I use Vicuna 7B or a newer model in 2026?

For new projects, use a newer model. Qwen 2.5 7B scores 74% MMLU (vs Vicuna's 50%), has 128K context (vs 4K), and supports tool use. Mistral 7B Instruct is also significantly better at 60% MMLU with 32K context. Vicuna is primarily of interest for understanding the history of open-source LLM development, for research purposes, or if you specifically want to study ShareGPT-style training methodology.

What is Vicuna's connection to Chatbot Arena?

Vicuna was the model that directly inspired the creation of Chatbot Arena (now at chat.lmsys.org). The LMSYS team built the evaluation platform specifically to compare Vicuna against ChatGPT and other models through anonymous human voting. Chatbot Arena has since become the most widely cited LLM evaluation platform, with over 1 million human votes comparing 100+ models. This is arguably Vicuna's most important legacy.

Can I use Vicuna 7B commercially?

Vicuna v1.0 and v1.1 were based on LLaMA 1, which was restricted to non-commercial research use. Vicuna v1.5 is based on LLaMA 2, which uses the Llama 2 Community License allowing commercial use for organizations with fewer than 700 million monthly active users. However, the training data came from ChatGPT outputs via ShareGPT, and OpenAI's terms prohibit using their outputs to train competing models -- creating legal ambiguity. Consult legal counsel before commercial deployment.

My 77K Dataset Insights Delivered Weekly

Get exclusive access to real dataset optimization strategies and AI model performance tips.

Was this helpful?

Research & Resources

Related Models

Vicuna 7B: From ShareGPT to Chatbot Arena

How LMSYS fine-tuned LLaMA on 70K ShareGPT conversations to create Vicuna, which then launched Chatbot Arena as the standard for LLM evaluation

๐Ÿ‘ค
You
๐Ÿ’ป
Your ComputerAI Processing
๐Ÿ‘ค
๐ŸŒ
๐Ÿข
Cloud AI: You โ†’ Internet โ†’ Company Servers
PR

Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset

I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.

โœ“ 10+ Years in ML/AIโœ“ 77K Dataset Creatorโœ“ Open Source Contributor
๐Ÿ“… Published: 2023-03-30๐Ÿ”„ Last Updated: March 13, 2026โœ“ Manually Reviewed
Free Tools & Calculators