Open Source · Apache 2.0 · Bilingual

Yi-6B by 01.AI: Bilingual Chinese-English Model

Yi-6B is a 6-billion parameter language model from 01.AI (founded by Kai-Fu Lee), released in November 2023. It was one of the first Chinese-developed open-source models to achieve competitive English benchmarks while excelling at Chinese. With 63% MMLU and 72% C-Eval, it punched well above its weight class at launch.

Note (March 2026): Yi-6B was impressive at its November 2023 release, but newer models such as Qwen 2.5 7B (74% MMLU) now outperform it in the same resource envelope, and smaller options like Gemma 2 2B run on even less memory. Yi-6B remains a solid choice for Chinese-focused tasks on constrained hardware.

MMLU: 63.2% (01.AI tech report)
C-Eval: 72.0% (Chinese benchmark)
Parameters: 6B (dense transformer)
Context: 4K base, 200K with NTK-RoPE
License: Apache 2.0, fully open

Real Benchmark Performance

Yi-6B benchmarks are sourced from the 01.AI technical report (November 2023) and the HuggingFace Open LLM Leaderboard. At launch, it outperformed Llama 2 7B on nearly every metric despite having fewer parameters.

MMLU Scores (5-shot)

Yi-6B: 63%
Llama 2 7B: 46%
Mistral 7B: 60%
Falcon 7B: 27%
Qwen 1.5 7B: 61%

C-Eval Scores (Chinese)

Yi-6B: 72%
Qwen-7B: 63%
ChatGLM2-6B: 52%
Llama 2 7B: 32%
Baichuan2-7B: 56%


Benchmark Details

| Benchmark | Yi-6B | Source |
|---|---|---|
| MMLU (5-shot) | 63.2% | 01.AI report |
| C-Eval (5-shot) | 72.0% | 01.AI report |
| HellaSwag | 76.4% | Open LLM Leaderboard |
| ARC-Challenge | 55.9% | Open LLM Leaderboard |
| TruthfulQA | 42.4% | Open LLM Leaderboard |
| Winogrande | 73.3% | Open LLM Leaderboard |

Note: TruthfulQA at 42.4% is moderate. Base models (non-instruct) typically score lower on this metric.

VRAM Requirements by Quantization

Yi-6B is a small model that runs comfortably on consumer hardware. The Q4_K_M quantization is the recommended balance of quality and size.

| Quantization | File Size | VRAM (GPU) | RAM (CPU) | Quality Loss | Recommendation |
|---|---|---|---|---|---|
| Q2_K | ~2.5GB | ~3GB | ~4GB | High | Only for extreme constraints |
| Q4_K_M | ~3.8GB | ~4.5GB | ~6GB | Minimal | Recommended |
| Q5_K_M | ~4.5GB | ~5.5GB | ~7GB | Very low | Good balance if you have 8GB VRAM |
| Q8_0 | ~6.4GB | ~7.5GB | ~9GB | Negligible | Near-lossless, 8GB+ GPU |
| FP16 | ~12GB | ~13GB | ~16GB | None | Full precision, 16GB+ GPU |
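
To see where these numbers come from, a rough back-of-the-envelope works: file size is roughly parameters × bits-per-weight, plus headroom for the KV cache and runtime buffers. The bits-per-weight values in this sketch are approximations chosen to line up with typical GGUF file sizes, not official figures.

```python
# Back-of-the-envelope memory math for Yi-6B (6.06B parameters).
# Effective bits-per-weight values are approximations (GGUF files mix
# precisions and include metadata), so treat the output as a rough guide.

PARAMS = 6.06e9

EFFECTIVE_BITS_PER_WEIGHT = {  # approximate, includes format overhead
    "Q2_K": 3.3,
    "Q4_K_M": 5.0,
    "Q5_K_M": 5.9,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def weights_gb(params: float, bits: float) -> float:
    """Size of the weights alone, in gigabytes (decimal GB)."""
    return params * bits / 8 / 1e9

for quant, bits in EFFECTIVE_BITS_PER_WEIGHT.items():
    size = weights_gb(PARAMS, bits)
    # Add ~1 GB of headroom for KV cache, activations, and runtime buffers.
    print(f"{quant:>7}: ~{size:.1f} GB file, ~{size + 1.0:.1f} GB to run")
```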


Architecture & the Yi Model Family

Technical Specifications

Architecture: Dense transformer (Llama-style)
Parameters: 6.06B
Hidden Size: 4096
Layers: 32
Attention Heads: 32
Vocabulary Size: 64,000
Context Length: 4,096 (base)
Extended Context: Up to 200K (NTK-aware RoPE)
Positional Encoding: RoPE
Training Data: ~3T tokens (English + Chinese)
License: Apache 2.0
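
As a rough sketch, these specifications map onto a standard Llama-style configuration in Hugging Face transformers. The values below mirror the table; num_key_value_heads=4 (the GQA setting) is taken from the public 01-ai/Yi-6B config and should be treated as an assumption here rather than part of the table above.

```python
# Minimal sketch: expressing Yi-6B's hyperparameters as a Llama-style config.
# Requires the `transformers` package; values mirror the spec table above.
from transformers import LlamaConfig

yi_6b_config = LlamaConfig(
    vocab_size=64_000,              # large vocabulary for Chinese coverage
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=4,          # grouped-query attention (assumed from the public config)
    max_position_embeddings=4096,   # base context; long-context variants extend via NTK-RoPE
)

print(yi_6b_config)
```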

Yi Model Family

01.AI released Yi models in two sizes sharing the same architecture. The 6B variant uses the same Llama-style transformer as Yi-34B but with fewer layers (32 vs 60) and a smaller hidden dimension.

Yi-6B (this page)

6B params, 32 layers. Runs on 6GB RAM. Best for edge/constrained deployment and Chinese-focused tasks.

Yi-34B

34B params, 60 layers. 76% MMLU. Significantly stronger but needs ~20GB VRAM quantized.

Yi-1.5 Series (2024)

Improved training data and alignment. Yi-1.5 6B and 9B variants with better overall performance.

Key Design Choices

  • Large vocabulary (64K) for efficient Chinese tokenization
  • Grouped Query Attention (GQA) for memory efficiency
  • NTK-aware RoPE for context extension without fine-tuning (see the sketch after this list)
  • Trained on 3T tokens of curated English + Chinese data
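
The NTK-aware RoPE trick deserves a short illustration. The sketch below shows the generic form of the scaling: inflate the RoPE base frequency so positions beyond the training window still fall in a familiar range. It illustrates the technique, not 01.AI's exact implementation, and the function names are my own.

```python
# Generic NTK-aware RoPE scaling: stretch the rotary base so a model trained on
# 4K positions can run at much longer contexts without fine-tuning.
# Illustrative only; not taken from 01.AI's code.
def ntk_scaled_rope_base(base: float, head_dim: int, orig_ctx: int, target_ctx: int) -> float:
    """Adjusted RoPE base (theta) for a longer target context window."""
    scale = target_ctx / orig_ctx
    return base * scale ** (head_dim / (head_dim - 2))

def inverse_frequencies(base: float, head_dim: int) -> list[float]:
    """Per-dimension inverse frequencies used by rotary position embeddings."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

# Yi-6B: head_dim = hidden_size / attention_heads = 4096 / 32 = 128
new_base = ntk_scaled_rope_base(base=10_000.0, head_dim=128, orig_ctx=4_096, target_ctx=200_000)
print(f"scaled RoPE base: {new_base:,.0f}")             # roughly 52x the original base
print(inverse_frequencies(new_base, head_dim=128)[:3])  # a few of the highest frequencies
```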

Chinese Language Capabilities

Yi-6B's main differentiator at launch was its strong bilingual performance. With 72% on C-Eval (a comprehensive Chinese exam benchmark), it significantly outperformed Western models of similar size on Chinese tasks.

Chinese Strengths

  • C-Eval 72%: strong Chinese exam performance, outperforming many 7B+ models at launch
  • Large CJK vocabulary: 64K vocab with good Chinese character coverage (see the tokenizer sketch after this list)
  • Simplified + Traditional: handles both character sets
  • Bilingual training: balanced English-Chinese data mix
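
To see the vocabulary effect in practice, a quick check like the one below counts how many tokens a short Chinese sentence costs. This is an assumption-level sketch: it downloads the 01-ai/Yi-6B tokenizer from Hugging Face and needs the transformers and sentencepiece packages. Fewer tokens per sentence means more effective context and faster generation for Chinese text.

```python
# Rough check of Chinese tokenization efficiency with Yi-6B's 64K vocabulary.
# Downloads the tokenizer from Hugging Face on first run.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B")

text = "人工智能正在改变世界。"  # "Artificial intelligence is changing the world."
tokens = tokenizer.tokenize(text)
print(f"{len(tokens)} tokens: {tokens}")
```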

Limitations

  • Base model: Yi-6B (base) may not follow instructions well without fine-tuning. Use Yi-6B-Chat for conversational use.
  • Surpassed by Qwen 2.5: Qwen 2.5 7B now scores higher on both Chinese and English benchmarks
  • 6B limitation: complex reasoning and multi-step tasks are limited compared to 13B+ models
  • TruthfulQA 42%: the base model is prone to confident but inaccurate claims

Ollama Setup Guide

System Requirements

Operating System: Windows 10+, macOS 12+, Ubuntu 20.04+
RAM: 6GB minimum (Q4_K_M), 12GB for FP16
Storage: 4GB for Q4_K_M, 12GB for FP16
GPU: Optional; runs on CPU, a GPU speeds up inference
CPU: 4+ cores recommended
1. Install Ollama

Download and install the Ollama runtime

$ curl -fsSL https://ollama.com/install.sh | sh
2. Pull Yi-6B

Download the Yi-6B model (~3.8GB for Q4_K_M)

$ ollama pull yi:6b
3. Run Yi-6B

Start an interactive session

$ ollama run yi:6b
4. Test Chinese capabilities

Try a bilingual prompt to test Chinese understanding

$ ollama run yi:6b "请用中文解释什么是人工智能"

Terminal Demo

$ ollama pull yi:6b
pulling manifest
pulling 3b2b6c82c... 100%  3.8 GB
verifying sha256 digest
writing manifest
success
$ ollama run yi:6b "What is the capital of France?"
The capital of France is Paris. It is the largest city in France and serves as the country's political, economic, and cultural center. Paris is located in northern France along the Seine River.

Available Ollama Tags

| Command | Variant | Size |
|---|---|---|
| ollama run yi:6b | Yi-6B base (Q4_K_M default) | ~3.8GB |
| ollama run yi:6b-chat | Yi-6B Chat (instruction-tuned) | ~3.8GB |
| ollama run yi:34b | Yi-34B base | ~20GB |

Tip: For conversational use, prefer yi:6b-chat. The base model is better suited for completion tasks or further fine-tuning.
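
Beyond the interactive CLI, Ollama also exposes a local REST API on port 11434, which is the easier path for scripting. The sketch below assumes Ollama is running, yi:6b-chat has already been pulled, and the requests package is installed.

```python
# Minimal sketch: querying Yi-6B Chat through Ollama's local REST API.
# Assumes `ollama serve` is running and `ollama pull yi:6b-chat` has completed.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "yi:6b-chat",
        "messages": [{"role": "user", "content": "请用中文解释什么是人工智能"}],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```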

Local Model Comparison

All models shown below are free, open-weight, and runnable locally. Quality score is MMLU (5-shot); speed estimates assume a modern CPU running the quantization listed for each model.

| Model | Size | RAM Required | Speed | Quality (MMLU) | Cost/Month |
|---|---|---|---|---|---|
| Yi-6B (Q4_K_M) | 3.8GB | 6GB | ~30 tok/s | 63% | $0.00 |
| Qwen 2.5 7B (Q4) | 4.4GB | 7GB | ~28 tok/s | 74% | $0.00 |
| Gemma 2 2B (FP16) | 5GB | 7GB | ~40 tok/s | 51% | $0.00 |
| Llama 2 7B (Q4) | 4.1GB | 7GB | ~25 tok/s | 46% | $0.00 |
| Mistral 7B (Q4) | 4.4GB | 7GB | ~28 tok/s | 60% | $0.00 |

When to Choose Yi-6B

Choose Yi-6B when:

  • Chinese language is your primary need
  • You need Apache 2.0 licensing
  • You have under 6GB VRAM
  • You plan to fine-tune for Chinese-specific tasks

Choose Qwen 2.5 7B instead when:

  • You want the best Chinese + English performance
  • You have 7GB+ VRAM available
  • You need coding capabilities
  • You want the most current model

Choose Mistral 7B instead when:

  • Your tasks are English-only
  • You want maximum community/ecosystem support
  • You need sliding-window attention for long inputs
  • You want broad fine-tune availability

Honest Assessment & Alternatives

The Bottom Line

Yi-6B was a milestone model when it launched in November 2023 — it proved that Chinese AI labs could produce competitive open-source models with strong bilingual capabilities. Its 63% MMLU and 72% C-Eval were outstanding for a 6B model at the time.

However, the LLM landscape has moved quickly. As of March 2026, several newer models offer better performance in the same resource envelope:

| Model | MMLU | Chinese | VRAM (Q4) | License |
|---|---|---|---|---|
| Yi-6B | 63% | Strong | ~4.5GB | Apache 2.0 |
| Qwen 2.5 7B | 74% | Excellent | ~5GB | Apache 2.0 |
| Gemma 2 2B | 51% | Limited | ~2GB | Gemma Terms |
| Llama 3.2 3B | 63% | Basic | ~2.5GB | Meta License |

Our recommendation: For Chinese-focused tasks in 2026, Qwen 2.5 7B is the stronger choice. Yi-6B remains worth considering if you specifically need Apache 2.0 licensing, are fine-tuning on Chinese data, or are already invested in the Yi ecosystem.


Yi-6B Architecture Overview

Yi-6B transformer architecture with 32 layers, GQA attention, RoPE positional encoding, and 64K vocabulary optimized for bilingual Chinese-English processing


Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset

I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.

✓ 10+ Years in ML/AI  ✓ 77K Dataset Creator  ✓ Open Source Contributor
📅 Published: November 1, 2023  🔄 Last Updated: March 13, 2026  ✓ Manually Reviewed
