Baichuan2-13B: Bilingual Chinese-English LLM
Published: September 25, 2023 | Updated: March 13, 2026
13B-parameter bilingual model from Baichuan Intelligence, trained on 2.6T tokens. CMMLU 59% and MMLU 59% per the official technical report. Runs locally with Q4 quantization on ~8 GB VRAAM.
Note (March 2026): Baichuan2-13B was released September 2023 and has been surpassed by newer Chinese LLMs like Qwen 2.5-14B. This page covers its real capabilities and historical significance.
Technical Specifications
Baichuan Intelligence & Model Background
Baichuan Intelligence (百川智能) was founded in April 2023 by Wang Xiaochuan, who previously served as CEO of Sogou (a major Chinese search engine acquired by Tencent in 2021). The company quickly became one of China's prominent AI startups, raising over $300 million in funding by late 2023. Baichuan2-13B was their second-generation model release, following the original Baichuan-7B and Baichuan-13B models released in June-July 2023.
Key Technical Details
- Training corpus: 2.6 trillion tokens from a mix of Chinese web data, books, code, and English sources
- Tokenizer: Custom BPE tokenizer with 125,696 vocabulary size (optimized for Chinese character coverage)
- Architecture: Standard decoder-only transformer with RoPE positional embeddings and SwiGLU activation
- Chat variant: Baichuan2-13B-Chat fine-tuned with RLHF for conversational use
- Quantized variant: an official 4-bit quantized release (Baichuan2-13B-Chat-4bits) provided by Baichuan for resource-constrained deployment
Source Citations
- Baichuan 2: Open Large-scale Language Models (Yang et al., September 2023) - Official technical report with all benchmark numbers
- Baichuan2-13B-Chat on HuggingFace - Official model weights and documentation
- Baichuan2 GitHub Repository - Source code and implementation examples
Real Benchmark Results
Source: All benchmark numbers below are from the official Baichuan 2 technical report (arXiv:2309.10305). Numbers are for the 13B-Chat variant unless noted.
Chinese Benchmarks (13B-Chat)
[Chart: CMMLU Score (%) for Baichuan2-13B-Chat vs. peer models. Source: Baichuan2 tech report, OpenCompass. Yi-34B is larger (34B params) and shown for reference.]
General Benchmarks (13B-Chat)
[Chart: MMLU Score (%) comparison. Source: Baichuan2 tech report. Llama-2-13B included as a same-size Western model baseline.]
Full Benchmark Breakdown (Baichuan2-13B)
| Benchmark | Category | Score | Notes |
|---|---|---|---|
| MMLU | General knowledge | 59.2% | 5-shot |
| CMMLU | Chinese knowledge | 59.0% | 5-shot |
| C-Eval | Chinese evaluation | 58.1% | 5-shot |
| GSM8K | Math reasoning | 52.8% | 8-shot |
| HumanEval | Code generation | 17.1% | 0-shot, base model |
| AGIEval | Reasoning | 48.2% | Chinese subset |
All scores from the official Baichuan 2 technical report (arXiv:2309.10305). Base model numbers unless noted.
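If you want to reproduce numbers like these yourself, the open-source LM Evaluation Harness (listed under Resources below) can run 5-shot MMLU and CMMLU against the HuggingFace weights. The command below is a sketch: task names and flags vary between harness versions, so confirm them with `lm_eval --tasks list` before running.

```shell
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and flags may differ across harness versions -- verify locally first.
MODEL="baichuan-inc/Baichuan2-13B-Base"
lm_eval --model hf \
    --model_args "pretrained=$MODEL,trust_remote_code=True" \
    --tasks mmlu,cmmlu \
    --num_fewshot 5 \
    --batch_size 4
```

Expect a full 5-shot run to take hours on a single consumer GPU; small per-run variance around the reported scores is normal.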
VRAM Requirements by Quantization
| Quantization | VRAM | File Size | Quality Loss | Recommended GPU |
|---|---|---|---|---|
| Q4_K_M (recommended) | ~8 GB | ~7.5 GB | Minimal | RTX 3060 12GB, RTX 4060 Ti 16GB |
| Q5_K_M | ~10 GB | ~9 GB | Very low | RTX 3060 12GB, RTX 4070 |
| Q8_0 | ~14 GB | ~13 GB | Negligible | RTX 4080 16GB, RTX 3090 |
| FP16 | ~26 GB | ~26 GB | None (full precision) | RTX 3090 24GB, A5000, A6000 |
VRAM estimates include KV cache overhead for 4096 context. Actual usage may vary by framework.
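The table's figures follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, plus a KV-cache term. The sketch below uses approximate effective bit widths and an assumed 1 GB KV-cache constant, not measured values.

```python
# Back-of-envelope VRAM estimate: weights at the quantized bit width plus a
# rough KV-cache term. Constants here are approximations, not measurements.
PARAMS = 13e9          # Baichuan2-13B parameter count
KV_CACHE_GB = 1.0      # rough KV cache for a 4096-token context (assumption)

def vram_gb(bits_per_weight: float) -> float:
    weights_gb = PARAMS * bits_per_weight / 8 / 1e9
    return weights_gb + KV_CACHE_GB

# Q4_K_M averages ~4.5 bits/weight due to mixed-precision blocks.
for name, bits in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("FP16", 16)]:
    print(f"{name}: ~{vram_gb(bits):.1f} GB")
```

Real usage depends on the framework's allocator and context length, which is why the table's numbers and this estimate differ by up to a gigabyte.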
Memory Usage During Inference (Q4_K_M)
[Chart: memory usage over time during inference. Measured with Q4_K_M quantization, 4096-token context window; peak ~8.2 GB VRAM.]
Installation & Setup (HuggingFace)
System Requirements
Option 1: HuggingFace Transformers (Recommended)
Install Dependencies
Set up Python environment with required libraries
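A typical environment setup looks like the following. The package list matches what the official model card generally calls for, but versions are left unpinned here; check the card for any pinned `transformers` version before installing.

```shell
# Dependency set per the Baichuan2 model card (illustrative, unpinned).
PKGS="torch transformers accelerate bitsandbytes sentencepiece"

# Optional: isolate the install in a virtual environment.
python -m venv baichuan2-env
[ -f baichuan2-env/bin/activate ] && . baichuan2-env/bin/activate

pip install $PKGS
```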
Download and Run Baichuan2-13B-Chat
Load the model with 4-bit quantization via bitsandbytes
Important: Baichuan2 requires trust_remote_code=True because it uses custom modeling code. Review the code at the HuggingFace repo before enabling this flag in production.
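A loading sketch is shown below. The model id and the `.chat()` helper follow the official HuggingFace model card (the helper lives in Baichuan2's custom remote code); treat the exact quantization kwargs as assumptions to verify against your installed `transformers` version, which may prefer a `BitsAndBytesConfig` object over `load_in_4bit=True`.

```python
# Sketch: load Baichuan2-13B-Chat in 4-bit via bitsandbytes (~8 GB VRAM).
MODEL_ID = "baichuan-inc/Baichuan2-13B-Chat"

def load_chat_model():
    # Imports deferred so the sketch is readable without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,  # custom modeling code -- review it before enabling
        use_fast=False,          # Baichuan2 uses a sentencepiece tokenizer
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        device_map="auto",
        load_in_4bit=True,       # bitsandbytes 4-bit quantization
        torch_dtype=torch.bfloat16,
    )
    return tokenizer, model

def chat(model, tokenizer, user_message: str) -> str:
    # Baichuan2's remote code exposes a .chat() helper that applies its chat template.
    messages = [{"role": "user", "content": user_message}]
    return model.chat(tokenizer, messages)
```

First load downloads ~28 GB of FP16 weights from HuggingFace before quantizing on the fly, so budget disk space and time accordingly.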
Option 2: llama.cpp (GGUF format)
Community-converted GGUF files are available on HuggingFace for use with llama.cpp. Search for "Baichuan2-13B GGUF" on HuggingFace. Note that Baichuan2 is not natively available on Ollama as of March 2026, but GGUF files work with llama.cpp directly.
Clone and build llama.cpp
Build from source for GPU acceleration
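The build steps below follow llama.cpp's current CMake workflow; the CUDA flag name has changed over time (older trees used `LLAMA_CUBLAS`), so check the repo's build docs for your checkout.

```shell
# Clone and build llama.cpp with CUDA acceleration.
REPO="https://github.com/ggerganov/llama.cpp"
git clone "$REPO"
cd llama.cpp 2>/dev/null || true

# GGML_CUDA is the current flag; older versions used -DLLAMA_CUBLAS=ON.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```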
Download GGUF model and run
Download a community GGUF conversion and run inference
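The commands below assume you are in the built `llama.cpp` directory. The repository path is a placeholder: search HuggingFace for "Baichuan2-13B GGUF" and substitute a real community conversion before running.

```shell
# Placeholder repo path -- replace with a real community GGUF conversion
# found via a HuggingFace search for "Baichuan2-13B GGUF".
GGUF="baichuan2-13b-chat.Q4_K_M.gguf"
huggingface-cli download "community-repo/Baichuan2-13B-Chat-GGUF" "$GGUF" --local-dir .

# Offload all layers to GPU (-ngl 99) with the model's full 4096 context.
./build/bin/llama-cli -m "$GGUF" -ngl 99 -c 4096 -p "用中文介绍一下你自己。"
```

The prompt above asks the model to introduce itself in Chinese; swap in your own prompt or run `llama-cli` with no `-p` flag for interactive mode.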
Bilingual Chinese-English Capabilities
Chinese Language Strengths
- Simplified and traditional Chinese text generation
- Chinese reading comprehension and Q&A
- Chinese-specific knowledge (history, geography, culture)
- Formal and informal Chinese writing styles
- Chinese-to-English translation
CMMLU 59% and C-Eval 58% were competitive for September 2023 among 13B-class models.
English Capabilities
- Basic English text generation and Q&A
- English-to-Chinese translation
- Cross-lingual summarization
- Bilingual content creation
- English MMLU: 59% (comparable to Llama-2-13B's 55%)
English is secondary; for English-only tasks, Llama 2 or Mistral 7B are better choices.
Realistic Use Cases
Chinese Customer Support
Handling Chinese-language customer queries with bilingual fallback to English
Translation Drafts
First-pass Chinese-English translation for human review (not production-quality alone)
Chinese Content Generation
Blog posts, marketing copy, and social media content in Chinese
Local Chinese LLM Alternatives
Chinese LLM Comparison (All Locally Runnable)
| Model | Size | RAM Required | Speed | Quality | Cost/Month |
|---|---|---|---|---|---|
| Baichuan2-13B | 13B | 8-26GB | Medium | 59% | Free |
| Qwen 2.5-14B | 14B | 8-28GB | Fast | 79% | Free |
| Yi-34B | 34B | 20-68GB | Slow | 76% | Free |
| ChatGLM3-6B | 6B | 4-12GB | Fast | 50% | Free |
| Qwen 2.5-7B | 7B | 4-14GB | Fast | 74% | Free |
Quality scores are MMLU percentages. All models support Chinese and English. Qwen 2.5 models are available on Ollama.
Best Overall: Qwen 2.5-14B
- MMLU 79% (vs Baichuan2's 59%)
- 128K context (vs 4K)
- Available on Ollama
- Apache 2.0 license
- Released September 2024
Budget Pick: Qwen 2.5-7B
- MMLU 74% at half the size
- Runs on 4GB VRAM (Q4)
- 128K context window
- Excellent Chinese performance
- One-line install: `ollama run qwen2.5:7b`
Lightweight: ChatGLM3-6B
- Only 6B parameters
- 4GB VRAM with quantization
- Good for basic Chinese chatbots
- From Zhipu AI / Tsinghua
- Weaker on benchmarks (MMLU ~50%)
Honest Assessment & Recommendations
When Baichuan2-13B Makes Sense
- You need a well-documented, tested Chinese LLM with known behavior
- Your application was built around Baichuan2 and migration is costly
- You are studying the evolution of Chinese LLMs for research
- You need a stable model with predictable outputs (no frequent updates)
When to Choose Something Else
- Starting a new project in 2025-2026 (use Qwen 2.5 instead)
- Need strong coding ability (HumanEval 17% is very low)
- Need long context (>4K tokens)
- Need strong math reasoning (GSM8K 52.8%)
- English-only tasks (Llama 3 or Mistral are better)
Bottom Line
Baichuan2-13B was a solid Chinese bilingual LLM for September 2023 and helped establish Baichuan Intelligence as a serious Chinese AI player. Its CMMLU 59% and MMLU 59% scores were competitive at the time. However, the field has moved fast: Qwen 2.5-14B now scores 79% on MMLU with 128K context and is available on Ollama, making it the clear choice for new Chinese NLP projects. Baichuan2 remains useful for existing deployments and as a reference point for Chinese LLM development.
Resources & Further Reading
Official Sources
Chinese NLP Benchmarks
- CMMLU Benchmark - Chinese Massive Multitask Language Understanding
- C-Eval Benchmark - Chinese evaluation suite
- LM Evaluation Harness - Open-source benchmark framework
Related Models on This Site
- Qwen 2.5-14B - Recommended successor for Chinese NLP
- Qwen 2.5-7B - Lighter Chinese LLM alternative
- Yi-34B - Larger bilingual model from 01.AI
- Aquila-7B - Another Chinese AI model
- GLM-4-5 - ChatGLM series from Zhipu AI
Baichuan2-13B Architecture
Decoder-only transformer with RoPE positional embeddings and SwiGLU activation, 13B parameters, 4096 context
Frequently Asked Questions
What is Baichuan2-13B and who made it?
Baichuan2-13B is a 13-billion parameter bilingual language model developed by Baichuan Intelligence (formerly Baichuan AI), a Chinese AI company founded in 2023 by Wang Xiaochuan, former CEO of Sogou. It was released in September 2023 and is trained on 2.6 trillion tokens with a focus on Chinese and English language tasks.
What are the hardware requirements for running Baichuan2-13B locally?
VRAM depends on quantization: Q4_K_M requires about 8GB VRAM (RTX 3060 12GB or RTX 4060 Ti 16GB works well), Q8_0 needs about 14GB, and full FP16 requires about 26GB (RTX 3090 or A6000). System RAM should be 16GB minimum with 32GB recommended. The Q4 quantized version is the most practical for consumer hardware.
How does Baichuan2-13B compare to Qwen 2.5?
Baichuan2-13B (September 2023) has been surpassed by newer Chinese LLMs. Qwen 2.5-14B scores around 79% on MMLU versus Baichuan2-13B's 59%, has 128K context versus 4K, and is also available under a permissive license. For new projects in 2025-2026, Qwen 2.5 is the recommended choice for Chinese NLP tasks.
Is Baichuan2-13B good for Chinese language tasks?
It was competitive for Chinese NLP when released in September 2023, scoring 59% on CMMLU and 58% on C-Eval. However, it has been surpassed by Qwen 2.5, Yi-1.5, and ChatGLM4 series. It remains useful for understanding the evolution of Chinese LLMs and for environments where a smaller, well-tested model is preferred.
Can Baichuan2-13B be used commercially?
Yes. Baichuan2-13B is released under the Baichuan 2 Community License Agreement, which permits commercial use. Organizations with over 100 million monthly active users need to apply for a separate commercial license from Baichuan Intelligence.
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.