Baichuan2-13B: Bilingual Chinese-English LLM

Published: September 25, 2023 | Updated: March 13, 2026

13B-parameter bilingual model from Baichuan Intelligence, trained on 2.6T tokens. Actual benchmark scores: CMMLU 59%, MMLU 59%. Runs locally with Q4 quantization on 8GB VRAM.

Note (March 2026): Baichuan2-13B was released September 2023 and has been surpassed by newer Chinese LLMs like Qwen 2.5-14B. This page covers its real capabilities and historical significance.

Technical Specifications

  • Parameters: 13 billion
  • Training Data: 2.6 trillion tokens
  • Context Window: 4,096 tokens
  • Architecture: Transformer (decoder-only), RoPE, SwiGLU
  • Languages: Chinese (primary), English (secondary)
  • License: Baichuan 2 Community License (commercial use permitted)
  • Release: September 2023
  • Variants: Base, Chat (instruction-tuned)

Baichuan Intelligence & Model Background

Baichuan Intelligence (百川智能) was founded in April 2023 by Wang Xiaochuan, who previously served as CEO of Sogou (a major Chinese search engine acquired by Tencent in 2021). The company quickly became one of China's prominent AI startups, raising over $300 million in funding by late 2023. Baichuan2-13B was their second-generation model release, following the original Baichuan-7B and Baichuan-13B models released in June-July 2023.

Key Technical Details

  • Training corpus: 2.6 trillion tokens from a mix of Chinese web data, books, code, and English sources
  • Tokenizer: Custom BPE tokenizer with 125,696 vocabulary size (optimized for Chinese character coverage)
  • Architecture: Standard decoder-only transformer with RoPE positional embeddings and SwiGLU activation
  • Chat variant: Baichuan2-13B-Chat fine-tuned with RLHF for conversational use
  • 4-bit variant: Baichuan2-13B-Chat-4bits, an officially quantized version provided by Baichuan for resource-constrained deployment
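The 125,696-token vocabulary is not free: at a hidden size of 5120 (an assumption here, typical for 13B-class decoders; check config.json in the official repo for the exact value), the input embedding table alone accounts for roughly 0.64B parameters. A quick sketch of the arithmetic:

```python
# Rough embedding-parameter cost of Baichuan2's large vocabulary.
# HIDDEN_SIZE is an assumption (typical for 13B-class decoders);
# check config.json in the HuggingFace repo for the exact value.
VOCAB_SIZE = 125_696   # Baichuan2 tokenizer vocabulary
HIDDEN_SIZE = 5120     # assumed model width

embedding_params = VOCAB_SIZE * HIDDEN_SIZE
print(f"Embedding table: {embedding_params / 1e6:.0f}M parameters")

# For comparison, a Llama-2-style 32,000-token vocabulary at the same width:
llama2_embedding = 32_000 * HIDDEN_SIZE
print(f"32K-vocab table: {llama2_embedding / 1e6:.0f}M parameters")
```

The larger table buys much better Chinese character coverage, so typical Chinese text compresses into fewer tokens per sentence.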


Real Benchmark Results

Source: All benchmark numbers below are from the official Baichuan 2 technical report (arXiv:2309.10305). Numbers are for the 13B-Chat variant unless noted.

Chinese Benchmarks (13B-Chat)

CMMLU Score (%)

  • Baichuan2-13B: 59
  • Qwen-14B: 62
  • Yi-34B: 68
  • ChatGLM3-6B: 50

Source: Baichuan2 tech report, Open Compass. Yi-34B is larger (34B params) for reference.

General Benchmarks (13B-Chat)

MMLU Score (%)

  • Baichuan2-13B: 59
  • Qwen-14B: 66
  • Llama-2-13B: 55
  • Aquila2-7B: 42

Source: Baichuan2 tech report. Llama-2-13B included as same-size Western model baseline.

Full Benchmark Breakdown (Baichuan2-13B)

Benchmark | Category | Score | Notes
MMLU | General knowledge | 59.2% | 5-shot
CMMLU | Chinese knowledge | 59.0% | 5-shot
C-Eval | Chinese evaluation | 58.1% | 5-shot
GSM8K | Math reasoning | 52.8% | 8-shot
HumanEval | Code generation | 17.1% | 0-shot, base model
AGIEval | Reasoning | 48.2% | Chinese subset

All scores from the official Baichuan 2 technical report (arXiv:2309.10305). Base model numbers unless noted.

VRAM Requirements by Quantization

Quantization | VRAM | File Size | Quality Loss | Recommended GPU
Q4_K_M (recommended) | ~8 GB | ~7.5 GB | Minimal | RTX 3060 12GB, RTX 4060 Ti 16GB
Q5_K_M | ~10 GB | ~9 GB | Very low | RTX 3060 12GB, RTX 4070
Q8_0 | ~14 GB | ~13 GB | Negligible | RTX 4080 16GB, RTX 3090
FP16 | ~26 GB | ~26 GB | None (full precision) | RTX 3090 24GB, A5000, A6000

VRAM estimates include KV cache overhead for 4096 context. Actual usage may vary by framework.
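These figures can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take roughly params × bits/8 bytes, and an fp16 KV cache at the full 4096 context adds about 3.4 GB for a model of this shape. The layer count (40) and hidden size (5120) below are assumed from typical 13B configs, and real frameworks often quantize or lazily allocate the KV cache, which is why measured usage in the table comes in lower than this rough estimate:

```python
def estimate_vram_gb(params_b=13.0, bits_per_weight=4.5,
                     n_layers=40, hidden=5120, ctx=4096):
    """Back-of-the-envelope VRAM estimate in GB. Assumes an fp16 KV
    cache; n_layers and hidden are assumed from typical 13B configs."""
    weights = params_b * 1e9 * bits_per_weight / 8   # quantized weight bytes
    kv_cache = 2 * n_layers * hidden * ctx * 2       # K and V, 2 bytes each
    return (weights + kv_cache) / 1e9

print(f"Q4-ish upper bound: {estimate_vram_gb(bits_per_weight=4.5):.1f} GB")
print(f"FP16 upper bound:   {estimate_vram_gb(bits_per_weight=16):.1f} GB")
```

Treat the output as an upper-ish bound; actual usage depends on the framework's KV cache handling and activation buffers.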

Memory Usage During Inference (Q4_K_M)

[Chart: memory usage over time, 0 to 120 seconds, y-axis 0 to 8 GB]

Measured with Q4_K_M quantization, 4096 token context window. Peak ~8.2 GB VRAM.

Installation & Setup (HuggingFace)

System Requirements

  • Operating System: Windows 10/11, macOS 12+, Ubuntu 20.04+
  • RAM: 16GB minimum, 32GB recommended
  • Storage: 10GB (Q4) to 28GB (FP16)
  • GPU: RTX 3060 12GB+ for Q4, RTX 3090 24GB+ for FP16
  • CPU: 4+ cores (CPU-only inference works but is slow)
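A quick preflight check against these minimums can be scripted with the standard library alone. The helper below is an illustrative sketch, not part of any official tooling; RAM detection via POSIX sysconf works on Linux and macOS but not Windows:

```python
import os
import shutil

def preflight(min_ram_gb=16, min_disk_gb=10, min_cores=4, path="."):
    """Check a machine against the minimum requirements above.
    RAM detection uses POSIX sysconf (Linux/macOS only)."""
    checks = {}
    try:
        ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (ValueError, OSError, AttributeError):
        ram = None  # e.g. Windows: verify RAM manually
    checks["ram_ok"] = ram is None or ram >= min_ram_gb
    checks["disk_ok"] = shutil.disk_usage(path).free / 1e9 >= min_disk_gb
    checks["cpu_ok"] = (os.cpu_count() or 0) >= min_cores
    return checks

print(preflight())
```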

Option 1: HuggingFace Transformers (Recommended)

Step 1: Install Dependencies

Set up a Python environment with the required libraries:

$ pip install torch transformers accelerate
Step 2: Download and Run Baichuan2-13B-Chat

Install bitsandbytes, then load the model with 4-bit quantization:

$ pip install bitsandbytes

Run the following Python script:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained(
    'baichuan-inc/Baichuan2-13B-Chat', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    'baichuan-inc/Baichuan2-13B-Chat',
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True,
    load_in_4bit=True)

messages = [{'role': 'user', 'content': 'What is the capital of China?'}]
response = model.chat(tokenizer, messages)
print(response)

Important: Baichuan2 requires trust_remote_code=True because it uses custom modeling code. Review the code at the HuggingFace repo before enabling this flag in production.

Option 2: llama.cpp (GGUF format)

Community-converted GGUF files are available on HuggingFace for use with llama.cpp. Search for "Baichuan2-13B GGUF" on HuggingFace. Note that Baichuan2 is not natively available on Ollama as of March 2026, but GGUF files work with llama.cpp directly.

Step 1: Clone and build llama.cpp

Build from source with CUDA enabled for GPU acceleration (llama.cpp now uses CMake; the older make-based build has been removed):

$ git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
$ cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release

Step 2: Download GGUF model and run

Download a community GGUF conversion and run inference:

$ ./build/bin/llama-cli -m ./baichuan2-13b-chat.Q4_K_M.gguf -p "What is machine learning?" -n 256

Bilingual Chinese-English Capabilities

Chinese Language Strengths

  • Simplified and traditional Chinese text generation
  • Chinese reading comprehension and Q&A
  • Chinese-specific knowledge (history, geography, culture)
  • Formal and informal Chinese writing styles
  • Chinese-to-English translation

CMMLU 59% and C-Eval 58% were competitive for September 2023 among 13B-class models.

English Capabilities

  • Basic English text generation and Q&A
  • English-to-Chinese translation
  • Cross-lingual summarization
  • Bilingual content creation
  • English MMLU: 59% (comparable to Llama-2-13B's 55%)

English is secondary; for English-only tasks, Llama 2 or Mistral 7B are better choices.
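Baichuan2's chat interface takes a plain list of role/content dicts, so drafting a translation is just a prompt-building exercise. The helper below is a hypothetical sketch (the prompt wording is not an official template); it assumes model and tokenizer were loaded as shown in the installation section:

```python
def translation_messages(text, source="English", target="Chinese"):
    """Build the messages list expected by Baichuan2's model.chat().
    The prompt wording here is illustrative, not an official template."""
    prompt = f"Translate the following {source} text into {target}:\n\n{text}"
    return [{"role": "user", "content": prompt}]

msgs = translation_messages("Machine learning is a subset of AI.")
print(msgs[0]["content"])
# With a loaded model:  response = model.chat(tokenizer, msgs)
```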

Realistic Use Cases

Chinese Customer Support

Handling Chinese-language customer queries with bilingual fallback to English

Translation Drafts

First-pass Chinese-English translation for human review (not production-quality alone)

Chinese Content Generation

Blog posts, marketing copy, and social media content in Chinese

Local Chinese LLM Alternatives

Chinese LLM Comparison (All Locally Runnable)

Model | Size | RAM Required | Speed | Quality (MMLU) | Cost/Month
Baichuan2-13B | 13B | 8-26GB | Medium | 59% | Free
Qwen 2.5-14B | 14B | 8-28GB | Fast | 79% | Free
Yi-34B | 34B | 20-68GB | Slow | 76% | Free
ChatGLM3-6B | 6B | 4-12GB | Fast | 50% | Free
Qwen 2.5-7B | 7B | 4-14GB | Fast | 74% | Free

Quality scores are MMLU percentages. All models support Chinese and English. Qwen 2.5 models are available on Ollama.

Best Overall: Qwen 2.5-14B

  • MMLU 79% (vs Baichuan2's 59%)
  • 128K context (vs 4K)
  • Available on Ollama
  • Apache 2.0 license
  • Released December 2024

Budget Pick: Qwen 2.5-7B

  • MMLU 74% at half the size
  • Runs on 4GB VRAM (Q4)
  • 128K context window
  • Excellent Chinese performance
  • ollama run qwen2.5:7b

Lightweight: ChatGLM3-6B

  • Only 6B parameters
  • 4GB VRAM with quantization
  • Good for basic Chinese chatbots
  • From Zhipu AI / Tsinghua
  • Weaker on benchmarks (MMLU ~50%)

Honest Assessment & Recommendations

When Baichuan2-13B Makes Sense

  • You need a well-documented, tested Chinese LLM with known behavior
  • Your application was built around Baichuan2 and migration is costly
  • You are studying the evolution of Chinese LLMs for research
  • You need a stable model with predictable outputs (no frequent updates)

When to Choose Something Else

  • Starting a new project in 2025-2026 (use Qwen 2.5 instead)
  • Need strong coding ability (HumanEval 17% is very low)
  • Need long context (>4K tokens)
  • Need strong math reasoning (GSM8K 52.8%)
  • English-only tasks (Llama 3 or Mistral are better)

Bottom Line

Baichuan2-13B was a solid Chinese bilingual LLM for September 2023 and helped establish Baichuan Intelligence as a serious Chinese AI player. Its CMMLU 59% and MMLU 59% scores were competitive at the time. However, the field has moved fast: Qwen 2.5-14B now scores 79% on MMLU with 128K context and is available on Ollama, making it the clear choice for new Chinese NLP projects. Baichuan2 remains useful for existing deployments and as a reference point for Chinese LLM development.


Baichuan2-13B Architecture

Decoder-only transformer with RoPE positional embeddings and SwiGLU activation, 13B parameters, 4096 context


Frequently Asked Questions

What is Baichuan2-13B and who made it?

Baichuan2-13B is a 13-billion parameter bilingual language model developed by Baichuan Intelligence (formerly Baichuan AI), a Chinese AI company founded in 2023 by Wang Xiaochuan, former CEO of Sogou. It was released in September 2023 and is trained on 2.6 trillion tokens with a focus on Chinese and English language tasks.

What are the hardware requirements for running Baichuan2-13B locally?

VRAM depends on quantization: Q4_K_M requires about 8GB VRAM (RTX 3060 12GB or RTX 4060 Ti 16GB works well), Q8_0 needs about 14GB, and full FP16 requires about 26GB (RTX 3090 or A6000). System RAM should be 16GB minimum with 32GB recommended. The Q4 quantized version is the most practical for consumer hardware.

How does Baichuan2-13B compare to Qwen 2.5?

Baichuan2-13B (September 2023) has been surpassed by newer Chinese LLMs. Qwen 2.5-14B scores around 79% on MMLU versus Baichuan2-13B's 59%, has 128K context versus 4K, and is also available under a permissive license. For new projects in 2025-2026, Qwen 2.5 is the recommended choice for Chinese NLP tasks.

Is Baichuan2-13B good for Chinese language tasks?

It was competitive for Chinese NLP when released in September 2023, scoring 59% on CMMLU and 58% on C-Eval. However, it has been surpassed by Qwen 2.5, Yi-1.5, and ChatGLM4 series. It remains useful for understanding the evolution of Chinese LLMs and for environments where a smaller, well-tested model is preferred.

Can Baichuan2-13B be used commercially?

Yes. Baichuan2-13B is released under the Baichuan 2 Community License Agreement, which permits commercial use. Organizations with over 100 million monthly active users need to apply for a separate commercial license from Baichuan Intelligence.


Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
