Baichuan2-13B: Bilingual Chinese-English LLM
Published: September 25, 2023 | Updated: March 13, 2026
13B-parameter bilingual model from Baichuan Intelligence, trained on 2.6T tokens. CMMLU 59% and MMLU 59% per the official technical report. Runs locally with Q4 quantization on ~8 GB VRAAM.
Note (March 2026): Baichuan2-13B was released September 2023 and has been surpassed by newer Chinese LLMs like Qwen 2.5-14B. This page covers its real capabilities and historical significance.
Technical Specifications
Baichuan Intelligence & Model Background
Baichuan Intelligence (百川智能) was founded in April 2023 by Wang Xiaochuan, who previously served as CEO of Sogou (a major Chinese search engine acquired by Tencent in 2021). The company quickly became one of China's prominent AI startups, raising over $300 million in funding by late 2023. Baichuan2-13B was their second-generation model release, following the original Baichuan-7B and Baichuan-13B models released in June-July 2023.
Key Technical Details
- Training corpus: 2.6 trillion tokens from a mix of Chinese web data, books, code, and English sources
- Tokenizer: Custom BPE tokenizer with 125,696 vocabulary size (optimized for Chinese character coverage)
- Architecture: Standard decoder-only transformer with RoPE positional embeddings and SwiGLU activation
- Chat variant: Baichuan2-13B-Chat fine-tuned with RLHF for conversational use
- Quantized variant: an official 4-bit quantized release (Baichuan2-13B-Chat-4bits) provided by Baichuan for resource-constrained deployment
Source Citations
- Baichuan 2: Open Large-scale Language Models (Yang et al., September 2023) - Official technical report with all benchmark numbers
- Baichuan2-13B-Chat on HuggingFace - Official model weights and documentation
- Baichuan2 GitHub Repository - Source code and implementation examples
Real Benchmark Results
Source: All benchmark numbers below are from the official Baichuan 2 technical report (arXiv:2309.10305). Numbers are for the 13B-Chat variant unless noted.
Chinese Benchmarks (13B-Chat)
[Chart: CMMLU Score (%) for Baichuan2-13B-Chat vs. peer models. Source: Baichuan2 tech report, OpenCompass. Yi-34B is larger (34B params) and shown for reference.]
General Benchmarks (13B-Chat)
[Chart: MMLU Score (%) comparison. Source: Baichuan2 tech report. Llama-2-13B included as a same-size Western model baseline.]
Full Benchmark Breakdown (Baichuan2-13B)
| Benchmark | Category | Score | Notes |
|---|---|---|---|
| MMLU | General knowledge | 59.2% | 5-shot |
| CMMLU | Chinese knowledge | 59.0% | 5-shot |
| C-Eval | Chinese evaluation | 58.1% | 5-shot |
| GSM8K | Math reasoning | 52.8% | 8-shot |
| HumanEval | Code generation | 17.1% | 0-shot, base model |
| AGIEval | Reasoning | 48.2% | Chinese subset |
All scores from the official Baichuan 2 technical report (arXiv:2309.10305). Base model numbers unless noted.
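If you want to reproduce numbers like these yourself, the open-source LM Evaluation Harness (listed under Resources below) can run 5-shot MMLU and CMMLU against the HuggingFace weights. The command below is a sketch: task names and flags vary between harness versions, so confirm them with `lm_eval --tasks list` before running.

```shell
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and flags may differ across harness versions -- verify locally first.
MODEL="baichuan-inc/Baichuan2-13B-Base"
lm_eval --model hf \
    --model_args "pretrained=$MODEL,trust_remote_code=True" \
    --tasks mmlu,cmmlu \
    --num_fewshot 5 \
    --batch_size 4
```

Expect a full 5-shot run to take hours on a single consumer GPU; small per-run variance around the reported scores is normal.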
VRAM Requirements by Quantization
| Quantization | VRAM | File Size | Quality Loss | Recommended GPU |
|---|---|---|---|---|
| Q4_K_M (recommended) | ~8 GB | ~7.5 GB | Minimal | RTX 3060 12GB, RTX 4060 Ti 16GB |
| Q5_K_M | ~10 GB | ~9 GB | Very low | RTX 3060 12GB, RTX 4070 |
| Q8_0 | ~14 GB | ~13 GB | Negligible | RTX 4080 16GB, RTX 3090 |
| FP16 | ~26 GB | ~26 GB | None (full precision) | RTX 3090 24GB, A5000, A6000 |
VRAM estimates include KV cache overhead for 4096 context. Actual usage may vary by framework.
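The table's figures follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, plus a KV-cache term. The sketch below uses approximate effective bit widths and an assumed 1 GB KV-cache constant, not measured values.

```python
# Back-of-envelope VRAM estimate: weights at the quantized bit width plus a
# rough KV-cache term. Constants here are approximations, not measurements.
PARAMS = 13e9          # Baichuan2-13B parameter count
KV_CACHE_GB = 1.0      # rough KV cache for a 4096-token context (assumption)

def vram_gb(bits_per_weight: float) -> float:
    weights_gb = PARAMS * bits_per_weight / 8 / 1e9
    return weights_gb + KV_CACHE_GB

# Q4_K_M averages ~4.5 bits/weight due to mixed-precision blocks.
for name, bits in [("Q4_K_M", 4.5), ("Q8_0", 8.5), ("FP16", 16)]:
    print(f"{name}: ~{vram_gb(bits):.1f} GB")
```

Real usage depends on the framework's allocator and context length, which is why the table's numbers and this estimate differ by up to a gigabyte.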
Memory Usage During Inference (Q4_K_M)
[Chart: memory usage over time during inference. Measured with Q4_K_M quantization, 4096-token context window; peak ~8.2 GB VRAM.]
Installation & Setup (HuggingFace)
System Requirements
Option 1: HuggingFace Transformers (Recommended)
Install Dependencies
Set up Python environment with required libraries
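A typical environment setup looks like the following. The package list matches what the official model card generally calls for, but versions are left unpinned here; check the card for any pinned `transformers` version before installing.

```shell
# Dependency set per the Baichuan2 model card (illustrative, unpinned).
PKGS="torch transformers accelerate bitsandbytes sentencepiece"

# Optional: isolate the install in a virtual environment.
python -m venv baichuan2-env
[ -f baichuan2-env/bin/activate ] && . baichuan2-env/bin/activate

pip install $PKGS
```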
Download and Run Baichuan2-13B-Chat
Load the model with 4-bit quantization via bitsandbytes
Important: Baichuan2 requires trust_remote_code=True because it uses custom modeling code. Review the code at the HuggingFace repo before enabling this flag in production.
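A loading sketch is shown below. The model id and the `.chat()` helper follow the official HuggingFace model card (the helper lives in Baichuan2's custom remote code); treat the exact quantization kwargs as assumptions to verify against your installed `transformers` version, which may prefer a `BitsAndBytesConfig` object over `load_in_4bit=True`.

```python
# Sketch: load Baichuan2-13B-Chat in 4-bit via bitsandbytes (~8 GB VRAM).
MODEL_ID = "baichuan-inc/Baichuan2-13B-Chat"

def load_chat_model():
    # Imports deferred so the sketch is readable without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,  # custom modeling code -- review it before enabling
        use_fast=False,          # Baichuan2 uses a sentencepiece tokenizer
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        device_map="auto",
        load_in_4bit=True,       # bitsandbytes 4-bit quantization
        torch_dtype=torch.bfloat16,
    )
    return tokenizer, model

def chat(model, tokenizer, user_message: str) -> str:
    # Baichuan2's remote code exposes a .chat() helper that applies its chat template.
    messages = [{"role": "user", "content": user_message}]
    return model.chat(tokenizer, messages)
```

First load downloads ~28 GB of FP16 weights from HuggingFace before quantizing on the fly, so budget disk space and time accordingly.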
Option 2: llama.cpp (GGUF format)
Community-converted GGUF files are available on HuggingFace for use with llama.cpp. Search for "Baichuan2-13B GGUF" on HuggingFace. Note that Baichuan2 is not natively available on Ollama as of March 2026, but GGUF files work with llama.cpp directly.
Clone and build llama.cpp
Build from source for GPU acceleration
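The build steps below follow llama.cpp's current CMake workflow; the CUDA flag name has changed over time (older trees used `LLAMA_CUBLAS`), so check the repo's build docs for your checkout.

```shell
# Clone and build llama.cpp with CUDA acceleration.
REPO="https://github.com/ggerganov/llama.cpp"
git clone "$REPO"
cd llama.cpp 2>/dev/null || true

# GGML_CUDA is the current flag; older versions used -DLLAMA_CUBLAS=ON.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```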
Download GGUF model and run
Download a community GGUF conversion and run inference
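The commands below assume you are in the built `llama.cpp` directory. The repository path is a placeholder: search HuggingFace for "Baichuan2-13B GGUF" and substitute a real community conversion before running.

```shell
# Placeholder repo path -- replace with a real community GGUF conversion
# found via a HuggingFace search for "Baichuan2-13B GGUF".
GGUF="baichuan2-13b-chat.Q4_K_M.gguf"
huggingface-cli download "community-repo/Baichuan2-13B-Chat-GGUF" "$GGUF" --local-dir .

# Offload all layers to GPU (-ngl 99) with the model's full 4096 context.
./build/bin/llama-cli -m "$GGUF" -ngl 99 -c 4096 -p "用中文介绍一下你自己。"
```

The prompt above asks the model to introduce itself in Chinese; swap in your own prompt or run `llama-cli` with no `-p` flag for interactive mode.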
Bilingual Chinese-English Capabilities
Chinese Language Strengths
- Simplified and traditional Chinese text generation
- Chinese reading comprehension and Q&A
- Chinese-specific knowledge (history, geography, culture)
- Formal and informal Chinese writing styles
- Chinese-to-English translation
CMMLU 59% and C-Eval 58% were competitive for September 2023 among 13B-class models.
English Capabilities
- Basic English text generation and Q&A
- English-to-Chinese translation
- Cross-lingual summarization
- Bilingual content creation
- English MMLU: 59% (comparable to Llama-2-13B's 55%)
English is secondary; for English-only tasks, Llama 2 or Mistral 7B are better choices.
Realistic Use Cases
Chinese Customer Support
Handling Chinese-language customer queries with bilingual fallback to English
Translation Drafts
First-pass Chinese-English translation for human review (not production-quality alone)
Chinese Content Generation
Blog posts, marketing copy, and social media content in Chinese
Local Chinese LLM Alternatives
Chinese LLM Comparison (All Locally Runnable)
| Model | Size | RAM Required | Speed | Quality | Cost/Month |
|---|---|---|---|---|---|
| Baichuan2-13B | 13B | 8-26GB | Medium | 59% | Free |
| Qwen 2.5-14B | 14B | 8-28GB | Fast | 79% | Free |
| Yi-34B | 34B | 20-68GB | Slow | 76% | Free |
| ChatGLM3-6B | 6B | 4-12GB | Fast | 50% | Free |
| Qwen 2.5-7B | 7B | 4-14GB | Fast | 74% | Free |
Quality scores are MMLU percentages. All models support Chinese and English. Qwen 2.5 models are available on Ollama.
Best Overall: Qwen 2.5-14B
- MMLU 79% (vs Baichuan2's 59%)
- 128K context (vs 4K)
- Available on Ollama
- Apache 2.0 license
- Released September 2024
Budget Pick: Qwen 2.5-7B
- MMLU 74% at half the size
- Runs on 4GB VRAM (Q4)
- 128K context window
- Excellent Chinese performance
- One-line install: `ollama run qwen2.5:7b`
Lightweight: ChatGLM3-6B
- Only 6B parameters
- 4GB VRAM with quantization
- Good for basic Chinese chatbots
- From Zhipu AI / Tsinghua
- Weaker on benchmarks (MMLU ~50%)
Honest Assessment & Recommendations
When Baichuan2-13B Makes Sense
- You need a well-documented, tested Chinese LLM with known behavior
- Your application was built around Baichuan2 and migration is costly
- You are studying the evolution of Chinese LLMs for research
- You need a stable model with predictable outputs (no frequent updates)
When to Choose Something Else
- Starting a new project in 2025-2026 (use Qwen 2.5 instead)
- Need strong coding ability (HumanEval 17% is very low)
- Need long context (>4K tokens)
- Need strong math reasoning (GSM8K 52.8%)
- English-only tasks (Llama 3 or Mistral are better)
Bottom Line
Baichuan2-13B was a solid Chinese bilingual LLM for September 2023 and helped establish Baichuan Intelligence as a serious Chinese AI player. Its CMMLU 59% and MMLU 59% scores were competitive at the time. However, the field has moved fast: Qwen 2.5-14B now scores 79% on MMLU with 128K context and is available on Ollama, making it the clear choice for new Chinese NLP projects. Baichuan2 remains useful for existing deployments and as a reference point for Chinese LLM development.
Resources & Further Reading
Official Sources
Chinese NLP Benchmarks
- CMMLU Benchmark - Chinese Massive Multitask Language Understanding
- C-Eval Benchmark - Chinese evaluation suite
- LM Evaluation Harness - Open-source benchmark framework
Related Models on This Site
- Qwen 2.5-14B - Recommended successor for Chinese NLP
- Qwen 2.5-7B - Lighter Chinese LLM alternative
- Yi-34B - Larger bilingual model from 01.AI
- Aquila-7B - Another Chinese AI model
- GLM-4-5 - ChatGLM series from Zhipu AI
Baichuan2-13B Architecture
Decoder-only transformer with RoPE positional embeddings and SwiGLU activation, 13B parameters, 4096 context
Frequently Asked Questions
What is Baichuan2-13B and who made it?
Baichuan2-13B is a 13-billion parameter bilingual language model developed by Baichuan Intelligence (formerly Baichuan AI), a Chinese AI company founded in 2023 by Wang Xiaochuan, former CEO of Sogou. It was released in September 2023 and is trained on 2.6 trillion tokens with a focus on Chinese and English language tasks.
What are the hardware requirements for running Baichuan2-13B locally?
VRAM depends on quantization: Q4_K_M requires about 8GB VRAM (RTX 3060 12GB or RTX 4060 Ti 16GB works well), Q8_0 needs about 14GB, and full FP16 requires about 26GB (RTX 3090 or A6000). System RAM should be 16GB minimum with 32GB recommended. The Q4 quantized version is the most practical for consumer hardware.
How does Baichuan2-13B compare to Qwen 2.5?
Baichuan2-13B (September 2023) has been surpassed by newer Chinese LLMs. Qwen 2.5-14B scores around 79% on MMLU versus Baichuan2-13B's 59%, has 128K context versus 4K, and is also available under a permissive license. For new projects in 2025-2026, Qwen 2.5 is the recommended choice for Chinese NLP tasks.
Is Baichuan2-13B good for Chinese language tasks?
It was competitive for Chinese NLP when released in September 2023, scoring 59% on CMMLU and 58% on C-Eval. However, it has been surpassed by Qwen 2.5, Yi-1.5, and ChatGLM4 series. It remains useful for understanding the evolution of Chinese LLMs and for environments where a smaller, well-tested model is preferred.
Can Baichuan2-13B be used commercially?
Yes. Baichuan2-13B is released under the Baichuan 2 Community License Agreement, which permits commercial use. Organizations with over 100 million monthly active users need to apply for a separate commercial license from Baichuan Intelligence.
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.