Aquila 7B
BAAI's Chinese-English Bilingual Model
Updated: March 16, 2026
Historical Context
Aquila 7B was released in June 2023 by BAAI (Beijing Academy of Artificial Intelligence). It was notable as one of the first Chinese open-source LLMs with native bilingual Chinese-English capabilities. By 2026, it has been superseded by models like Qwen 2.5, Yi, and DeepSeek, which offer dramatically better Chinese language performance at the same or smaller sizes.
What Is Aquila 7B?
Aquila 7B is a bilingual (Chinese-English) language model developed by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院). It was one of the early Chinese open-source LLMs, released alongside the FlagAI framework for training and deploying large language models.
The key differentiator of Aquila was its training data composition: approximately 40% Chinese text and 60% English text, giving it stronger Chinese language understanding than models like LLaMA which were trained almost entirely on English data. BAAI also released AquilaChat, an instruction-tuned version for conversational tasks.
The model was part of BAAI's broader FlagAI ecosystem, which included Aquila 7B, Aquila 33B, and later Aquila2 models with improved performance and longer context windows.
Technical Architecture
Model Architecture
- Type: Transformer decoder-only (GPT-style)
- Parameters: ~7 billion
- Hidden Size: 4096
- Layers: 32 transformer blocks
- Attention Heads: 32
- Context Length: 2048 tokens
- Vocabulary: ~100,000 tokens (expanded for Chinese)
- Positional Encoding: Rotary Position Embeddings (RoPE)
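As a sanity check, the hyperparameters above roughly reproduce the ~7B parameter count. This is a back-of-envelope sketch: the FFN intermediate size (11008) is an assumption borrowed from LLaMA-style SwiGLU architectures, since BAAI does not document it here, and biases and norm weights are ignored.

```python
# Rough parameter-count estimate from the hyperparameters listed above.
# ffn=11008 is an ASSUMPTION (LLaMA-style SwiGLU sizing), not a documented value.
hidden, layers, vocab, ffn = 4096, 32, 100_000, 11008

embed = vocab * hidden                  # input embedding table
attn = 4 * hidden * hidden              # Q, K, V, O projections per layer
mlp = 3 * hidden * ffn                  # gate/up/down projections per layer
per_layer = attn + mlp
total = 2 * embed + layers * per_layer  # 2x embed: untied output head assumed

print(f"~{total / 1e9:.1f}B parameters")  # prints ~7.3B parameters
```

The estimate lands near 7.3B, consistent with the "~7 billion" figure, with the large vocabulary contributing roughly 0.8B of that.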
Training Details
- Training Data: ~600B tokens (Chinese + English)
- Chinese Ratio: ~40% of training corpus
- English Ratio: ~60% of training corpus
- Data Sources: Web crawl, books, academic papers, code
- Framework: FlagAI (BAAI's training framework)
- Organization: BAAI (智源研究院), Beijing
- Release Date: June 2023
- Variants: Aquila 7B (base), AquilaChat 7B (instruction-tuned)
Architecture Notes
Aquila's architecture is similar to LLaMA but with a significantly larger vocabulary (~100K vs LLaMA's 32K) to better handle Chinese characters and subwords. The expanded vocabulary allows more efficient tokenization of Chinese text — fewer tokens per sentence compared to models with English-centric tokenizers. This is a meaningful advantage for Chinese text processing speed and context utilization.
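The efficiency gap can be illustrated with a deliberately simplified model. This is not Aquila's real tokenizer: here a Chinese-aware vocabulary is idealized as "one token per character," while an English-centric tokenizer falling back to byte-level encoding for CJK text is idealized as "one token per UTF-8 byte."

```python
# Idealized illustration of tokenization efficiency for Chinese text.
# NOT Aquila's actual tokenizer: we approximate a Chinese-aware vocab as
# one token per character, and an English-centric byte-fallback tokenizer
# as one token per UTF-8 byte (CJK characters are 3 bytes each in UTF-8).
sentence = "人工智能正在改变世界"  # "AI is changing the world"

chinese_aware_tokens = len(sentence)                  # 10 characters
byte_fallback_tokens = len(sentence.encode("utf-8"))  # 30 bytes

print(chinese_aware_tokens, byte_fallback_tokens)  # prints: 10 30
```

A 3x difference in token count means 3x more of the context window consumed per Chinese sentence, which matters greatly with a 2048-token limit.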
Chinese-English Bilingual Design
Aquila's primary value proposition in 2023 was its bilingual capability. At the time, most open-source LLMs (LLaMA, Falcon, MPT) were trained almost exclusively on English data and performed poorly on Chinese tasks. Aquila addressed this gap:
Chinese Language Strengths
- Native Chinese text understanding (not just translated)
- Chinese vocabulary coverage via expanded tokenizer
- Classical and simplified Chinese support
- Chinese cultural context awareness
- Chinese-to-English and English-to-Chinese translation
Limitations
- 2048 token context — very short for document analysis
- Base model (Aquila 7B) is not instruction-tuned — use AquilaChat instead
- Modest benchmark scores compared to 2024+ models
- Limited code generation capability
- No built-in safety training (RLHF) in base model
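The 2048-token limit means longer documents must be chunked before processing. A minimal sketch of one chunking strategy follows; the whitespace `split()` is a stand-in for a real `tokenizer.encode()` call so the example stays self-contained, and the budget numbers are illustrative.

```python
# Minimal sketch: chunk long input to fit Aquila's 2048-token window,
# reserving headroom for the generated output. split() is a stand-in for
# tokenizer.encode(); a real implementation would tokenize properly.
def chunk_for_context(text, max_tokens=2048, reserve_for_output=256):
    budget = max_tokens - reserve_for_output
    tokens = text.split()
    return [" ".join(tokens[i:i + budget]) for i in range(0, len(tokens), budget)]

chunks = chunk_for_context("word " * 4000)
print(len(chunks))  # prints: 3  (two full 1792-token chunks plus a remainder)
```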
Chinese Data Compliance
One advantage of Aquila for organizations operating in China: BAAI is a Chinese institution, and the model's training data was curated with Chinese regulatory requirements in mind. For companies needing AI models that comply with Chinese data governance regulations, BAAI models may have advantages over Western-trained models. However, consult legal counsel for specific compliance questions.
Honest Performance Assessment
Benchmark Context
Aquila 7B was a mid-2023 model. Its benchmark performance was modest compared to contemporary models like LLaMA 2 7B and Mistral 7B (released months later). BAAI published limited benchmark data. The scores below are from BAAI's reported results and community evaluations.
Available Benchmark Data
| Benchmark | Aquila 7B | LLaMA 7B | LLaMA 2 7B | Mistral 7B |
|---|---|---|---|---|
| MMLU (5-shot) | ~27% | 35.1% | 45.3% | 60.1% |
| C-Eval (Chinese) | ~34% | ~25% | ~28% | ~30% |
| CMMLU (Chinese) | ~31% | ~25% | ~27% | ~30% |
| HellaSwag | ~67% | 76.1% | 77.2% | 81.3% |
Sources: BAAI model card (huggingface.co/BAAI/Aquila-7B), Open LLM Leaderboard. Aquila 7B scores are approximate from BAAI reports. Chinese benchmarks (C-Eval, CMMLU) show Aquila's advantage over English-only models, while English benchmarks (MMLU, HellaSwag) show it trailing behind.
Where Aquila Was Useful (2023)
- Chinese text generation and understanding
- Chinese-English bilingual tasks
- Organizations needing Chinese-compliant AI
- Research on bilingual model training
- Basic Chinese NLP when no better option existed
Where Aquila Falls Short
- English-only tasks (LLaMA 2, Mistral much better)
- Complex reasoning and math
- Code generation
- Long-document processing (2048 token limit)
- Modern Chinese tasks (Qwen 2.5 dramatically better)
VRAM Requirements by Quantization
| Quantization | File Size | VRAM Required | Quality Impact | Notes |
|---|---|---|---|---|
| Q4_0 | ~4.0 GB | ~5.0 GB | Noticeable loss | Chinese quality affected more than English |
| Q4_K_M | ~4.3 GB | ~5.3 GB | Acceptable | Best balance for bilingual use |
| Q5_K_M | ~5.0 GB | ~6.0 GB | Minimal loss | Good Chinese text quality |
| Q8_0 | ~7.5 GB | ~8.5 GB | Near-lossless | Recommended for research |
| FP16 | ~14 GB | ~15 GB | Full precision | 24GB+ GPU required |
Note: Aquila's larger vocabulary (~100K tokens vs 32K) means slightly more VRAM compared to LLaMA 7B at the same quantization level due to the larger embedding table.
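The embedding-table overhead from the note above is easy to quantify. This back-of-envelope calculation covers only the input embedding table at FP16; total VRAM also depends on activations, KV cache, and the quantization scheme, and an untied output head roughly doubles the vocabulary-dependent cost.

```python
# Back-of-envelope embedding-table memory: Aquila (~100K vocab) vs
# LLaMA 7B (32K vocab) at FP16 (2 bytes per weight). Input embeddings only;
# an untied output head would roughly double these figures.
hidden = 4096
bytes_fp16 = 2

aquila_embed_mb = 100_000 * hidden * bytes_fp16 / 2**20
llama_embed_mb = 32_000 * hidden * bytes_fp16 / 2**20

print(f"Aquila: {aquila_embed_mb:.0f} MiB, LLaMA: {llama_embed_mb:.0f} MiB")
# prints: Aquila: 781 MiB, LLaMA: 250 MiB
```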
Running Aquila 7B
Availability Note
Aquila 7B is not available on Ollama. It can be run via HuggingFace Transformers or BAAI's FlagAI framework. Community GGUF conversions may exist on HuggingFace for use with llama.cpp. For Chinese language tasks, ollama run qwen2.5:7b is a far better option available directly on Ollama.
Using HuggingFace Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/Aquila-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Chinese prompt example: "Explain what artificial intelligence is,
# and its applications in daily life."
prompt = "请解释什么是人工智能,以及它在日常生活中的应用。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note: trust_remote_code=True is required because Aquila uses custom model code. For the instruction-tuned version, use "BAAI/AquilaChat-7B" instead.
AquilaChat 7B (Instruction-Tuned)
For conversational tasks, use AquilaChat instead of the base Aquila model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/AquilaChat-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# AquilaChat uses a specific chat format.
# Prompt: "Explain the basic principles of quantum computing in Chinese"
prompt = """Human: 用中文解释量子计算的基本原理
Assistant:"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Aquila Model Family
| Model | Size | Type | Context | Notes |
|---|---|---|---|---|
| Aquila 7B | 7B | Base | 2048 | This page — bilingual base model |
| AquilaChat 7B | 7B | Chat | 2048 | Instruction-tuned for conversations |
| Aquila 33B | 33B | Base | 2048 | Larger bilingual model |
| Aquila2 7B | 7B | Base | 4096 | Improved version with longer context |
| AquilaChat2 7B | 7B | Chat | 4096 | Improved chat model |
License
BAAI Aquila License
Aquila 7B was released under the BAAI Aquila License, which permits both research and commercial use but includes specific requirements:
- Commercial use is allowed with proper attribution
- Redistribution requires including the license text
- Derivative models must acknowledge BAAI as the original developer
- Some later Aquila2 models were released under Apache 2.0
Check the specific model card on HuggingFace for the exact license terms of each variant.
Modern Alternatives (2026)
For Chinese language tasks or bilingual Chinese-English work, these modern models dramatically outperform Aquila 7B:
| Model | Size | MMLU | C-Eval | Context | License | Ollama |
|---|---|---|---|---|---|---|
| Aquila 7B | 7B | ~27% | ~34% | 2K | BAAI | Not available |
| Qwen 2.5 7B | 7B | ~74% | ~80% | 128K | Apache 2.0 | ollama run qwen2.5:7b |
| Yi 1.5 9B | 9B | ~69% | ~74% | 4K | Apache 2.0 | ollama run yi:9b |
| DeepSeek LLM 7B | 7B | ~49% | ~45% | 4K | Custom | ollama run deepseek-llm:7b |
| GLM-4 9B | 9B | ~72% | ~76% | 128K | Custom | ollama run glm4:9b |
Qwen 2.5 7B is the strongest recommendation for Chinese language tasks — it scores 2-3x higher than Aquila on both English and Chinese benchmarks while being available on Ollama with Apache 2.0 license.
Frequently Asked Questions
What is Aquila 7B and who made it?
Aquila 7B is a 7-billion parameter bilingual (Chinese-English) language model created by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院) and released in June 2023. It was one of the first Chinese open-source LLMs with native bilingual capabilities, trained on approximately 40% Chinese and 60% English text data.
Can I run Aquila 7B on Ollama?
Aquila 7B is not available on Ollama. You can run it via HuggingFace Transformers with trust_remote_code=True, or use BAAI's FlagAI framework. For Chinese language tasks on Ollama, use ollama run qwen2.5:7b instead — it's dramatically better at both Chinese and English.
Is Aquila 7B still worth using in 2026?
For practical use, no. Qwen 2.5 7B scores ~74% on MMLU and ~80% on C-Eval compared to Aquila's ~27% and ~34% respectively, while offering 128K context, Ollama support, and Apache 2.0 licensing. Aquila is primarily of historical interest as an early Chinese open-source LLM.
What's the difference between Aquila and AquilaChat?
Aquila 7B is the base (pre-trained) model — it completes text but doesn't follow instructions well. AquilaChat 7B is the instruction-tuned version designed for conversations and following user prompts. For any interactive use, always use AquilaChat rather than the base Aquila model.
How much VRAM does Aquila 7B need?
Aquila 7B requires approximately 5GB VRAM with Q4_K_M quantization, 6GB with Q5_K_M, 8.5GB with Q8_0, or 15GB at full FP16 precision. Its larger vocabulary (~100K tokens) means slightly more memory than LLaMA 7B at the same quantization level.
Sources & References
- BAAI/Aquila-7B — HuggingFace Model Card — Official model page with specifications and license
- BAAI/AquilaChat-7B — HuggingFace Model Card — Instruction-tuned version
- github.com/FlagAI-Open/FlagAI — BAAI's framework for training and deploying large models
- github.com/FlagAI-Open/Aquila2 — Aquila2 series (improved successor models)
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.