BAAI — June 2023

Aquila 7B

BAAI's Chinese-English Bilingual Model

Updated: March 16, 2026

Historical Context

Aquila 7B was released in June 2023 by BAAI (Beijing Academy of Artificial Intelligence). It was notable as one of the first Chinese open-source LLMs with native bilingual Chinese-English capabilities. As of 2026, it has been superseded by models such as Qwen 2.5, Yi, and DeepSeek, which offer dramatically better Chinese-language performance at the same or smaller sizes.

  • Parameters: 7B
  • Context: 2048 tokens
  • Languages: Chinese + English (bilingual)
  • License: Free, open source

What Is Aquila 7B?

Aquila 7B is a bilingual (Chinese-English) language model developed by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院). It was one of the early Chinese open-source LLMs, released alongside the FlagAI framework for training and deploying large language models.

The key differentiator of Aquila was its training data composition: approximately 40% Chinese text and 60% English text, giving it stronger Chinese language understanding than models like LLaMA which were trained almost entirely on English data. BAAI also released AquilaChat, an instruction-tuned version for conversational tasks.

The model was part of BAAI's broader FlagAI ecosystem, which included Aquila 7B, Aquila 33B, and later Aquila2 models with improved performance and longer context windows.

Technical Architecture

Model Architecture

  • Type: Transformer decoder-only (GPT-style)
  • Parameters: ~7 billion
  • Hidden Size: 4096
  • Layers: 32 transformer blocks
  • Attention Heads: 32
  • Context Length: 2048 tokens
  • Vocabulary: ~100,000 tokens (expanded for Chinese)
  • Positional Encoding: Rotary Position Embeddings (RoPE)
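
These dimensions can be used to sanity-check the headline parameter count. The sketch below assumes a LLaMA-style block (SwiGLU MLP with an assumed intermediate size of 11008, and an untied output head); these are assumptions for illustration, not published Aquila internals:

```python
# Rough parameter-count estimate for a LLaMA-style 7B decoder.
# Assumed values: d_ff = 11008 (LLaMA 7B's MLP width) and an untied
# output head -- both are assumptions, not published Aquila specifics.
d_model = 4096        # hidden size
n_layers = 32         # transformer blocks
vocab = 100_000       # expanded bilingual vocabulary
d_ff = 11008          # assumed MLP intermediate size

attn_per_layer = 4 * d_model * d_model   # Q, K, V, O projections
mlp_per_layer = 3 * d_model * d_ff       # SwiGLU: gate, up, down
embed = vocab * d_model                  # input embedding table
head = vocab * d_model                   # output projection (if untied)

total = n_layers * (attn_per_layer + mlp_per_layer) + embed + head
print(f"~{total / 1e9:.2f}B parameters")  # lands in the ~7B range
```

Note how the 100K vocabulary alone contributes roughly 0.4B parameters to the embedding table, a visibly larger share than in a 32K-vocabulary model.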

Training Details

  • Training Data: ~600B tokens (Chinese + English)
  • Chinese Ratio: ~40% of training corpus
  • English Ratio: ~60% of training corpus
  • Data Sources: Web crawl, books, academic papers, code
  • Framework: FlagAI (BAAI's training framework)
  • Organization: BAAI (智源研究院), Beijing
  • Release Date: June 2023
  • Variants: Aquila 7B (base), AquilaChat 7B (instruction-tuned)

Architecture Notes

Aquila's architecture is similar to LLaMA but with a significantly larger vocabulary (~100K vs LLaMA's 32K) to better handle Chinese characters and subwords. The expanded vocabulary allows more efficient tokenization of Chinese text — fewer tokens per sentence compared to models with English-centric tokenizers. This is a meaningful advantage for Chinese text processing speed and context utilization.
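
The efficiency gap is easy to see at the byte level: a tokenizer whose vocabulary lacks Chinese entries typically falls back to UTF-8 bytes (or multi-byte fragments), while a Chinese-aware vocabulary can map each character, or even a multi-character word, to a single token. This is an illustrative lower-bound comparison, not a measurement of the actual Aquila tokenizer:

```python
# Illustrative only: character count vs UTF-8 byte count for a Chinese
# sentence. A byte-fallback tokenizer needs up to one token per byte it
# cannot merge; a CJK-aware vocabulary needs at most one token per
# character (often fewer, via multi-character merges).
sentence = "人工智能改变世界"  # "Artificial intelligence changes the world"

chars = len(sentence)                        # 8 characters
utf8_bytes = len(sentence.encode("utf-8"))   # 24 bytes (3 per CJK char)

print(f"{chars} chars, {utf8_bytes} UTF-8 bytes")
print(f"byte-fallback worst case: ~{utf8_bytes} tokens; "
      f"CJK-aware vocab: <= {chars} tokens")
```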

Chinese-English Bilingual Design

Aquila's primary value proposition in 2023 was its bilingual capability. At the time, most open-source LLMs (LLaMA, Falcon, MPT) were trained almost exclusively on English data and performed poorly on Chinese tasks. Aquila addressed this gap:

Chinese Language Strengths

  • Native Chinese text understanding (not just translated)
  • Chinese vocabulary coverage via expanded tokenizer
  • Classical and simplified Chinese support
  • Chinese cultural context awareness
  • Chinese-to-English and English-to-Chinese translation

Limitations

  • 2048 token context — very short for document analysis
  • Base model (Aquila 7B) is not instruction-tuned — use AquilaChat instead
  • Modest benchmark scores compared to 2024+ models
  • Limited code generation capability
  • No built-in safety training (RLHF) in base model
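
The 2048-token window means longer documents have to be processed in pieces. A common workaround is overlapping chunking; the helper below is a generic sketch (token IDs in, chunk lists out) and is not part of any Aquila or FlagAI API:

```python
# Generic overlapping chunker for a short-context model. max_len should
# leave headroom below 2048 for the prompt template and generated tokens.
def chunk_tokens(token_ids, max_len=1792, overlap=256):
    """Split a token-ID list into overlapping windows."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks

ids = list(range(5000))  # stand-in for real tokenizer output
print([len(c) for c in chunk_tokens(ids)])  # [1792, 1792, 1792, 392]
```

Each chunk then gets its own forward pass; the 256-token overlap preserves some local context across chunk boundaries.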

Chinese Data Compliance

One advantage of Aquila for organizations operating in China: BAAI is a Chinese institution, and the model's training data was curated with Chinese regulatory requirements in mind. For companies needing AI models that comply with Chinese data governance regulations, BAAI models may have advantages over Western-trained models. However, consult legal counsel for specific compliance questions.

Honest Performance Assessment

Benchmark Context

Aquila 7B was a mid-2023 model. Its benchmark performance was modest compared to near-contemporary models such as LLaMA 2 7B and Mistral 7B, both released within months of it. BAAI published limited benchmark data; the scores below come from BAAI's reported results and community evaluations.

Available Benchmark Data

| Benchmark | Aquila 7B | LLaMA 7B | LLaMA 2 7B | Mistral 7B |
|---|---|---|---|---|
| MMLU (5-shot) | ~27% | 35.1% | 45.3% | 60.1% |
| C-Eval (Chinese) | ~34% | ~25% | ~28% | ~30% |
| CMMLU (Chinese) | ~31% | ~25% | ~27% | ~30% |
| HellaSwag | ~67% | 76.1% | 77.2% | 81.3% |

Sources: BAAI model card (huggingface.co/BAAI/Aquila-7B), Open LLM Leaderboard. Aquila 7B scores are approximate from BAAI reports. Chinese benchmarks (C-Eval, CMMLU) show Aquila's advantage over English-only models, while English benchmarks (MMLU, HellaSwag) show it trailing behind.

Where Aquila Was Useful (2023)

  • Chinese text generation and understanding
  • Chinese-English bilingual tasks
  • Organizations needing Chinese-compliant AI
  • Research on bilingual model training
  • Basic Chinese NLP when no better option existed

Where Aquila Falls Short

  • English-only tasks (LLaMA 2, Mistral much better)
  • Complex reasoning and math
  • Code generation
  • Long-document processing (2048 token limit)
  • Modern Chinese tasks (Qwen 2.5 dramatically better)

VRAM Requirements by Quantization

| Quantization | File Size | VRAM Required | Quality Impact | Notes |
|---|---|---|---|---|
| Q4_0 | ~4.0 GB | ~5.0 GB | Noticeable loss | Chinese quality affected more than English |
| Q4_K_M | ~4.3 GB | ~5.3 GB | Acceptable | Best balance for bilingual use |
| Q5_K_M | ~5.0 GB | ~6.0 GB | Minimal loss | Good Chinese text quality |
| Q8_0 | ~7.5 GB | ~8.5 GB | Near-lossless | Recommended for research |
| FP16 | ~14 GB | ~15 GB | Full precision | 24GB+ GPU required |

Note: Aquila's larger vocabulary (~100K tokens vs 32K) means slightly more VRAM compared to LLaMA 7B at the same quantization level due to the larger embedding table.
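
The vocabulary effect on memory can be quantified from the embedding table alone. A back-of-the-envelope sketch at FP16 (2 bytes per weight); if the output head is untied from the input embedding, the difference roughly doubles:

```python
# Memory cost of the input embedding table alone: Aquila (~100K vocab)
# vs LLaMA 7B (32K vocab), at FP16 (2 bytes/weight). Quantized formats
# shrink this roughly in proportion to their bits-per-weight.
d_model = 4096
bytes_fp16 = 2

aquila_embed = 100_000 * d_model * bytes_fp16  # ~0.82 GB
llama_embed = 32_000 * d_model * bytes_fp16    # ~0.26 GB

extra_gb = (aquila_embed - llama_embed) / 1e9
print(f"extra embedding memory at FP16: ~{extra_gb:.2f} GB")
```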

Running Aquila 7B

Availability Note

Aquila 7B is not available on Ollama. It can be run via HuggingFace Transformers or BAAI's FlagAI framework. Community GGUF conversions may exist on HuggingFace for use with llama.cpp. For Chinese language tasks, ollama run qwen2.5:7b is a far better option available directly on Ollama.

Using HuggingFace Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/Aquila-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Chinese prompt example: "Please explain what artificial intelligence is,
# and how it is applied in daily life."
prompt = "请解释什么是人工智能,以及它在日常生活中的应用。"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: trust_remote_code=True is required because Aquila uses custom model code. For the instruction-tuned version, use "BAAI/AquilaChat-7B" instead.

AquilaChat 7B (Instruction-Tuned)

For conversational tasks, use AquilaChat instead of the base Aquila model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/AquilaChat-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# AquilaChat uses a specific "Human:/Assistant:" chat format
# Prompt: "Explain the basic principles of quantum computing in Chinese"
prompt = """Human: 用中文解释量子计算的基本原理
Assistant:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
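
For multi-turn use, the Human/Assistant prompt above can be built programmatically. The helper below is a convenience sketch that concatenates turns in that format; it is not an official AquilaChat API, and the exact template should be verified against the model card:

```python
# Build an AquilaChat-style prompt from a list of (role, text) turns.
# The "Human:"/"Assistant:" template follows the example above; check
# the exact separators against the official model card before relying on it.
def build_chat_prompt(turns):
    lines = []
    for role, text in turns:
        label = "Human" if role == "user" else "Assistant"
        lines.append(f"{label}: {text}")
    lines.append("Assistant:")  # trailing cue for the model to respond
    return "\n".join(lines)

prompt = build_chat_prompt([
    ("user", "什么是量子计算?"),       # "What is quantum computing?"
    ("assistant", "量子计算是利用量子力学原理进行计算的技术。"),
    ("user", "它和经典计算有什么区别?"),  # "How does it differ from classical computing?"
])
print(prompt)
```

The resulting string can be passed to the tokenizer exactly as in the single-turn example.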

Aquila Model Family

| Model | Size | Type | Context | Notes |
|---|---|---|---|---|
| Aquila 7B | 7B | Base | 2048 | Bilingual base model (this page) |
| AquilaChat 7B | 7B | Chat | 2048 | Instruction-tuned for conversations |
| Aquila 33B | 33B | Base | 2048 | Larger bilingual model |
| Aquila2 7B | 7B | Base | 4096 | Improved version with longer context |
| AquilaChat2 7B | 7B | Chat | 4096 | Improved chat model |

License

BAAI Aquila License

Aquila 7B was released under the BAAI Aquila License, which permits both research and commercial use but includes specific requirements:

  • Commercial use is allowed with proper attribution
  • Redistribution requires including the license text
  • Derivative models must acknowledge BAAI as the original developer
  • Some later Aquila2 models were released under Apache 2.0

Check the specific model card on HuggingFace for the exact license terms of each variant.

Modern Alternatives (2026)

For Chinese language tasks or bilingual Chinese-English work, these modern models dramatically outperform Aquila 7B:

| Model | Size | MMLU | C-Eval | Context | License | Ollama |
|---|---|---|---|---|---|---|
| Aquila 7B | 7B | ~27% | ~34% | 2K | BAAI | Not available |
| Qwen 2.5 7B | 7B | ~74% | ~80% | 128K | Apache 2.0 | ollama run qwen2.5:7b |
| Yi 1.5 9B | 9B | ~69% | ~74% | 4K | Apache 2.0 | ollama run yi:9b |
| DeepSeek LLM 7B | 7B | ~49% | ~45% | 4K | Custom | ollama run deepseek-llm:7b |
| GLM-4 9B | 9B | ~72% | ~76% | 128K | Custom | ollama run glm4:9b |

Qwen 2.5 7B is the strongest recommendation for Chinese language tasks: it scores 2-3x higher than Aquila on both English and Chinese benchmarks while being available on Ollama under the Apache 2.0 license.


Frequently Asked Questions

What is Aquila 7B and who made it?

Aquila 7B is a 7-billion parameter bilingual (Chinese-English) language model created by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院) and released in June 2023. It was one of the first Chinese open-source LLMs with native bilingual capabilities, trained on approximately 40% Chinese and 60% English text data.

Can I run Aquila 7B on Ollama?

Aquila 7B is not available on Ollama. You can run it via HuggingFace Transformers with trust_remote_code=True, or use BAAI's FlagAI framework. For Chinese language tasks on Ollama, use ollama run qwen2.5:7b instead — it's dramatically better at both Chinese and English.

Is Aquila 7B still worth using in 2026?

For practical use, no. Qwen 2.5 7B scores ~74% on MMLU and ~80% on C-Eval compared to Aquila's ~27% and ~34% respectively, while offering 128K context, Ollama support, and Apache 2.0 licensing. Aquila is primarily of historical interest as an early Chinese open-source LLM.

What's the difference between Aquila and AquilaChat?

Aquila 7B is the base (pre-trained) model — it completes text but doesn't follow instructions well. AquilaChat 7B is the instruction-tuned version designed for conversations and following user prompts. For any interactive use, always use AquilaChat rather than the base Aquila model.

How much VRAM does Aquila 7B need?

Aquila 7B requires approximately 5GB VRAM with Q4_K_M quantization, 6GB with Q5_K_M, 8.5GB with Q8_0, or 15GB at full FP16 precision. Its larger vocabulary (~100K tokens) means slightly more memory than LLaMA 7B at the same quantization level.
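
The file sizes above follow roughly from parameters times bits per weight. A back-of-the-envelope check, treating the K-quants as having the approximate effective bits-per-weight shown below (mixed-precision quants average slightly above their nominal bit width; these values are approximations, not llama.cpp specifications):

```python
# Rough GGUF file-size estimate: parameters * effective bits / 8.
# Effective-bit values are approximations for llama.cpp quant formats;
# real files also carry metadata and some higher-precision tensors.
params = 7.3e9  # ~7B weights plus the enlarged embedding table
for name, bits in [("Q4_K_M", 4.85), ("Q5_K_M", 5.7),
                   ("Q8_0", 8.5), ("FP16", 16)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```

The results land close to the table above; the remaining gap is metadata, tensor-by-tensor precision choices, and rounding.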

Written by Pattanaik Ramswarup

Published: October 29, 2025 · Last Updated: March 16, 2026