BAAI — June 2023

Aquila 7B

BAAI's Chinese-English Bilingual Model

Updated: March 16, 2026

Historical Context

Aquila 7B was released in June 2023 by BAAI (Beijing Academy of Artificial Intelligence). It was notable as one of the first Chinese open-source LLMs with native bilingual Chinese-English capabilities. As of 2026, it has been superseded by models such as Qwen 2.5, Yi, and DeepSeek, which offer dramatically better Chinese-language performance at the same or smaller sizes.

  • Parameters: 7B
  • Context: 2048 tokens
  • Languages: Chinese + English (bilingual)
  • License: Free, open source

What Is Aquila 7B?

Aquila 7B is a bilingual (Chinese-English) language model developed by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院). It was one of the early Chinese open-source LLMs, released alongside the FlagAI framework for training and deploying large language models.

The key differentiator of Aquila was its training data composition: approximately 40% Chinese text and 60% English text, giving it stronger Chinese language understanding than models like LLaMA which were trained almost entirely on English data. BAAI also released AquilaChat, an instruction-tuned version for conversational tasks.

The model was part of BAAI's broader FlagAI ecosystem, which included Aquila 7B, Aquila 33B, and later Aquila2 models with improved performance and longer context windows.

Technical Architecture

Model Architecture

  • Type: Transformer decoder-only (GPT-style)
  • Parameters: ~7 billion
  • Hidden Size: 4096
  • Layers: 32 transformer blocks
  • Attention Heads: 32
  • Context Length: 2048 tokens
  • Vocabulary: ~100,000 tokens (expanded for Chinese)
  • Positional Encoding: Rotary Position Embeddings (RoPE)
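
These dimensions can be used to sanity-check the headline parameter count. The sketch below assumes a LLaMA-style block (SwiGLU MLP with an assumed intermediate size of 11008, and an untied output head); these are assumptions for illustration, not published Aquila internals:

```python
# Rough parameter-count estimate for a LLaMA-style 7B decoder.
# Assumed values: d_ff = 11008 (LLaMA 7B's MLP width) and an untied
# output head -- both are assumptions, not published Aquila specifics.
d_model = 4096        # hidden size
n_layers = 32         # transformer blocks
vocab = 100_000       # expanded bilingual vocabulary
d_ff = 11008          # assumed MLP intermediate size

attn_per_layer = 4 * d_model * d_model   # Q, K, V, O projections
mlp_per_layer = 3 * d_model * d_ff       # SwiGLU: gate, up, down
embed = vocab * d_model                  # input embedding table
head = vocab * d_model                   # output projection (if untied)

total = n_layers * (attn_per_layer + mlp_per_layer) + embed + head
print(f"~{total / 1e9:.2f}B parameters")  # lands in the ~7B range
```

Note how the 100K vocabulary alone contributes roughly 0.4B parameters to the embedding table, a visibly larger share than in a 32K-vocabulary model.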

Training Details

  • Training Data: ~600B tokens (Chinese + English)
  • Chinese Ratio: ~40% of training corpus
  • English Ratio: ~60% of training corpus
  • Data Sources: Web crawl, books, academic papers, code
  • Framework: FlagAI (BAAI's training framework)
  • Organization: BAAI (智源研究院), Beijing
  • Release Date: June 2023
  • Variants: Aquila 7B (base), AquilaChat 7B (instruction-tuned)

Architecture Notes

Aquila's architecture is similar to LLaMA but with a significantly larger vocabulary (~100K vs LLaMA's 32K) to better handle Chinese characters and subwords. The expanded vocabulary allows more efficient tokenization of Chinese text — fewer tokens per sentence compared to models with English-centric tokenizers. This is a meaningful advantage for Chinese text processing speed and context utilization.
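
The efficiency gap is easy to see at the byte level: a tokenizer whose vocabulary lacks Chinese entries typically falls back to UTF-8 bytes (or multi-byte fragments), while a Chinese-aware vocabulary can map each character, or even a multi-character word, to a single token. This is an illustrative lower-bound comparison, not a measurement of the actual Aquila tokenizer:

```python
# Illustrative only: character count vs UTF-8 byte count for a Chinese
# sentence. A byte-fallback tokenizer needs up to one token per byte it
# cannot merge; a CJK-aware vocabulary needs at most one token per
# character (often fewer, via multi-character merges).
sentence = "人工智能改变世界"  # "Artificial intelligence changes the world"

chars = len(sentence)                        # 8 characters
utf8_bytes = len(sentence.encode("utf-8"))   # 24 bytes (3 per CJK char)

print(f"{chars} chars, {utf8_bytes} UTF-8 bytes")
print(f"byte-fallback worst case: ~{utf8_bytes} tokens; "
      f"CJK-aware vocab: <= {chars} tokens")
```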

Chinese-English Bilingual Design

Aquila's primary value proposition in 2023 was its bilingual capability. At the time, most open-source LLMs (LLaMA, Falcon, MPT) were trained almost exclusively on English data and performed poorly on Chinese tasks. Aquila addressed this gap:

Chinese Language Strengths

  • Native Chinese text understanding (not just translated)
  • Chinese vocabulary coverage via expanded tokenizer
  • Classical and simplified Chinese support
  • Chinese cultural context awareness
  • Chinese-to-English and English-to-Chinese translation

Limitations

  • 2048 token context — very short for document analysis
  • Base model (Aquila 7B) is not instruction-tuned — use AquilaChat instead
  • Modest benchmark scores compared to 2024+ models
  • Limited code generation capability
  • No built-in safety training (RLHF) in base model
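
The 2048-token window means longer documents have to be processed in pieces. A common workaround is overlapping chunking; the helper below is a generic sketch (token IDs in, chunk lists out) and is not part of any Aquila or FlagAI API:

```python
# Generic overlapping chunker for a short-context model. max_len should
# leave headroom below 2048 for the prompt template and generated tokens.
def chunk_tokens(token_ids, max_len=1792, overlap=256):
    """Split a token-ID list into overlapping windows."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks

ids = list(range(5000))  # stand-in for real tokenizer output
print([len(c) for c in chunk_tokens(ids)])  # [1792, 1792, 1792, 392]
```

Each chunk then gets its own forward pass; the 256-token overlap preserves some local context across chunk boundaries.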

Chinese Data Compliance

One advantage of Aquila for organizations operating in China: BAAI is a Chinese institution, and the model's training data was curated with Chinese regulatory requirements in mind. For companies needing AI models that comply with Chinese data governance regulations, BAAI models may have advantages over Western-trained models. However, consult legal counsel for specific compliance questions.

Honest Performance Assessment

Benchmark Context

Aquila 7B was a mid-2023 model. Its benchmark performance was modest compared to near-contemporary models such as LLaMA 2 7B and Mistral 7B, both released within months of it. BAAI published limited benchmark data; the scores below come from BAAI's reported results and community evaluations.

Available Benchmark Data

| Benchmark | Aquila 7B | LLaMA 7B | LLaMA 2 7B | Mistral 7B |
|---|---|---|---|---|
| MMLU (5-shot) | ~27% | 35.1% | 45.3% | 60.1% |
| C-Eval (Chinese) | ~34% | ~25% | ~28% | ~30% |
| CMMLU (Chinese) | ~31% | ~25% | ~27% | ~30% |
| HellaSwag | ~67% | 76.1% | 77.2% | 81.3% |

Sources: BAAI model card (huggingface.co/BAAI/Aquila-7B), Open LLM Leaderboard. Aquila 7B scores are approximate from BAAI reports. Chinese benchmarks (C-Eval, CMMLU) show Aquila's advantage over English-only models, while English benchmarks (MMLU, HellaSwag) show it trailing behind.

Where Aquila Was Useful (2023)

  • Chinese text generation and understanding
  • Chinese-English bilingual tasks
  • Organizations needing Chinese-compliant AI
  • Research on bilingual model training
  • Basic Chinese NLP when no better option existed

Where Aquila Falls Short

  • English-only tasks (LLaMA 2, Mistral much better)
  • Complex reasoning and math
  • Code generation
  • Long-document processing (2048 token limit)
  • Modern Chinese tasks (Qwen 2.5 dramatically better)

VRAM Requirements by Quantization

| Quantization | File Size | VRAM Required | Quality Impact | Notes |
|---|---|---|---|---|
| Q4_0 | ~4.0 GB | ~5.0 GB | Noticeable loss | Chinese quality affected more than English |
| Q4_K_M | ~4.3 GB | ~5.3 GB | Acceptable | Best balance for bilingual use |
| Q5_K_M | ~5.0 GB | ~6.0 GB | Minimal loss | Good Chinese text quality |
| Q8_0 | ~7.5 GB | ~8.5 GB | Near-lossless | Recommended for research |
| FP16 | ~14 GB | ~15 GB | Full precision | 24GB+ GPU required |

Note: Aquila's larger vocabulary (~100K tokens vs 32K) means slightly more VRAM compared to LLaMA 7B at the same quantization level due to the larger embedding table.
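
The vocabulary effect on memory can be quantified from the embedding table alone. A back-of-the-envelope sketch at FP16 (2 bytes per weight); if the output head is untied from the input embedding, the difference roughly doubles:

```python
# Memory cost of the input embedding table alone: Aquila (~100K vocab)
# vs LLaMA 7B (32K vocab), at FP16 (2 bytes/weight). Quantized formats
# shrink this roughly in proportion to their bits-per-weight.
d_model = 4096
bytes_fp16 = 2

aquila_embed = 100_000 * d_model * bytes_fp16  # ~0.82 GB
llama_embed = 32_000 * d_model * bytes_fp16    # ~0.26 GB

extra_gb = (aquila_embed - llama_embed) / 1e9
print(f"extra embedding memory at FP16: ~{extra_gb:.2f} GB")
```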

Running Aquila 7B

Availability Note

Aquila 7B is not available on Ollama. It can be run via HuggingFace Transformers or BAAI's FlagAI framework. Community GGUF conversions may exist on HuggingFace for use with llama.cpp. For Chinese language tasks, ollama run qwen2.5:7b is a far better option available directly on Ollama.

Using HuggingFace Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/Aquila-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Chinese prompt example: "Please explain what artificial intelligence is,
# and how it is applied in daily life."
prompt = "请解释什么是人工智能,以及它在日常生活中的应用。"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: trust_remote_code=True is required because Aquila uses custom model code. For the instruction-tuned version, use "BAAI/AquilaChat-7B" instead.

AquilaChat 7B (Instruction-Tuned)

For conversational tasks, use AquilaChat instead of the base Aquila model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "BAAI/AquilaChat-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# AquilaChat uses a specific "Human:/Assistant:" chat format
# Prompt: "Explain the basic principles of quantum computing in Chinese"
prompt = """Human: 用中文解释量子计算的基本原理
Assistant:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
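
For multi-turn use, the Human/Assistant prompt above can be built programmatically. The helper below is a convenience sketch that concatenates turns in that format; it is not an official AquilaChat API, and the exact template should be verified against the model card:

```python
# Build an AquilaChat-style prompt from a list of (role, text) turns.
# The "Human:"/"Assistant:" template follows the example above; check
# the exact separators against the official model card before relying on it.
def build_chat_prompt(turns):
    lines = []
    for role, text in turns:
        label = "Human" if role == "user" else "Assistant"
        lines.append(f"{label}: {text}")
    lines.append("Assistant:")  # trailing cue for the model to respond
    return "\n".join(lines)

prompt = build_chat_prompt([
    ("user", "什么是量子计算?"),       # "What is quantum computing?"
    ("assistant", "量子计算是利用量子力学原理进行计算的技术。"),
    ("user", "它和经典计算有什么区别?"),  # "How does it differ from classical computing?"
])
print(prompt)
```

The resulting string can be passed to the tokenizer exactly as in the single-turn example.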

Aquila Model Family

| Model | Size | Type | Context | Notes |
|---|---|---|---|---|
| Aquila 7B | 7B | Base | 2048 | Bilingual base model (this page) |
| AquilaChat 7B | 7B | Chat | 2048 | Instruction-tuned for conversations |
| Aquila 33B | 33B | Base | 2048 | Larger bilingual model |
| Aquila2 7B | 7B | Base | 4096 | Improved version with longer context |
| AquilaChat2 7B | 7B | Chat | 4096 | Improved chat model |

License

BAAI Aquila License

Aquila 7B was released under the BAAI Aquila License, which permits both research and commercial use but includes specific requirements:

  • Commercial use is allowed with proper attribution
  • Redistribution requires including the license text
  • Derivative models must acknowledge BAAI as the original developer
  • Some later Aquila2 models were released under Apache 2.0

Check the specific model card on HuggingFace for the exact license terms of each variant.

Modern Alternatives (2026)

For Chinese language tasks or bilingual Chinese-English work, these modern models dramatically outperform Aquila 7B:

| Model | Size | MMLU | C-Eval | Context | License | Ollama |
|---|---|---|---|---|---|---|
| Aquila 7B | 7B | ~27% | ~34% | 2K | BAAI | Not available |
| Qwen 2.5 7B | 7B | ~74% | ~80% | 128K | Apache 2.0 | ollama run qwen2.5:7b |
| Yi 1.5 9B | 9B | ~69% | ~74% | 4K | Apache 2.0 | ollama run yi:9b |
| DeepSeek LLM 7B | 7B | ~49% | ~45% | 4K | Custom | ollama run deepseek-llm:7b |
| GLM-4 9B | 9B | ~72% | ~76% | 128K | Custom | ollama run glm4:9b |

Qwen 2.5 7B is the strongest recommendation for Chinese language tasks: it scores 2-3x higher than Aquila on both English and Chinese benchmarks while being available on Ollama under the Apache 2.0 license.


Frequently Asked Questions

What is Aquila 7B and who made it?

Aquila 7B is a 7-billion parameter bilingual (Chinese-English) language model created by BAAI (Beijing Academy of Artificial Intelligence, 智源研究院) and released in June 2023. It was one of the first Chinese open-source LLMs with native bilingual capabilities, trained on approximately 40% Chinese and 60% English text data.

Can I run Aquila 7B on Ollama?

Aquila 7B is not available on Ollama. You can run it via HuggingFace Transformers with trust_remote_code=True, or use BAAI's FlagAI framework. For Chinese language tasks on Ollama, use ollama run qwen2.5:7b instead — it's dramatically better at both Chinese and English.

Is Aquila 7B still worth using in 2026?

For practical use, no. Qwen 2.5 7B scores ~74% on MMLU and ~80% on C-Eval compared to Aquila's ~27% and ~34% respectively, while offering 128K context, Ollama support, and Apache 2.0 licensing. Aquila is primarily of historical interest as an early Chinese open-source LLM.

What's the difference between Aquila and AquilaChat?

Aquila 7B is the base (pre-trained) model — it completes text but doesn't follow instructions well. AquilaChat 7B is the instruction-tuned version designed for conversations and following user prompts. For any interactive use, always use AquilaChat rather than the base Aquila model.

How much VRAM does Aquila 7B need?

Aquila 7B requires approximately 5GB VRAM with Q4_K_M quantization, 6GB with Q5_K_M, 8.5GB with Q8_0, or 15GB at full FP16 precision. Its larger vocabulary (~100K tokens) means slightly more memory than LLaMA 7B at the same quantization level.
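
The file sizes above follow roughly from parameters times bits per weight. A back-of-the-envelope check, treating the K-quants as having the approximate effective bits-per-weight shown below (mixed-precision quants average slightly above their nominal bit width; these values are approximations, not llama.cpp specifications):

```python
# Rough GGUF file-size estimate: parameters * effective bits / 8.
# Effective-bit values are approximations for llama.cpp quant formats;
# real files also carry metadata and some higher-precision tensors.
params = 7.3e9  # ~7B weights plus the enlarged embedding table
for name, bits in [("Q4_K_M", 4.85), ("Q5_K_M", 5.7),
                   ("Q8_0", 8.5), ("FP16", 16)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB")
```

The results land close to the table above; the remaining gap is metadata, tensor-by-tensor precision choices, and rounding.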

Written by Pattanaik Ramswarup

Published: October 29, 2025 · Last Updated: March 16, 2026