Manticore 13B: Early Community Multi-Dataset Merge
Honest technical review of Manticore 13B -- a historically interesting 2023 community fine-tune that merged multiple training datasets on LLaMA-1 13B. Real MMLU ~50-53%, 2048 context window. Completely surpassed by modern models.
Background and Architecture
Important Context
Manticore 13B was released in May-June 2023 by the Open Access AI Collective. It is a LLaMA-1 based community fine-tune that merged multiple training datasets. While historically interesting as an early example of community model merging, it has been completely surpassed by modern models like Llama 3.2, Phi-3, Mistral 7B, and Qwen 2.5 -- many of which are smaller yet far more capable. This page provides an honest technical assessment for historical reference.
What Is Manticore 13B?
Manticore 13B (openaccess-ai-collective/manticore-13b) is a community-created large language model built on Meta's original LLaMA-1 13B base model. It was created by the Open Access AI Collective in mid-2023 as an experiment in multi-dataset fine-tuning -- combining several popular instruction-tuning datasets into a single training run.
Model Details
| Attribute | Value |
|---|---|
| Base Model | LLaMA-1 13B (Meta) |
| Parameters | 13 billion |
| Context Length | 2,048 tokens |
| Architecture | Standard LLaMA transformer |
| Release Date | May-June 2023 |
| License | LLaMA License (non-commercial) |
| Creator | Open Access AI Collective |
Key Characteristics
- Multi-dataset merge: combined ShareGPT, Alpaca, coding, and other datasets
- Aimed for versatility rather than specialization
- Relatively uncensored compared to commercial models of the era
- Part of the early wave of community LLaMA fine-tunes
- Non-commercial license inherited from LLaMA-1
Multi-Dataset Merge Training
Manticore 13B's distinguishing feature was its training approach: combining multiple popular instruction-tuning datasets into a single fine-tuning run. This was an early experiment in what the community now calls "dataset merging" -- the idea that exposure to diverse training data could produce a more versatile model than single-dataset fine-tunes like pure Alpaca or pure ShareGPT models.
Training Datasets Used
- ShareGPT -- conversations with ChatGPT shared by users; provided conversational ability
- Alpaca -- Stanford's instruction-following dataset; provided instruction compliance
- GPT4All -- diverse instruction data; broadened general knowledge
- Coding datasets -- code-related instruction data; added basic coding ability
Historical note: In mid-2023, this multi-dataset approach was novel. Today, techniques like DPO, RLHF, and carefully curated synthetic data have largely replaced naive dataset merging. Models like Llama 3 and Qwen 2.5 achieve far better results with more sophisticated training pipelines.
Real Benchmark Performance
Benchmark Honesty Note
Manticore 13B is a community fine-tune of LLaMA-1 13B from 2023. Its MMLU performance is estimated at roughly 50-53%, which is typical for community 13B models of that era. It does not outperform GPT-4, Claude, or any other frontier model -- such claims would be absurd for any 13B community fine-tune from 2023. Benchmarks below compare it against its actual peers: other LLaMA-1 era community models.
MMLU Comparison (Peer Models)
[Chart: MMLU Score (%) for LLaMA-1 13B era models -- Manticore 13B ~51%, Vicuna 13B ~52%]
Source: Community benchmarks from HuggingFace Open LLM Leaderboard (2023). Exact Manticore numbers are estimated from similar community 13B models.
Capability Estimates
[Chart: performance metrics -- estimated capability scores based on community usage reports and comparable model benchmarks]
What Manticore 13B Can and Cannot Do
Reasonable For:
- Basic conversational chat
- Simple creative writing and roleplay
- Basic question answering
- Relatively uncensored outputs
Not Suitable For:
- Production code generation (poor accuracy)
- Complex reasoning or math
- Long documents (2,048-token limit)
- Factual accuracy (prone to hallucination)
- Commercial use (LLaMA-1 license restriction)
VRAM Requirements by Quantization
Manticore 13B GGUF quantized files were provided by TheBloke on HuggingFace. Approximate VRAM requirements for the common quantization levels:
| Quantization | File Size | VRAM Required | Quality Loss | Recommended GPU |
|---|---|---|---|---|
| Q4_K_M | ~7.4 GB | ~8 GB | Moderate | RTX 3060 12GB / RTX 4060 8GB |
| Q5_K_M | ~9.0 GB | ~10 GB | Low | RTX 3060 12GB / RTX 4070 |
| Q8_0 | ~13.8 GB | ~15 GB | Minimal | RTX 4070 Ti / RTX 3090 |
| FP16 | ~26 GB | ~28 GB | None | RTX 3090 / RTX 4090 |
Source: TheBloke/Manticore-13B-GGUF on HuggingFace. VRAM estimates include model weights + KV cache for 2048 context.
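As a sanity check, figures like these can be approximated as the quantized file size plus the FP16 KV cache for a full 2,048-token context. The sketch below assumes LLaMA-1 13B shapes (40 layers, hidden size 5120); real runtimes add some overhead on top, so treat the output as a floor.

```python
def estimate_vram_gb(file_size_gb, n_layers=40, hidden=5120, ctx=2048):
    """Rough VRAM estimate: quantized weight file + FP16 KV cache.

    KV cache bytes = 2 (K and V) * n_layers * hidden * ctx * 2 bytes (FP16).
    Defaults are LLaMA-1 13B shapes: 40 layers, hidden size 5120.
    """
    kv_cache_gb = 2 * n_layers * hidden * ctx * 2 / 1024**3
    return file_size_gb + kv_cache_gb

print(round(estimate_vram_gb(7.4), 1))  # Q4_K_M: ≈ 9.0 GB before runtime overhead
```

The Q8_0 row works out to about 15.4 GB and FP16 to about 27.6 GB, within roughly a gigabyte of the table's figures.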
Installation Guide
Ollama Availability
Manticore 13B is not available in the official Ollama model library. You will need to use llama.cpp directly with GGUF files from HuggingFace, or create a custom Ollama Modelfile. The recommended approach is llama.cpp.
llama.cpp (Recommended)
The -ngl flag controls how many layers are offloaded to the GPU; LLaMA-1 13B has 40 transformer layers, so -ngl 40 (or higher) offloads them all. Reduce the number if you have less VRAM.
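A typical invocation looks like the following. This is a sketch: the exact GGUF filename is an assumption based on TheBloke's usual naming scheme, and older llama.cpp builds name the binary ./main rather than llama-cli.

```shell
# Fetch a quantized model (filename assumed from TheBloke's naming scheme)
huggingface-cli download TheBloke/Manticore-13B-GGUF Manticore-13B.Q4_K_M.gguf --local-dir .

# Interactive chat: -ngl 40 offloads all 40 transformer layers to the GPU,
# -c 2048 matches the model's maximum context window
./llama-cli -m Manticore-13B.Q4_K_M.gguf -ngl 40 -c 2048 -i
```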
Custom Ollama Modelfile (Alternative)
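If you prefer Ollama, a Modelfile along these lines should work. This is a sketch: the local GGUF path and the Vicuna-style USER:/ASSISTANT: prompt template are assumptions, so check the model card for the exact format.

```
# Modelfile (sketch -- path and template are assumptions)
FROM ./Manticore-13B.Q4_K_M.gguf
PARAMETER num_ctx 2048
TEMPLATE """USER: {{ .Prompt }}
ASSISTANT: """
```

Then register and run it with `ollama create manticore -f Modelfile` followed by `ollama run manticore`.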
Python (Transformers)
Note: Full FP16 loading requires ~26GB VRAM. Use load_in_4bit=True with bitsandbytes for reduced memory.
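A minimal loading script, assuming the transformers and accelerate libraries and the HuggingFace repo id given earlier (the prompt format and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openaccess-ai-collective/manticore-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 weights: ~26 GB of VRAM
    device_map="auto",          # requires accelerate; spreads across GPUs
)

# Vicuna-style prompt format (assumed -- check the model card)
prompt = "USER: Explain what a context window is.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```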
Honest Assessment
Manticore 13B was a product of its time. In mid-2023, the open-source LLM community was rapidly experimenting with fine-tuning Meta's leaked LLaMA-1 weights. Manticore's multi-dataset approach was innovative for its era, but the results were modest by today's standards.
What It Did Right
Demonstrated that combining multiple datasets could produce a more well-rounded model than single-dataset fine-tunes. Contributed to the community understanding of instruction tuning. Provided an accessible, relatively uncensored model for experimentation.
Limitations
Limited to 2048 context (LLaMA-1 limitation). Non-commercial license. No RLHF or preference optimization -- just supervised fine-tuning. Prone to hallucination. Mediocre code generation. Limited reasoning ability compared to even small modern models.
Why You Should Use Something Else Today
A modern 3B parameter model like Llama 3.2 3B or Phi-3 Mini will outperform Manticore 13B on virtually every benchmark while using less than half the VRAM. The only reason to run Manticore today would be historical curiosity or very specific uncensored use cases where you need a LLaMA-1 era model.
Better Modern Alternatives
Model Comparison
| Model | Size | RAM Required | Speed | Quality (MMLU est.) | Cost/Month |
|---|---|---|---|---|---|
| Manticore-13B | ~7.4GB (Q4_K_M GGUF) | ~10GB total | ~20-35 tok/s (GPU) | 51% | $0 (LLaMA license) |
| Vicuna-13B | ~7.4GB (Q4_K_M GGUF) | ~10GB total | ~20-35 tok/s (GPU) | 52% | $0 (LLaMA license) |
| Llama-3.2-3B | ~2.0GB (Q4_K_M GGUF) | ~4GB total | ~60-90 tok/s (GPU) | 63% | $0 (Meta license) |
| Phi-3 Mini 3.8B | ~2.3GB (Q4_K_M GGUF) | ~4GB total | ~50-80 tok/s (GPU) | 69% | $0 (MIT license) |
Quality scores are MMLU estimates. Modern smaller models significantly outperform legacy 13B community fine-tunes.
Recommended Replacements
For General Chat
Llama 3.2 3B -- Better MMLU (63%), 128K context, 4x less VRAM, permissive license, available on Ollama.
ollama run llama3.2
For Coding
Qwen 2.5 Coder 7B -- Dramatically better code generation, 128K context, Apache 2.0 license.
ollama run qwen2.5-coder:7b
For Reasoning
Phi-3 Mini 3.8B -- MMLU 69%, excellent reasoning for its size, MIT license.
ollama run phi3:mini
For Uncensored Use
Mistral 7B -- Much better quality, relatively open, Apache 2.0 license, widely supported.
ollama run mistral
Historical Significance
Manticore 13B holds a place in the history of open-source AI as part of the first wave of community fine-tunes that followed Meta's LLaMA-1 release in early 2023. Along with models like Vicuna, Alpaca, and Koala, it demonstrated that community-driven model development could rapidly iterate on foundational models.
Timeline Context
The multi-dataset merging approach pioneered by models like Manticore evolved into more sophisticated techniques. Today's model merging tools (like mergekit) and training approaches (DPO, RLHF) owe something to these early experiments, even if the specific models have been entirely superseded.
Written by Pattanaik Ramswarup