Manticore 13B: Early Community Multi-Dataset Merge
Honest technical review of Manticore 13B -- a historically interesting 2023 community fine-tune that merged multiple training datasets on LLaMA-1 13B. Real MMLU ~50-53%, 2048 context window. Completely surpassed by modern models.
Background and Architecture
Important Context
Manticore 13B was released in May-June 2023 by the Open Access AI Collective. It is a LLaMA-1 based community fine-tune that merged multiple training datasets. While historically interesting as an early example of community model merging, it has been completely surpassed by modern models like Llama 3.2, Phi-3, Mistral 7B, and Qwen 2.5 -- many of which are smaller yet far more capable. This page provides an honest technical assessment for historical reference.
What Is Manticore 13B?
Manticore 13B (openaccess-ai-collective/manticore-13b) is a community-created large language model built on Meta's original LLaMA-1 13B base model. It was created by the Open Access AI Collective in mid-2023 as an experiment in multi-dataset fine-tuning -- combining several popular instruction-tuning datasets into a single training run.
Model Details
| Attribute | Value |
|---|---|
| Base Model | LLaMA-1 13B (Meta) |
| Parameters | 13 billion |
| Context Length | 2,048 tokens |
| Architecture | Standard LLaMA transformer |
| Release Date | May-June 2023 |
| License | LLaMA License (non-commercial) |
| Creator | Open Access AI Collective |
Key Characteristics
- Multi-dataset merge: combined ShareGPT, Alpaca, coding, and other datasets
- Aimed for versatility rather than specialization
- Relatively uncensored compared to commercial models of the era
- Part of the early wave of community LLaMA fine-tunes
- Non-commercial license inherited from LLaMA-1
Multi-Dataset Merge Training
Manticore 13B's distinguishing feature was its training approach: combining multiple popular instruction-tuning datasets into a single fine-tuning run. This was an early experiment in what the community now calls "dataset merging" -- the idea that exposure to diverse training data could produce a more versatile model than single-dataset fine-tunes like pure Alpaca or pure ShareGPT models.
Training Datasets Used
- ShareGPT -- conversations with ChatGPT shared by users; provided conversational ability
- Alpaca -- Stanford's instruction-following dataset; provided instruction compliance
- GPT4All -- diverse instruction data; broadened general knowledge
- Coding datasets -- code-related instruction data; added basic coding ability
Historical note: In mid-2023, this multi-dataset approach was novel. Today, techniques like DPO, RLHF, and carefully curated synthetic data have largely replaced naive dataset merging. Models like Llama 3 and Qwen 2.5 achieve far better results with more sophisticated training pipelines.
Real Benchmark Performance
Benchmark Honesty Note
Manticore 13B is a community fine-tune of LLaMA-1 13B from 2023. Its MMLU performance is estimated at roughly 50-53%, which is typical for community 13B models of that era. It does not outperform GPT-4, Claude, or any other frontier model -- such claims would be absurd for any 13B community fine-tune from 2023. Benchmarks below compare it against its actual peers: other LLaMA-1 era community models.
MMLU Comparison (Peer Models)
[Chart: MMLU Score (%) for LLaMA-1 13B era models -- Manticore 13B ~51%, Vicuna 13B ~52%]
Source: Community benchmarks from HuggingFace Open LLM Leaderboard (2023). Exact Manticore numbers are estimated from similar community 13B models.
Capability Estimates
[Chart: performance metrics -- estimated capability scores based on community usage reports and comparable model benchmarks]
What Manticore 13B Can and Cannot Do
Reasonable For:
- Basic conversational chat
- Simple creative writing and roleplay
- Basic question answering
- Relatively uncensored outputs
Not Suitable For:
- Production code generation (poor accuracy)
- Complex reasoning or math
- Long documents (2,048-token limit)
- Factual accuracy (prone to hallucination)
- Commercial use (LLaMA-1 license restriction)
VRAM Requirements by Quantization
Manticore 13B GGUF quantized files were provided by TheBloke on HuggingFace. Approximate VRAM requirements for the common quantization levels:
| Quantization | File Size | VRAM Required | Quality Loss | Recommended GPU |
|---|---|---|---|---|
| Q4_K_M | ~7.4 GB | ~8 GB | Moderate | RTX 3060 12GB / RTX 4060 8GB |
| Q5_K_M | ~9.0 GB | ~10 GB | Low | RTX 3060 12GB / RTX 4070 |
| Q8_0 | ~13.8 GB | ~15 GB | Minimal | RTX 4070 Ti / RTX 3090 |
| FP16 | ~26 GB | ~28 GB | None | RTX 3090 / RTX 4090 |
Source: TheBloke/Manticore-13B-GGUF on HuggingFace. VRAM estimates include model weights + KV cache for 2048 context.
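As a sanity check, figures like these can be approximated as the quantized file size plus the FP16 KV cache for a full 2,048-token context. The sketch below assumes LLaMA-1 13B shapes (40 layers, hidden size 5120); real runtimes add some overhead on top, so treat the output as a floor.

```python
def estimate_vram_gb(file_size_gb, n_layers=40, hidden=5120, ctx=2048):
    """Rough VRAM estimate: quantized weight file + FP16 KV cache.

    KV cache bytes = 2 (K and V) * n_layers * hidden * ctx * 2 bytes (FP16).
    Defaults are LLaMA-1 13B shapes: 40 layers, hidden size 5120.
    """
    kv_cache_gb = 2 * n_layers * hidden * ctx * 2 / 1024**3
    return file_size_gb + kv_cache_gb

print(round(estimate_vram_gb(7.4), 1))  # Q4_K_M: ≈ 9.0 GB before runtime overhead
```

The Q8_0 row works out to about 15.4 GB and FP16 to about 27.6 GB, within roughly a gigabyte of the table's figures.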
Installation Guide
Ollama Availability
Manticore 13B is not available in the official Ollama model library. You will need to use llama.cpp directly with GGUF files from HuggingFace, or create a custom Ollama Modelfile. The recommended approach is llama.cpp.
llama.cpp (Recommended)
The -ngl flag controls how many layers are offloaded to the GPU; LLaMA-1 13B has 40 transformer layers, so -ngl 40 (or higher) offloads them all. Reduce the number if you have less VRAM.
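A typical invocation looks like the following. This is a sketch: the exact GGUF filename is an assumption based on TheBloke's usual naming scheme, and older llama.cpp builds name the binary ./main rather than llama-cli.

```shell
# Fetch a quantized model (filename assumed from TheBloke's naming scheme)
huggingface-cli download TheBloke/Manticore-13B-GGUF Manticore-13B.Q4_K_M.gguf --local-dir .

# Interactive chat: -ngl 40 offloads all 40 transformer layers to the GPU,
# -c 2048 matches the model's maximum context window
./llama-cli -m Manticore-13B.Q4_K_M.gguf -ngl 40 -c 2048 -i
```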
Custom Ollama Modelfile (Alternative)
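If you prefer Ollama, a Modelfile along these lines should work. This is a sketch: the local GGUF path and the Vicuna-style USER:/ASSISTANT: prompt template are assumptions, so check the model card for the exact format.

```
# Modelfile (sketch -- path and template are assumptions)
FROM ./Manticore-13B.Q4_K_M.gguf
PARAMETER num_ctx 2048
TEMPLATE """USER: {{ .Prompt }}
ASSISTANT: """
```

Then register and run it with `ollama create manticore -f Modelfile` followed by `ollama run manticore`.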
Python (Transformers)
Note: Full FP16 loading requires ~26GB VRAM. Use load_in_4bit=True with bitsandbytes for reduced memory.
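A minimal loading script, assuming the transformers and accelerate libraries and the HuggingFace repo id given earlier (the prompt format and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openaccess-ai-collective/manticore-13b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 weights: ~26 GB of VRAM
    device_map="auto",          # requires accelerate; spreads across GPUs
)

# Vicuna-style prompt format (assumed -- check the model card)
prompt = "USER: Explain what a context window is.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```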
Honest Assessment
Manticore 13B was a product of its time. In mid-2023, the open-source LLM community was rapidly experimenting with fine-tuning Meta's leaked LLaMA-1 weights. Manticore's multi-dataset approach was innovative for its era, but the results were modest by today's standards.
What It Did Right
Demonstrated that combining multiple datasets could produce a more well-rounded model than single-dataset fine-tunes. Contributed to the community understanding of instruction tuning. Provided an accessible, relatively uncensored model for experimentation.
Limitations
Limited to 2048 context (LLaMA-1 limitation). Non-commercial license. No RLHF or preference optimization -- just supervised fine-tuning. Prone to hallucination. Mediocre code generation. Limited reasoning ability compared to even small modern models.
Why You Should Use Something Else Today
A modern 3B parameter model like Llama 3.2 3B or Phi-3 Mini will outperform Manticore 13B on virtually every benchmark while using less than half the VRAM. The only reason to run Manticore today would be historical curiosity or very specific uncensored use cases where you need a LLaMA-1 era model.
Better Modern Alternatives
Model Comparison
| Model | Size | RAM Required | Speed | Quality (MMLU est.) | Cost/Month |
|---|---|---|---|---|---|
| Manticore-13B | ~7.4GB (Q4_K_M GGUF) | ~10GB total | ~20-35 tok/s (GPU) | 51% | $0 (LLaMA license) |
| Vicuna-13B | ~7.4GB (Q4_K_M GGUF) | ~10GB total | ~20-35 tok/s (GPU) | 52% | $0 (LLaMA license) |
| Llama-3.2-3B | ~2.0GB (Q4_K_M GGUF) | ~4GB total | ~60-90 tok/s (GPU) | 63% | $0 (Meta license) |
| Phi-3 Mini 3.8B | ~2.3GB (Q4_K_M GGUF) | ~4GB total | ~50-80 tok/s (GPU) | 69% | $0 (MIT license) |
Quality scores are MMLU estimates. Modern smaller models significantly outperform legacy 13B community fine-tunes.
Recommended Replacements
For General Chat
Llama 3.2 3B -- Better MMLU (63%), 128K context, 4x less VRAM, permissive license, available on Ollama.
ollama run llama3.2
For Coding
Qwen 2.5 Coder 7B -- Dramatically better code generation, 128K context, Apache 2.0 license.
ollama run qwen2.5-coder:7b
For Reasoning
Phi-3 Mini 3.8B -- MMLU 69%, excellent reasoning for its size, MIT license.
ollama run phi3:mini
For Uncensored Use
Mistral 7B -- Much better quality, relatively open, Apache 2.0 license, widely supported.
ollama run mistral
Historical Significance
Manticore 13B holds a place in the history of open-source AI as part of the first wave of community fine-tunes that followed Meta's LLaMA-1 release in early 2023. Along with models like Vicuna, Alpaca, and Koala, it demonstrated that community-driven model development could rapidly iterate on foundational models.
Timeline Context
The multi-dataset merging approach pioneered by models like Manticore evolved into more sophisticated techniques. Today's model merging tools (like mergekit) and training approaches (DPO, RLHF) owe something to these early experiments, even if the specific models have been entirely superseded.
Written by Pattanaik Ramswarup