Cohere Command R+ Local Setup Guide (2026): RAG and Tool-Use Specialist
Want to go deeper than this article?
Free account unlocks the first chapter of all 17 courses — RAG, agents, MCP, voice AI, MLOps, real GitHub repos.
Cohere's Command R+ is the open-weights model purpose-tuned for RAG and multi-step tool use. 104B parameters, native citation generation, structured tool-call output, and best-in-class multilingual coverage. For research and personal RAG workloads where grounded-answer quality and citation accuracy matter, Command R+ is the highest-quality option in 2026. The catch: CC-BY-NC license restricts commercial use without a Cohere license, and 104B parameters need serious hardware. Command R7B (December 2024 release) brings the same training to a 7B model that fits on consumer GPUs.
This guide covers the full Command R family, setup across runtimes, the native RAG citation format, structured tool calling, multilingual usage, and the licensing considerations for commercial deployment.
## Table of Contents
- What Command R+ Is
- Command R+ vs Command R 35B vs Command R7B
- License (CC-BY-NC) Implications
- Hardware Requirements
- Command R+ vs Llama 3.1 70B for RAG
- Ollama Setup
- llama.cpp Setup
- vLLM Setup
- Native RAG Prompt Format
- Structured Tool Calling
- Multilingual Usage
- Real Benchmarks
- Tuning Recipes
- Commercial Alternatives
- Troubleshooting
- FAQ
Reading articles is good. Building is better.
Free account = 17+ structured chapters across 17 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.
## What Command R+ Is {#what-it-is}
Command R+ (CohereForAI/c4ai-command-r-plus on HuggingFace) is a 104B-parameter open-weights model from Cohere. Original release April 2024; revised "08-2024" version August 2024. License: CC-BY-NC-4.0.
Architecture: standard decoder-only with grouped-query attention. Context: 128K. Trained specifically for retrieval-augmented generation and multi-step tool use, not pure chat.
## Command R+ vs Command R 35B vs Command R7B {#family}
| Variant | Params | VRAM (BF16/Q4) | Use |
|---|---|---|---|
| Command R+ | 104B | 210 GB / 63 GB | Best RAG/tools, needs serious HW |
| Command R | 35B | 70 GB / 21 GB | Mid-tier; Q4 fits 24GB single card |
| Command R7B | 7B | 14 GB / 4.5 GB | Practical local choice |
For most local users: Command R7B is the right starting point. Command R 35B for 24GB cards. Command R+ for multi-GPU / H100 rigs.
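That guidance can be encoded as a quick rule of thumb. A minimal sketch; the VRAM thresholds are approximations derived from the table above, not official Cohere figures:

```python
def pick_command_r(vram_gb: float) -> str:
    """Rule-of-thumb variant picker based on the Q4/Q5 figures in the table above."""
    if vram_gb >= 60:
        return "command-r-plus (104B, Q4_K_M)"
    if vram_gb >= 24:
        return "command-r (35B, Q4_K_M)"
    return "command-r7b (Q5_K_M)"

print(pick_command_r(16))  # command-r7b (Q5_K_M)
print(pick_command_r(24))  # command-r (35B, Q4_K_M)
print(pick_command_r(96))  # command-r-plus (104B, Q4_K_M)
```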
## License (CC-BY-NC) Implications {#license}
CC-BY-NC-4.0 = non-commercial only without separate license.
You can:
- ✅ Use in research / academic work
- ✅ Use in personal projects
- ✅ Use in internal company tools that don't directly generate revenue
- ✅ Modify and redistribute (with attribution and same license)
You cannot (without Cohere commercial license):
- ❌ Sell as a paid service / API
- ❌ Embed in commercial products
- ❌ Use to train competing models
For commercial RAG with permissive licensing, see Granite 3 (Apache 2.0) or Mistral Small 3 (Apache 2.0).
## Hardware Requirements {#requirements}
### Command R+ (104B)
- BF16: 210 GB → 4x A100 80GB or 4x H100
- AWQ-INT4: 60 GB → single H100 80GB or 2x RTX 4090 + offload
- Q4_K_M GGUF: 63 GB → 2x 48GB cards or 1x H100
- Q2_K GGUF: 38 GB → tight on Pro W7900 48GB
### Command R 35B
- BF16: 70 GB → A100 80GB
- Q4_K_M: 21 GB → fits 24 GB card
- Q5_K_M: 25 GB → tight on 24 GB
### Command R7B
- BF16: 14 GB → fits 16 GB card
- Q5_K_M: 5 GB → fits any modern GPU
- Q4_K_M: 4 GB → mobile / edge
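The per-quant figures above follow directly from parameter count times bits per weight. A back-of-envelope helper; the ~4.85 bits/weight average for Q4_K_M is an approximation, and KV cache plus activations add several GB on top of the weight footprint:

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """GB needed just for the weights; KV cache and activations add more on top."""
    return round(params_billions * bits_per_weight / 8, 1)

print(weight_vram_gb(104, 16))    # 208.0 -> the ~210 GB BF16 figure above
print(weight_vram_gb(104, 4.85))  # ~63 GB, Q4_K_M (assumed ~4.85 bits/weight avg)
print(weight_vram_gb(7, 16))      # 14.0 GB, Command R7B BF16
```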
## Command R+ vs Llama 3.1 70B for RAG {#vs-llama-rag}
| Metric | Command R+ | Llama 3.1 70B |
|---|---|---|
| Native citation generation | ✅ | ❌ (prompt-based) |
| Hallucination rate (RAG) | 6% | 14% |
| Multi-step tool use accuracy | 91% | 78% |
| Multilingual RAG | Excellent | Good |
| MMLU (general) | 75.7 | 86.0 |
| Throughput (Q4, 48GB GPU) | ~12 tok/s | ~22 tok/s |
For pure RAG quality with citations, Command R+ wins. For general capability, chat, and reasoning, Llama 3.1 70B wins. For commercial production, switch to Granite 3, which pairs an Apache 2.0 license with comparable RAG quality.
## Ollama Setup {#ollama}
```bash
# Command R7B (most practical)
ollama run command-r7b

# Command R 35B (24GB+ GPU)
ollama run command-r:35b

# Command R+ 104B (multi-GPU)
ollama run command-r-plus
```
For a RAG-specific setup, use a Modelfile:
```
FROM command-r7b
PARAMETER num_ctx 32768
PARAMETER temperature 0.3
PARAMETER min_p 0.05
SYSTEM """You are a helpful research assistant. Always cite sources using [1] [2] [3] format."""
```
## llama.cpp Setup {#llamacpp}
```bash
huggingface-cli download bartowski/c4ai-command-r-plus-GGUF \
  c4ai-command-r-plus-Q4_K_M.gguf \
  --local-dir ./models

./llama-cli -m models/c4ai-command-r-plus-Q4_K_M.gguf \
  -ngl 999 -c 32768 -fa --tensor-split 24,24
```
`--tensor-split 24,24` splits the tensors evenly across two GPUs. Note the hardware table above: Q4_K_M (~63 GB) needs two 48 GB cards; on two 24 GB cards (e.g., 2x RTX 4090), lower `-ngl` so the remaining layers run on CPU.
## vLLM Setup {#vllm}
```bash
# AWQ-INT4 on a single H100 80GB
vllm serve casperhansen/command-r-plus-104b-awq \
  --quantization awq --max-model-len 32768

# Command R7B (single card)
vllm serve CohereForAI/c4ai-command-r7b-12-2024 \
  --max-model-len 32768
```
On two 48 GB cards, add `--tensor-parallel-size 2` to the Command R+ command.
## Native RAG Prompt Format {#rag-format}
Command R+ uses a custom RAG template:
```
<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{question}<|END_OF_TURN_TOKEN|>
<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
{retrieved documents in <results> tags}
{model produces grounded answer with [N] citations}
<|END_OF_TURN_TOKEN|>
```
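As an illustration only, the turn structure above can be hand-assembled like this. In practice Cohere's chat template handles the exact `<results>` formatting; the document layout sketched here is a simplification, not Cohere's canonical serialization:

```python
def build_grounded_prompt(system: str, question: str, documents: list[dict]) -> str:
    """Hand-rolled sketch of the turn structure shown above (simplified)."""
    # Simplified document serialization; the real chat template formats this itself
    results = "\n".join(
        f"Document: {i}\ntitle: {d['title']}\nsnippet: {d['snippet']}"
        for i, d in enumerate(documents)
    )
    return (
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{question}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
        f"<results>\n{results}\n</results>"
    )

prompt = build_grounded_prompt(
    "Cite sources.",
    "What is local AI?",
    [{"title": "Doc 1", "snippet": "Local AI is..."}],
)
print("<results>" in prompt)  # True
```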
For Hugging Face Transformers:
```python
from transformers import AutoTokenizer

# Gated repo: may require accepting the model license on Hugging Face first
tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")

documents = [
    {"title": "Doc 1", "snippet": "Local AI is..."},
    {"title": "Doc 2", "snippet": "Self-hosted means..."},
]

# tokenize=False returns the fully formatted prompt string
prompt = tokenizer.apply_grounded_generation_template(
    [{"role": "user", "content": "What is local AI?"}],
    documents=documents,
    citation_mode="accurate",
    tokenize=False,
)
```
The model emits citations as [1] [2] markers tied to documents[] indices. This is the killer feature.
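Because citations arrive as plain `[N]` markers, mapping them back to sources is a small post-processing step. A minimal sketch, assuming 1-based markers as in the example output:

```python
import re

def resolve_citations(answer: str, documents: list[dict]) -> list[tuple[int, str]]:
    """Map [N] citation markers in a grounded answer back to document titles."""
    cited = []
    for m in re.finditer(r"\[(\d+)\]", answer):
        idx = int(m.group(1)) - 1  # markers are 1-based, documents[] is 0-based
        if 0 <= idx < len(documents):
            cited.append((idx + 1, documents[idx]["title"]))
    return cited

docs = [
    {"title": "Doc 1", "snippet": "Local AI is..."},
    {"title": "Doc 2", "snippet": "Self-hosted means..."},
]
print(resolve_citations("Local AI runs on your hardware [1][2].", docs))
# [(1, 'Doc 1'), (2, 'Doc 2')]
```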
## Structured Tool Calling {#tool-calling}
Tool definitions in the system prompt; the model emits a structured "Plan" + tool calls in JSON:
```python
tools = [
    {"name": "search_web", "description": "Search the web", "parameters": {...}},
    {"name": "calculator", "description": "Math calc", "parameters": {...}},
]

# Same tokenizer as in the grounded-generation example above
prompt = tokenizer.apply_tool_use_template(
    [{"role": "user", "content": "What is 2 to the 32nd power?"}],
    tools=tools,
    tokenize=False,
)
```
Output:
````
Plan: I will use the calculator tool.
Action: ```json
[{"tool_name": "calculator", "parameters": {"expression": "2^32"}}]
```
````
For OpenAI-compatible API integration, vLLM wraps this into standard `tool_calls`.
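If you are consuming the raw completion instead of vLLM's wrapped `tool_calls`, the `Action:` block can be parsed with a few lines. A sketch that assumes the output uses the ```json fence shown above:

```python
import json
import re

def parse_tool_calls(model_output: str) -> list[dict]:
    """Extract the JSON tool-call list from a Command R 'Action:' block."""
    match = re.search(r"Action: ```json\s*(.*?)\s*```", model_output, re.DOTALL)
    if not match:
        return []
    return json.loads(match.group(1))

output = (
    "Plan: I will use the calculator tool.\n"
    "Action: ```json\n"
    '[{"tool_name": "calculator", "parameters": {"expression": "2^32"}}]\n'
    "```"
)
calls = parse_tool_calls(output)
print(calls[0]["tool_name"])  # calculator
```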
---
## Multilingual Usage {#multilingual}
Command R+ is Cohere's strongest open-weights multilingual model, with solid coverage of English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Chinese, and Arabic.
Cross-lingual RAG (retrieve in English, answer in French) is well-supported via the native template — pass documents in any language, ask question in any supported language.
---
## Real Benchmarks {#benchmarks}
| Test | Command R+ | Command R7B | Llama 3.1 70B | Granite 3.2 8B |
|------|------------|-------------|----------------|------------------|
| RAG MAP | 0.91 | 0.86 | 0.78 | 0.89 |
| Tool use accuracy | 91% | 84% | 78% | 87% |
| MMLU | 75.7 | 67.5 | 86.0 | 75.8 |
| HumanEval | 73.5 | 67.0 | 80.5 | 79.9 |
| Multilingual MAP | 0.88 | 0.79 | 0.62 | 0.74 |
Command R family is the RAG specialist. For general chat / reasoning, others win.
---
## Tuning Recipes {#tuning}
### Single 24GB GPU (RTX 4090)
Command R7B Q5_K_M, 32K context, FlashAttention enabled.
### Dual 24GB (2x RTX 4090)
Command R 35B Q5_K_M with tensor split.
### H100 80GB
Command R+ 104B AWQ-INT4 fits with 30K context.
### Mac M4 Max 128GB
Command R 35B Q5_K_M comfortably; Command R+ Q3_K_M tight.
---
## Commercial Alternatives {#alternatives}
For commercial production with similar RAG quality:
1. **IBM Granite 3.2 8B** + Granite Embedding — Apache 2.0, comparable RAG MAP
2. **Mistral Small 3** + reranker — Apache 2.0, good general quality
3. **Llama 3.1 70B** + RAG prompt engineering — Meta license
4. **Qwen 2.5 72B** + Tongyi license — strong, EU restrictions
If RAG citations are mandatory and license must be permissive: Granite 3.2 + custom citation prompting is the cleanest path.
---
## Troubleshooting {#troubleshooting}
| Symptom | Cause | Fix |
|---------|-------|-----|
| Wrong chat template | Missing Cohere format | bartowski quants include template |
| Citations missing | Wrong RAG template | Use `apply_grounded_generation_template` |
| OOM at 104B | Hardware insufficient | Use Command R7B or rent H100 |
| Multilingual quality drop | Lower-resource language | Some languages stronger than others |
---
## FAQ {#faq}
See answers to common Command R+ questions below.
---
**Sources:** [Cohere Command R+ on HF](https://huggingface.co/CohereForAI/c4ai-command-r-plus) | [Command R7B](https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024) | [Cohere blog](https://cohere.com/blog/command-r-plus-microsoft-azure) | Internal benchmarks RTX 4090, dual-4090, H100.
**Related guides:**
- [RAG Local Setup Guide](/blog/rag-local-setup-guide)
- [Granite 3 Local Setup](/blog/granite-3-local-setup)
- [Mistral Small 3 Setup](/blog/mistral-small-3-setup)
- [Llama 4 Local Setup](/blog/llama-4-local-setup-guide)
- [Ollama ChromaDB RAG Pipeline](/blog/ollama-chromadb-rag-pipeline)
- [Vector Databases Comparison](/blog/vector-databases-comparison)