Solar 10.7B Base
Depth Up-Scaling Architecture & Local Deployment Guide
Solar 10.7B is a base language model from Korean AI company Upstage, released in December 2023. Its key innovation is Depth Up-Scaling (DUS): rather than training a 10.7B model from scratch, Upstage duplicated layers from a pretrained Llama 2 base and continued pretraining, producing a larger model that inherits existing knowledge. This is the base (pretrained) version; for the instruction-tuned variant, see Solar 10.7B Instruct.
Technical Overview
Model Specifications
- Developer: Upstage (Seoul, South Korea)
- Release Date: December 2023
- Parameters: 10.7 billion
- Architecture: DUS (Depth Up-Scaling) based on Llama 2
- Layers: 48 transformer layers
- Hidden Dimension: 4,096
- Attention Heads: 32
- Context Window: 4,096 tokens
- Vocabulary: 32,000 tokens (Llama 2 tokenizer)
- License: Apache 2.0 (fully open, commercial use allowed)
- Model Type: Base (pretrained, not instruction-tuned)
What Makes Solar Different
Solar 10.7B stands out for one reason: DUS (Depth Up-Scaling). Instead of training from scratch, Upstage took a pretrained Llama 2 model and duplicated its transformer layers to create a deeper network. They then continued pretraining on additional data.
This approach has a key advantage: training a 10.7B model via DUS is significantly cheaper than training one from random initialization, because the duplicated layers already contain useful representations.
As a base model, Solar 10.7B is pretrained on next-token prediction but not fine-tuned for following instructions. It is primarily useful for:
- Fine-tuning on your own dataset
- Text completion tasks
- Research into DUS architecture
- Building custom instruction-tuned variants
DUS Architecture Explained
How Depth Up-Scaling Works
DUS is described in Upstage's paper "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (December 2023). The process:
1. Start with a pretrained 7B-class Llama 2 model: 32 transformer layers that already encode general language knowledge from pretraining.
2. Duplicate layers to deepen the network: copy a subset of the transformer layers and stack them, growing the model from 32 to 48 layers (~7B to 10.7B parameters) without any random initialization.
3. Continue pretraining on additional data so the duplicated layers learn to differentiate from their originals and the full 48-layer network converges to a coherent model.
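The layer-stacking step can be sketched as simple index arithmetic. Per the DUS paper, the 32-layer base is duplicated, the final 8 layers are trimmed from the first copy and the initial 8 from the second, and the two halves are concatenated into a 48-layer stack. This is a sketch of the layer selection only; the variable names are illustrative, not from Upstage's code:

```python
# Sketch of DUS layer selection: n = 32 base layers scaled to s = 48.
# Per the paper, m layers are trimmed from the seam of each copy so
# that 2 * (n - m) == s, i.e. m = 8 here.
n, s = 32, 48
m = (2 * n - s) // 2                 # layers trimmed per copy -> 8

copy_a = list(range(0, n - m))       # layers 0..23 of the base model
copy_b = list(range(m, n))           # layers 8..31 of the base model
dus_layers = copy_a + copy_b         # 48-layer stack before continued pretraining

assert len(dus_layers) == s          # 24 + 24 = 48 layers
```

Continued pretraining is then what turns this redundant stack into a model that outperforms its 32-layer parent.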
DUS vs Other Scaling Methods
| Method | Approach | Training Cost | Example |
|---|---|---|---|
| DUS (Solar) | Duplicate layers from pretrained model + continue training | Low | Solar 10.7B |
| Train from scratch | Random initialization, full pretraining | Very High | Llama 2, Mistral |
| MoE | Multiple expert sub-networks, sparse activation | Medium-High | Mixtral 8x7B |
| Knowledge Distillation | Smaller model trained to mimic larger teacher | Low-Medium | TinyLlama |
Source: Upstage, "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (arXiv:2312.15166)
Base vs Instruct: Which to Use
Upstage released two versions of Solar 10.7B. This page covers the base model. If you want a chatbot or instruction-following assistant, use the Instruct version instead.
| Feature | Solar 10.7B Base (this page) | Solar 10.7B Instruct |
|---|---|---|
| HuggingFace ID | upstage/SOLAR-10.7B-v1.0 | upstage/SOLAR-10.7B-Instruct-v1.0 |
| Training | Pretrained (next-token prediction) | + SFT + DPO alignment |
| Best For | Fine-tuning, text completion, research | Chat, Q&A, instruction following |
| MMLU | ~66% | ~66.2% (marginal improvement) |
| Ollama | ollama run solar | ollama run solar:10.7b-instruct-v1-q4_K_M |
Recommendation: Most users should use the Instruct version. The base model is primarily for researchers and developers who want to fine-tune on their own data.
Benchmarks
MMLU Comparison (5-shot, base models)
[Chart: MMLU comparison of base models, with Yi 34B included as an upper reference (3x the parameters). Source: HuggingFace Open LLM Leaderboard (v1).]
Open LLM Leaderboard Scores (Base Model)
| Benchmark | Solar 10.7B | Llama 2 13B | Mistral 7B |
|---|---|---|---|
| MMLU (5-shot) | ~66% | ~55% | ~60.1% |
| ARC-Challenge (25-shot) | ~61% | ~59% | ~60% |
| HellaSwag (10-shot) | ~84% | ~82% | ~83% |
| Winogrande (5-shot) | ~83% | ~76% | ~78% |
Source: HuggingFace Open LLM Leaderboard (v1), Upstage model card. Scores are approximate; check the leaderboard for latest values.
Honest Assessment
Strengths
- Beats Llama 2 13B on MMLU despite fewer parameters
- Apache 2.0 license (fully open for commercial use)
- Good base for fine-tuning custom models
- DUS approach is cheaper to replicate than training from scratch
- Compact enough to quantize and run on consumer GPUs
Limitations
- Only 4,096 context tokens (short by 2024+ standards)
- Released December 2023; newer models have surpassed it
- Base model is not directly useful for chat without fine-tuning
- DUS paper does not report Korean-specific benchmarks for the base model
- No code-specific training (not competitive for coding tasks)
VRAM by Quantization
| Quantization | Model Size | VRAM Required | Quality Loss | Compatible Hardware |
|---|---|---|---|---|
| FP16 | ~21 GB | ~24 GB | None | RTX 3090/4090, A5000, A100 |
| Q8_0 | ~11 GB | ~13 GB | Minimal | RTX 3090/4090, Apple M2 Pro 16GB |
| Q4_K_M (recommended) | ~6 GB | ~7 GB | Small | RTX 3060 12GB, Apple M1 8GB, RTX 4060 |
| Q4_0 | ~5.5 GB | ~6.5 GB | Moderate | RTX 3060, Apple M1 8GB |
Sizes are approximate. VRAM includes overhead for context/KV cache at short prompts. Apple Silicon uses unified memory. Ollama defaults to Q4_K_M when you run ollama run solar.
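The sizes in the table follow from back-of-envelope arithmetic: parameter count times average bits per weight. The bits-per-weight figures below are approximations for the GGUF quantization formats (real files vary slightly with metadata and mixed tensor types):

```python
# Rough on-disk size estimates for a 10.7B-parameter model.
PARAMS = 10.7e9

def size_gb(bits_per_weight: float) -> float:
    """Approximate model file size in GB for an average bits/weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp16 = size_gb(16.0)   # ~21.4 GB -> matches the ~21 GB row above
q8   = size_gb(8.5)    # Q8_0 stores ~8.5 bits/weight -> ~11 GB
q4   = size_gb(4.85)   # Q4_K_M averages ~4.8-5 bits/weight -> ~6.5 GB
```

Add roughly 1-2 GB on top of the file size for the KV cache and runtime overhead, which is why the VRAM column exceeds the model size.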
Installation with Ollama
Install Ollama
One-line install on macOS/Linux
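The official one-line installer from ollama.com:

```shell
# One-line install on macOS/Linux (Windows users: download the installer
# from ollama.com instead)
curl -fsSL https://ollama.com/install.sh | sh

# Verify the install
ollama --version
```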
Pull and Run Solar 10.7B
Downloads the Q4_K_M quantized version (~6 GB)
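```shell
# Pull the default Q4_K_M build and start an interactive session
ollama pull solar
ollama run solar

# One-off completion from the command line (base model: expect raw
# text continuation, not an instruction-following answer)
ollama run solar "The three laws of thermodynamics are"
```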
Alternative: HuggingFace (FP16)
Full-precision model via transformers library
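A minimal sketch of loading the FP16 base model with the transformers library (assumes a GPU with ~24 GB VRAM; the model ID is from the Upstage model card). Since this is a base model, `generate` produces a raw continuation of the prompt:

```python
# Load SOLAR-10.7B base in FP16 and run a text completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/SOLAR-10.7B-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~21 GB of weights
    device_map="auto",          # place layers on available GPU(s)
)

inputs = tokenizer("Depth Up-Scaling is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```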
Local AI Alternatives
Solar 10.7B was competitive at release (December 2023), but newer models have since surpassed it. Consider these alternatives if you need the best performance in the 7B-14B range:
| Model | Params | MMLU | Context | Why Consider |
|---|---|---|---|---|
| Qwen 2.5 14B | 14B | ~79% | 128K | Much better MMLU + 32x longer context |
| Gemma 2 27B | 27B | ~75% | 8K | Better quality, still runnable quantized on 16GB |
| Mistral Nemo 12B | 12B | ~68% | 128K | Similar size, much longer context |
| Llama 3 8B | 8B | ~66% | 8K | Similar MMLU with fewer params + 2x context |
Solar 10.7B remains a good choice if you specifically need an Apache 2.0 base model for fine-tuning, or are interested in the DUS architecture for research purposes.
Resources & References
Official Sources
- HuggingFace: SOLAR-10.7B-v1.0
Official base model weights and model card
- arXiv: SOLAR 10.7B Paper
"Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (Dec 2023)
- Upstage AI
Developer company (Seoul, South Korea)
- Ollama: Solar
Ollama model library page for Solar
Related Pages on This Site
- Solar 10.7B Instruct
The instruction-tuned version for chat and Q&A
- Mistral 7B Instruct
Popular 7B competitor from Mistral AI
- Llama 3 8B
Newer 8B model from Meta with similar MMLU
- Qwen 2.5 14B
Current leader in the 14B class
Frequently Asked Questions
Technical Questions
What is DUS (Depth Up-Scaling)?
DUS is Upstage's method for creating larger models efficiently. It takes a pretrained model (in this case Llama 2), duplicates some of its transformer layers to increase depth from 32 to 48 layers, then continues pretraining. This is cheaper than training a 10.7B model from scratch because the duplicated layers already contain useful learned representations.
How much VRAM do I need?
With Q4_K_M quantization (Ollama default): about 6-7 GB VRAM. This fits on an RTX 3060 12GB, RTX 4060, or Apple M1 with 8GB unified memory. For FP16 (full precision), you need ~24 GB VRAM (RTX 3090/4090 or A100).
Should I use the base or instruct version?
Use the Instruct version unless you plan to fine-tune on your own dataset. The base model outputs raw text completions and does not follow instructions or engage in conversation without additional training.
Practical Questions
Is Solar 10.7B still worth using in 2026?
For general use, newer models like Qwen 2.5, Llama 3, and Gemma 2 offer better performance. However, Solar 10.7B remains relevant if you need an Apache 2.0 base model for fine-tuning, or are researching DUS as a scaling technique.
Can I fine-tune Solar 10.7B?
Yes, and this is the primary use case for the base model. Use LoRA/QLoRA for efficient fine-tuning on consumer hardware. The Apache 2.0 license allows commercial use of fine-tuned derivatives. Tools like Axolotl or HuggingFace TRL work well with this model.
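A hedged sketch of a QLoRA setup with the peft and bitsandbytes libraries (hyperparameters like `r=16` and the `q_proj`/`v_proj` target modules are common starting points, not values prescribed by Upstage; dataset preparation and the training loop are omitted):

```python
# QLoRA sketch: load the base model in 4-bit and attach LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "upstage/SOLAR-10.7B-v1.0",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=16,                                   # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a small fraction of 10.7B is trainable
```

From here, the model can be passed to a standard trainer such as HuggingFace TRL's SFTTrainer.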
Does Solar 10.7B support Korean?
Solar uses the Llama 2 tokenizer (32K vocabulary), which is primarily English-focused. While Upstage is a Korean company, the base model's Korean capabilities are limited by the tokenizer. The Instruct version has slightly better Korean support from instruction tuning data. For strong Korean NLP, consider models with dedicated Korean tokenizers.
Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.