Solar 10.7B Base
Depth Up-Scaling Architecture & Local Deployment Guide

Solar 10.7B is a base language model from Korean AI company Upstage, released in December 2023. Its key innovation is Depth Up-Scaling (DUS): rather than training a 10.7B model from scratch, Upstage duplicated layers from a pretrained Llama 2 base and continued pretraining, producing a larger model that inherits existing knowledge. This is the base (pretrained) version; for the instruction-tuned variant, see Solar 10.7B Instruct.

  • Parameters: 10.7B
  • MMLU (5-shot): ~66%
  • Context: 4,096 tokens
  • License: Apache 2.0

Technical Overview

Model Specifications

  • Developer: Upstage (Seoul, South Korea)
  • Release Date: December 2023
  • Parameters: 10.7 billion
  • Architecture: DUS (Depth Up-Scaling) based on Llama 2
  • Layers: 48 transformer layers
  • Hidden Dimension: 4,096
  • Attention Heads: 32
  • Context Window: 4,096 tokens
  • Vocabulary: 32,000 tokens (Llama 2 tokenizer)
  • License: Apache 2.0 (fully open, commercial use allowed)
  • Model Type: Base (pretrained, not instruction-tuned)

What Makes Solar Different

Solar 10.7B stands out for one reason: DUS (Depth Up-Scaling). Instead of training from scratch, Upstage took a pretrained Llama 2 model and duplicated its transformer layers to create a deeper network. They then continued pretraining on additional data.

This approach has a key advantage: training a 10.7B model via DUS is significantly cheaper than training one from random initialization, because the duplicated layers already contain useful representations.

As a base model, Solar 10.7B is pretrained on next-token prediction but not fine-tuned for following instructions. It is primarily useful for:

  • Fine-tuning on your own dataset
  • Text completion tasks
  • Research into the DUS architecture
  • Building custom instruction-tuned variants

DUS Architecture Explained

How Depth Up-Scaling Works

DUS is described in Upstage's paper "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (December 2023). The process:

Step 1
Start with Pretrained Llama 2

Begin with a pretrained 7B-class Llama 2 model that has 32 transformer layers and already contains general language knowledge from pretraining.

Step 2
Duplicate Layers

Make two copies of the 32-layer network, remove the final 8 layers from one copy and the first 8 layers from the other, then stack the two 24-layer halves. This increases depth from 32 to 48 layers and grows the parameter count from ~7B to 10.7B without introducing any randomly initialized weights.

Step 3
Continue Pretraining

Continue pretraining on additional data so the duplicated layers learn to differentiate from their originals and the full model converges to a coherent 48-layer network.
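The layer arithmetic in the steps above can be sketched in a few lines of Python. The split used here (trim 8 layers from each of two copies) follows the configuration reported in the SOLAR paper; the integer lists are stand-ins for real layer weights, not an actual implementation.

```python
# Sketch of the DUS layer arithmetic: take two copies of the 32-layer
# base, drop the top 8 layers from one copy and the bottom 8 from the
# other, then stack the remainders into one deeper network.
n_base = 32  # transformer layers in the Llama 2 7B-class base
m = 8        # layers trimmed from each copy (the paper's configuration)

base_layers = list(range(n_base))   # stand-ins for real layer weights
lower = base_layers[: n_base - m]   # layers 0..23, keeps the input side
upper = base_layers[m:]             # layers 8..31, keeps the output side
scaled = lower + upper              # stacked 48-layer network

print(len(scaled))  # 48, matching Solar 10.7B's layer count
```

Note that the middle layers (8..23) appear twice in the stacked network; continued pretraining (Step 3) is what lets these duplicated layers diverge into useful, distinct representations.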

DUS vs Other Scaling Methods

| Method | Approach | Training Cost | Example |
|---|---|---|---|
| DUS (Solar) | Duplicate layers from a pretrained model + continue training | Low | Solar 10.7B |
| Train from scratch | Random initialization, full pretraining | Very High | Llama 2, Mistral |
| MoE | Multiple expert sub-networks, sparse activation | Medium-High | Mixtral 8x7B |
| Knowledge Distillation | Smaller model trained to mimic a larger teacher | Low-Medium | TinyLlama |

Source: Upstage, "SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling" (arXiv:2312.15166)

Base vs Instruct: Which to Use

Upstage released two versions of Solar 10.7B. This page covers the base model. If you want a chatbot or instruction-following assistant, use the Instruct version instead.

| Feature | Solar 10.7B Base (this page) | Solar 10.7B Instruct |
|---|---|---|
| HuggingFace ID | upstage/SOLAR-10.7B-v1.0 | upstage/SOLAR-10.7B-Instruct-v1.0 |
| Training | Pretrained (next-token prediction) | + SFT + DPO alignment |
| Best For | Fine-tuning, text completion, research | Chat, Q&A, instruction following |
| MMLU | ~66% | ~66.2% (marginal improvement) |
| Ollama | ollama run solar | ollama run solar:10.7b-instruct-v1-q4_K_M |

Recommendation: Most users should use the Instruct version. The base model is primarily for researchers and developers who want to fine-tune on their own data.

Benchmarks

MMLU Comparison (5-shot, base models)

| Model | MMLU accuracy (5-shot) |
|---|---|
| Solar 10.7B | 66% |
| Llama 2 13B | 55% |
| Mistral 7B | 60% |
| Yi 34B | 76% |

Source: HuggingFace Open LLM Leaderboard (v1). Yi 34B included as upper reference (3x parameters).

Open LLM Leaderboard Scores (Base Model)

| Benchmark | Solar 10.7B | Llama 2 13B | Mistral 7B |
|---|---|---|---|
| MMLU (5-shot) | ~66% | ~55% | ~60.1% |
| ARC-Challenge (25-shot) | ~61% | ~59% | ~60% |
| HellaSwag (10-shot) | ~84% | ~82% | ~83% |
| Winogrande (5-shot) | ~83% | ~76% | ~78% |

Source: HuggingFace Open LLM Leaderboard (v1), Upstage model card. Scores are approximate; check the leaderboard for latest values.

Honest Assessment

Strengths

  • Beats Llama 2 13B on MMLU despite fewer parameters
  • Apache 2.0 license (fully open for commercial use)
  • Good base for fine-tuning custom models
  • DUS approach is cheaper to replicate than training from scratch
  • Compact enough to quantize and run on consumer GPUs

Limitations

  • Only 4,096 context tokens (short by 2024+ standards)
  • Released December 2023; newer models have surpassed it
  • Base model not directly useful for chat without fine-tuning
  • DUS paper does not report Korean-specific benchmarks for the base model
  • No code-specific training (not competitive for coding tasks)

VRAM by Quantization

| Quantization | Model Size | VRAM Required | Quality Loss | Compatible Hardware |
|---|---|---|---|---|
| FP16 | ~21 GB | ~24 GB | None | RTX 3090/4090, A5000, A100 |
| Q8_0 | ~11 GB | ~13 GB | Minimal | RTX 3090/4090, Apple M2 Pro 16GB |
| Q4_K_M (recommended) | ~6 GB | ~7 GB | Small | RTX 3060 12GB, Apple M1 8GB, RTX 4060 |
| Q4_0 | ~5.5 GB | ~6.5 GB | Moderate | RTX 3060, Apple M1 8GB |

Sizes are approximate. VRAM includes overhead for context/KV cache at short prompts. Apple Silicon uses unified memory. Ollama defaults to Q4_K_M when you run ollama run solar.
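These sizes follow directly from the parameter count times the bits stored per weight. The sketch below uses approximate effective bit-widths for each format (an assumption: real GGUF files mix tensor precisions and carry metadata, so treat the results as ballpark figures, not exact file sizes):

```python
# Back-of-envelope model size: parameters x bits per weight / 8.
# Bits-per-weight values are approximate effective widths for common
# quantization formats; actual file sizes vary.
PARAMS = 10.7e9  # Solar 10.7B parameter count

def size_gb(bits_per_weight: float) -> float:
    """Estimated weight storage in gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name:8s} ~{size_gb(bits):.1f} GB")
```

Running VRAM needs are higher than the weight size because the KV cache grows with context length, which is why the table budgets roughly 1-2 GB of headroom per row.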

Installation with Ollama

System Requirements

  • Operating System: Ubuntu 20.04+, macOS 12+ (Apple Silicon supported), Windows 11
  • RAM: 16GB minimum (quantized), 32GB for FP16
  • Storage: ~6 GB (Q4_K_M) to ~21 GB (FP16)
  • GPU: Any GPU with 6GB+ VRAM for Q4_K_M, or 24GB for FP16
  • CPU: Apple M1/M2/M3 or modern x86_64 (CPU inference supported via Ollama)
Step 1

Install Ollama

One-line install on macOS/Linux

$ curl -fsSL https://ollama.com/install.sh | sh
Step 2

Pull and Run Solar 10.7B

Downloads the Q4_K_M quantized version (~6 GB)

$ ollama run solar
Step 3

Alternative: HuggingFace (FP16)

Full-precision model via transformers library

$ pip install torch transformers accelerate
$ python -c "from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained('upstage/SOLAR-10.7B-v1.0', device_map='auto'); print('Model loaded')"
Terminal

$ ollama run solar
pulling manifest
pulling 6b0c4...
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)
>>> Explain the DUS architecture used in Solar 10.7B
Depth Up-Scaling (DUS) is the architecture used to create Solar 10.7B. The process works as follows:
1. Start with a pretrained Llama 2 model
2. Duplicate certain transformer layers to increase depth
3. Continue pretraining on additional data
This approach leverages existing pretrained weights rather than training from scratch, making it more efficient than training a 10.7B model from random initialization. The result is a model with 48 transformer layers that inherits knowledge from the base Llama 2 architecture while gaining additional capacity from the duplicated layers.

Local AI Alternatives

Solar 10.7B was competitive at release (December 2023), but newer models have since surpassed it. Consider these alternatives if you need the best performance in the 7B-14B range:

| Model | Params | MMLU | Context | Why Consider |
|---|---|---|---|---|
| Qwen 2.5 14B | 14B | ~79% | 128K | Much better MMLU + 32x longer context |
| Gemma 2 27B | 27B | ~75% | 8K | Better quality, still runnable quantized on 16GB |
| Mistral Nemo 12B | 12B | ~68% | 128K | Similar size, much longer context |
| Llama 3 8B | 8B | ~66% | 8K | Similar MMLU with fewer params + 2x context |

Solar 10.7B remains a good choice if you specifically need an Apache 2.0 base model for fine-tuning, or are interested in the DUS architecture for research purposes.


Frequently Asked Questions

Technical Questions

What is DUS (Depth Up-Scaling)?

DUS is Upstage's method for creating larger models efficiently. It takes a pretrained model (in this case Llama 2), duplicates some of its transformer layers to increase depth from 32 to 48 layers, then continues pretraining. This is cheaper than training a 10.7B model from scratch because the duplicated layers already contain useful learned representations.

How much VRAM do I need?

With Q4_K_M quantization (Ollama default): about 6-7 GB VRAM. This fits on an RTX 3060 12GB, RTX 4060, or Apple M1 with 8GB unified memory. For FP16 (full precision), you need ~24 GB VRAM (RTX 3090/4090 or A100).

Should I use the base or instruct version?

Use the Instruct version unless you plan to fine-tune on your own dataset. The base model outputs raw text completions and does not follow instructions or engage in conversation without additional training.

Practical Questions

Is Solar 10.7B still worth using in 2026?

For general use, newer models like Qwen 2.5, Llama 3, and Gemma 2 offer better performance. However, Solar 10.7B remains relevant if you need an Apache 2.0 base model for fine-tuning, or are researching DUS as a scaling technique.

Can I fine-tune Solar 10.7B?

Yes, and this is the primary use case for the base model. Use LoRA/QLoRA for efficient fine-tuning on consumer hardware. The Apache 2.0 license allows commercial use of fine-tuned derivatives. Tools like Axolotl or HuggingFace TRL work well with this model.
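As a sketch of what that setup can look like with HuggingFace PEFT, a LoRA configuration for a Llama-architecture model like Solar 10.7B might target the attention projections. The hyperparameters below are illustrative starting points, not values published by Upstage:

```python
from peft import LoraConfig

# Illustrative LoRA setup for a Llama-style causal LM such as
# upstage/SOLAR-10.7B-v1.0. r, alpha, and dropout are common
# defaults, not tuned values.
lora_config = LoraConfig(
    r=16,            # adapter rank
    lora_alpha=32,   # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# Attach with peft.get_peft_model(model, lora_config), or pass the
# config to a trainer such as HuggingFace TRL's SFTTrainer.
```

For QLoRA on consumer hardware, the base model is typically loaded in 4-bit via bitsandbytes before the adapters are attached, which keeps the 10.7B weights within a single consumer GPU's VRAM.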

Does Solar 10.7B support Korean?

Solar uses the Llama 2 tokenizer (32K vocabulary), which is primarily English-focused. While Upstage is a Korean company, the base model's Korean capabilities are limited by the tokenizer. The Instruct version has slightly better Korean support from instruction tuning data. For strong Korean NLP, consider models with dedicated Korean tokenizers.


Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset


Published: December 13, 2023 · Last Updated: March 13, 2026
