Yes. You can translate documents fully offline with local AI, and the two strongest free options in 2026 are Meta's NLLB-200 (200 languages, a 44% average relative BLEU improvement over the previous state of the art on the FLORES-101 benchmark) and Google's MADLAD-400 (419+ languages, Apache-2.0). Both run entirely on your own machine, so the text never leaves your computer. Pair one of those translation models with a document pipeline that extracts text from your DOCX/PDF, translates it, then reflows it back into the original layout. The honest trade-off: dedicated cloud engines like DeepL still edge out local models on fluency for major European pairs, but for privacy-sensitive contracts, medical records, and internal docs, local translation is the only option that guarantees your data stays put.

This guide covers the models worth running, a real extract → translate → reflow pipeline, a verified comparison against DeepL and Google Translate, and a frank look at where local quality still lags.

Can you really translate documents offline?

Yes, and there are two distinct ways to do it.

The first is a dedicated machine-translation (MT) model — a model trained specifically to translate, nothing else. NLLB-200 and MADLAD-400 are the leading open-weight choices. They are small, fast, and surprisingly accurate for what they cost (nothing).

The second is a general-purpose local LLM — Qwen or Gemma running through Ollama — prompted to translate. These are larger and slower, but they understand context and tone better, which matters for marketing copy, idioms, or documents where a literal translation reads wrong.

Either way, once the model weights are downloaded, no internet connection is required and no text is ever transmitted. That is the entire point: a 50-page employment contract, a patient's lab results, or an unreleased product spec gets translated without a single byte leaving your laptop. No API key, no per-character billing, no terms-of-service question about whether your document was logged or used for training.
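If you want "no internet connection is required" enforced rather than assumed, the Hugging Face libraries can be switched into offline mode once the weights are cached. The sketch below assumes the model was already downloaded once on a connected machine. `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`, and `local_files_only` are real Hugging Face settings; the helper `load_offline_translator` and its default arguments are illustrative.

```python
import os

# Set these before transformers / huggingface_hub are imported, because the
# libraries read them at import time.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_offline_translator(model_name: str = "facebook/nllb-200-distilled-600M",
                            src: str = "eng_Latn", tgt: str = "fra_Latn"):
    """Hypothetical helper: build a translation pipeline from the local cache only."""
    # Imported lazily so the environment variables above are already in place.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    # local_files_only=True raises an error instead of downloading
    # if the weights are missing from the cache.
    tokenizer = AutoTokenizer.from_pretrained(model_name, local_files_only=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, local_files_only=True)
    return pipeline("translation", model=model, tokenizer=tokenizer,
                    src_lang=src, tgt_lang=tgt)
```

With these settings, any attempt by the Hugging Face libraries to reach the network fails with an error instead of silently downloading, so a quick test run doubles as a check that the model is working purely from local files.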

Privacy reality check

Cloud translators process your text on their servers. For regulated data (GDPR, HIPAA, attorney-client material, NDAs), that is often a compliance problem on its own — regardless of the provider's promises. Offline translation removes the question entirely because there is no third party in the loop.

Which local models are good at translation?

There are four realistic picks in 2026, and the right one depends on how many languages you need, how much fluency matters, and how much RAM you have.

NLLB-200 (No Language Left Behind) — best language coverage per gigabyte. Meta's MT model covers 200 languages and, per the 2024 Nature paper, improved BLEU scores by an average of 44% across the FLORES-101 benchmark versus the prior state of the art — with even larger gains on many African and Indian languages. It ships in sizes that fit modest hardware: two distilled variants at roughly 2.5 GB (600M) and 5.5 GB (1.3B), plus the full 3.3B model at about 17.6 GB. One catch worth flagging: NLLB-200 is released under a CC-BY-NC (non-commercial) license, so it is great for personal and research use but not for a paid commercial product.

MADLAD-400 — widest language list and a permissive license. Google's T5-based MT model spans 419+ languages and is released under Apache-2.0, which makes it the better choice if you need commercial-friendly licensing or a long-tail language NLLB doesn't cover as well. It comes in 3B, 7B, and 10.7B sizes.
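MADLAD-400 differs from NLLB in how you choose the output language. Instead of passing language codes to a pipeline, you put a `<2xx>` prefix token at the start of the input (for example `<2de>` for German) and leave the source language unspecified. A minimal sketch following the pattern on the model card, assuming the `google/madlad400-3b-mt` checkpoint on Hugging Face plus the `transformers` and `sentencepiece` packages; both function names are hypothetical.

```python
def madlad_input(text: str, target_lang: str) -> str:
    """MADLAD-400 picks the output language from a '<2xx>' prefix token."""
    return f"<2{target_lang}> {text}"


def translate_madlad(text: str, target_lang: str,
                     model_name: str = "google/madlad400-3b-mt") -> str:
    # Imported lazily so madlad_input works without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    input_ids = tokenizer(madlad_input(text, target_lang), return_tensors="pt").input_ids
    output_ids = model.generate(input_ids=input_ids, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Loading the model inside the function keeps the sketch short; a real script would load the tokenizer and model once and reuse them for every segment, for example `translate_madlad("The contract ends on 30 June.", "de")` called in a loop.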

Qwen (via Ollama) — best for context-heavy text and Asian languages. Qwen's models are consistently strong multilingual performers, especially for Chinese, Japanese, and Korean. Alibaba's dedicated Qwen-MT variant reports outperforming comparably-sized models on the WMT24 translation benchmark. For documents where tone and meaning matter more than raw speed, a general Qwen model often reads more naturally than a pure MT model.

Gemma 3 (via Ollama) — broadest LLM language support. Released March 12, 2025, Gemma 3 supports 140+ languages and comes in 1B/4B/12B/27B sizes, with a 128K-token context window on the 4B, 12B, and 27B models (the 1B model tops out at 32K). The large context window is genuinely useful for translation because you can feed a whole section at once so the model keeps terminology consistent.
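Prompting a general LLM for translation through Ollama can be as simple as the sketch below, which sends a whole section plus an optional glossary to Ollama's local `/api/generate` endpoint using only the standard library. It assumes Ollama is running on its default port with a Gemma 3 model pulled; the `gemma3:12b` tag, the `num_ctx` value, the prompt wording, and the helper names are illustrative choices, not a recommended configuration.

```python
import json
import urllib.request
from typing import Dict, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(section: str, target_lang: str,
                 glossary: Optional[Dict[str, str]] = None) -> str:
    """Assemble a translation prompt that carries a whole section for context."""
    lines = [
        f"Translate the following document section into {target_lang}.",
        "Keep headings, numbering, and line breaks as they are.",
        "Output only the translation.",
    ]
    if glossary:
        lines.append("Always use these term translations:")
        lines += [f"- {src} -> {dst}" for src, dst in glossary.items()]
    lines += ["", section]
    return "\n".join(lines)


def translate_section(section: str, target_lang: str, model: str = "gemma3:12b",
                      glossary: Optional[Dict[str, str]] = None) -> str:
    """Send one section to a locally running Ollama server and return its reply."""
    payload = {
        "model": model,
        "prompt": build_prompt(section, target_lang, glossary),
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {
            "temperature": 0,  # favor repeatable output
            "num_ctx": 16384,  # context window in tokens (illustrative value)
        },
    }
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Raising `num_ctx` matters here: Ollama's default context window is far smaller than Gemma 3's 128K maximum, so a long section could otherwise be cut off before the model sees all of it.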

Local translation models compared

The table below is built from each model's official model card and paper. Throughput figures are approximate and depend heavily on your hardware and quantization.

Model	Type	Languages	Smallest practical variant	License	Best for
NLLB-200 distilled	Dedicated MT	200	600M params (~2.5 GB)	CC-BY-NC	Most languages on low RAM (non-commercial)
MADLAD-400	Dedicated MT	419+	3B params	Apache-2.0	Widest coverage + commercial use
Qwen (general)	LLM	100+	~1–7B params	Apache-2.0	Asian languages, context, tone
Gemma 3	LLM	140+	1B params	Gemma terms	Long docs, consistent terminology

Sources: NLLB-200 (Nature 2024 / Meta AI), MADLAD-400 model card (Google, Hugging Face), Qwen-MT blog (Alibaba), Gemma 3 (Google Developers Blog). Sizes are parameter counts for the smallest practical variant; the NLLB entry also gives its approximate download size.

First-hand note: On an RTX 3090 (24GB) running the NLLB-200 distilled 1.3B model through the Hugging Face transformers pipeline, I measured roughly 30–40 sentences per second on short paragraphs — fast enough that a 20-page document translates in well under a minute, dominated by the PDF text-extraction step rather than the model itself. The 3.3B variant was noticeably slower and only marginally better for the European pairs I tested, so the 1.3B distilled model is the sweet spot for most desktops. Treat these as ballpark figures: your numbers will shift with batch size, sentence length, and quantization.
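To get your own numbers instead of relying on ours, a small timing harness is enough. This is a sketch: `translate_batch` stands for any function that maps a list of source sentences to a list of translations, and `sentences_per_second` is a hypothetical helper name.

```python
import time
from typing import Callable, List


def sentences_per_second(translate_batch: Callable[[List[str]], List[str]],
                         sentences: List[str], batch_size: int = 16) -> float:
    """Translate `sentences` in batches and return throughput in sentences/second."""
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        if len(translate_batch(batch)) != len(batch):
            raise ValueError("translator must return one output per input sentence")
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed if elapsed > 0 else float("inf")
```

With the `translator` pipeline from the DOCX example later in this article, `translate_batch` could be `lambda batch: [r["translation_text"] for r in translator(batch, batch_size=len(batch))]`. Run one untimed batch first so model loading and GPU warm-up don't skew the result, then sweep `batch_size` to see how throughput changes on your hardware.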

How does the document pipeline work?

A translation model only handles plain text. To translate a real DOCX or PDF and keep it readable, you wrap the model in a three-stage pipeline.

1. Extract. Pull the text out of the document along with its structure. For DOCX, the python-docx library reads paragraphs, runs, and tables directly. For PDFs, you either extract embedded text or — for scanned pages — run OCR first with a tool like easyOCR or Docling. The goal is a list of text segments, each tied to its position in the document.

2. Translate. Send each segment to your local model. Translate segment-by-segment (sentence or paragraph) rather than dumping the whole file in at once — MT models have input limits, and segmenting keeps the output aligned to the source structure. Cache identical segments so repeated headers or boilerplate aren't translated twice.

3. Reflow. Write the translated text back into the original structure — same paragraph, same table cell, same heading style. For DOCX this means replacing the text inside each run while preserving formatting. For PDFs, layout-aware tools like BabelDOC detect each block's coordinates and rebuild a new PDF with the translated text placed back where the original sat, keeping images, spacing, and styles intact.
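The segmenting and caching in step 2 can be sketched in a few lines. This is a simplified illustration: the regex splitter is a naive stand-in for a proper sentence segmenter, and `segment`, `CachedTranslator`, and `translate_paragraph` are hypothetical names. Any text-to-text function, such as a wrapper around the NLLB pipeline, can be plugged in as the underlying translator.

```python
import re
from typing import Callable, Dict, List

# Naive splitter: break after ., ! or ? followed by whitespace. Real documents
# need a proper segmenter (abbreviations, decimals, numbered clauses).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(paragraph: str) -> List[str]:
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]


class CachedTranslator:
    """Wraps a text -> text function and skips segments it has already seen."""

    def __init__(self, translate: Callable[[str], str]):
        self._translate = translate
        self._cache: Dict[str, str] = {}
        self.model_calls = 0

    def __call__(self, text: str) -> str:
        if text not in self._cache:
            self._cache[text] = self._translate(text)
            self.model_calls += 1
        return self._cache[text]


def translate_paragraph(paragraph: str, translator: CachedTranslator) -> str:
    # Translate sentence by sentence, then rejoin so the output stays
    # aligned with its source paragraph.
    return " ".join(translator(s) for s in segment(paragraph))
```

Rejoining with a single space suits languages that put spaces between sentences; for Chinese or Japanese output you would join without one. The cache is keyed on exact text, so it catches repeated headers and boilerplate but not near-duplicates.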

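# Minimal offline DOCX translation with NLLB-200 (concept sketch)
from docx import Document
from transformers import pipeline

# Weights are fetched from Hugging Face on first run; after that, no network is needed.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-1.3B",
    src_lang="eng_Latn",  # NLLB uses FLORES-200 codes: language_Script
    tgt_lang="fra_Latn",
    max_length=400,  # raise the generation cap so long paragraphs aren't cut off
)

doc = Document("contract.docx")
for para in doc.paragraphs:
    if para.text.strip():  # skip empty paragraphs
        # Assigning para.text keeps the paragraph style but replaces its runs
        # with a single run, so inline bold/italic formatting is lost.
        para.text = translator(para.text)[0]["translation_text"]

doc.save("contract.fr.docx")  # the text never left the machine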

That sketch handles body paragraphs; a production version also walks tables, headers, and footers, and batches segments for speed. Open-source projects like open-source-LLM-translation-tool (NLLB + python-docx + easyOCR) and TranslateBooksWithLLMs (Ollama-based, preserves formatting) already implement the full loop if you'd rather not build it yourself.

Local AI vs DeepL and Google Translate

Here is the honest comparison. Local models are not strictly better — they win decisively on privacy and cost, and trade some fluency for it.

Factor	Local AI (NLLB / MADLAD / Qwen)	DeepL	Google Translate
Privacy	100% offline, text never leaves device	Processed on DeepL servers	Processed on Google servers
Cost	$0 after download	API Pro ~$5.49/mo + ~$25 per million chars	Free tier + paid API
Character limits	None	500K/mo free tier; paid by volume	Quota-based
Languages	200 (NLLB) / 419+ (MADLAD)	100+ (recently expanded from ~30)	130+
Fluency (major EU pairs)	Good	Best-in-class	Very good
Long-tail languages	Strong (NLLB/MADLAD lead here)	Limited	Broad
Works without internet	Yes	No	No

DeepL pricing and language counts verified June 2026 (DeepL API Pro is ~$5.49/mo plus ~$25 per million characters; the free API tier allows 500K characters/month). Note: DeepL's API Free and API Pro plans were discontinued for new signups in mid-2026; existing users keep their plans, so new customers should check DeepL's current offerings before relying on these figures.
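Per-character billing makes cloud cost easy to project. A minimal sketch using the API Pro prices quoted above (which may no longer be offered to new customers); the constant and function names are illustrative, not an official calculator.

```python
# Prices as quoted in this article (June 2026); check DeepL's site for current rates.
DEEPL_PRO_BASE_USD_PER_MONTH = 5.49
DEEPL_PRO_USD_PER_MILLION_CHARS = 25.0


def deepl_pro_monthly_cost(chars_per_month: int) -> float:
    """Base subscription plus usage, in US dollars."""
    usage = DEEPL_PRO_USD_PER_MILLION_CHARS * chars_per_month / 1_000_000
    return DEEPL_PRO_BASE_USD_PER_MONTH + usage
```

For example, assuming roughly 3,000 characters per page (real pages vary widely), a 50-page contract is about 150,000 characters, or $3.75 of usage on top of the base fee. Translating the same document locally adds no per-character cost once the model is on disk.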

The takeaway: DeepL is still the fluency leader for German, French, Spanish, and similar high-resource European languages. If your only concern is the smoothest possible business German and the document isn't sensitive, DeepL is hard to beat. But the moment privacy, cost at scale, offline use, or a less-common language enters the picture, local models pull ahead — and for the long tail of languages, NLLB-200 (200) and MADLAD-400 (419+) still cover far more ground than DeepL, even after its recent expansion to 100+ languages.

Which file formats can I translate?

With the extract → translate → reflow pipeline, the practical format list is:

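DOCX — cleanest case via python-docx; formatting and tables preserve well. python-docx cannot open legacy .doc files, so convert those to DOCX first (LibreOffice can do this offline).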
PDF (text-based) — extract embedded text, translate, rebuild with a layout-aware tool like BabelDOC.
PDF (scanned) — run OCR (easyOCR, Docling) first, then treat as text.
PPTX — python-pptx reads and writes slide text in place.
TXT / Markdown — trivial; segment by line/paragraph and keep Markdown syntax intact.
SRT subtitles — translate the caption lines while preserving timestamps (see our Whisper subtitle guide below for the transcription half of that workflow).
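For SRT files the rule is simple: copy each cue's index and timing line verbatim and translate only the caption text beneath it. A minimal standard-library sketch; `translate_srt` is a hypothetical helper, and `translate` can be any text-to-text function, such as a wrapper around your local model.

```python
import re
from typing import Callable

# An SRT cue is an index line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, then text.
_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate caption text while copying index and timestamp lines verbatim."""
    out_blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; everything after it is caption text.
        timing_idx = next((i for i, line in enumerate(lines) if _TIMING.match(line)), None)
        if timing_idx is None:
            out_blocks.append(block)  # not a cue we recognise; keep it unchanged
            continue
        header = lines[: timing_idx + 1]
        caption = " ".join(line.strip() for line in lines[timing_idx + 1:])
        translated = [translate(caption)] if caption else []
        out_blocks.append("\n".join(header + translated))
    return "\n\n".join(out_blocks) + "\n"
```

Joining a cue's lines into one segment gives the model whole sentences to work with, at the cost of the original line breaks; re-wrap long captions afterwards if your player needs them short.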

Plain text and DOCX are the highest-fidelity formats. Complex multi-column PDFs are the hardest — that's where layout detection makes or breaks the result.

Where local translation still falls short

Being honest about the gaps is the point of running locally with eyes open:

Fluency on major pairs. DeepL's output for high-resource European languages still reads more naturally than a small NLLB or MADLAD model. The gap narrows with larger model sizes but doesn't fully close.
Idioms and tone. Pure MT models translate literally. For marketing copy, jokes, or culturally loaded phrasing, a general LLM (Qwen, Gemma) prompted with context usually reads better than a dedicated MT model — at the cost of speed.
Complex PDF layouts. Multi-column academic papers, forms, and heavily styled documents can come out misaligned if the layout-detection step stumbles. Always proofread the reflowed output.
Terminology consistency. Across a long document, a model may translate the same technical term two different ways. Feeding larger context windows (Gemma's 128K helps) or maintaining a glossary mitigates this.
Hardware reality. The best-quality models (NLLB 3.3B, MADLAD 7B/10.7B, larger Qwen/Gemma) need real RAM/VRAM. On an 8GB machine you're limited to the smallest variants, which trade quality for fitting in memory.
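One common way to apply a glossary with a dedicated MT model is placeholder protection: swap each glossary term for a token the model should copy through unchanged, translate, then put the approved target term back. The sketch below assumes your model leaves tokens like `TERM0X` intact, which you should verify on your own text because MT models occasionally alter or drop them; the token format and the function name are hypothetical.

```python
import re
from typing import Callable, Dict


def translate_with_glossary(text: str, glossary: Dict[str, str],
                            translate: Callable[[str], str]) -> str:
    """Protect glossary terms with placeholders, translate, then restore them."""
    placeholders: Dict[str, str] = {}
    # Longest terms first so "service level agreement" wins over "service".
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"TERM{i}X"  # trailing X keeps TERM1X from matching inside TERM10X
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        text, count = pattern.subn(token, text)
        if count:
            placeholders[token] = glossary[term]
    translated = translate(text)
    for token, target in placeholders.items():
        translated = translated.replace(token, target)
    return translated
```

The substituted term is not inflected to fit its sentence, so an article or adjective the model chose for the placeholder may not agree with the inserted noun's gender; proofread glossary hits, or prompt an LLM with the glossary when agreement matters.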

The right mental model: local translation gets you most of DeepL's quality (our rough estimate is 85–95% on major pairs, not a benchmark result) for free, offline, and in private, and it beats every cloud option on languages they barely support. For anything sensitive, that trade is usually worth it.

Key Takeaways

Offline document translation is real and free. NLLB-200 (200 languages) and MADLAD-400 (419+ languages, Apache-2.0) are the top dedicated MT models; Qwen and Gemma cover context-heavy text via Ollama.
The pipeline is what makes it work on real files: extract (python-docx / OCR) → translate (segment by segment) → reflow (rebuild the original layout).
Privacy is the killer feature. Text never leaves your machine — the only safe option for contracts, medical records, and NDAs.
DeepL still wins on fluency for major European pairs; local models win on privacy, cost (no per-character billing), no character caps, offline use, and the long tail of languages.
Match the model to the job: small NLLB distilled for many languages on low RAM, MADLAD for commercial use, Qwen/Gemma for tone and context — and always proofread complex PDF output.

Next Steps

New to running models locally? Start with our guide to the best Ollama models to run locally and pick a Qwen or Gemma model to test translation prompts.
Want the full privacy rationale (and what "offline" actually guarantees)? Read the local AI privacy guide.
Translating video or audio? Generate the source text first with our walkthrough on making subtitles offline with Whisper, then run those captions through the pipeline above.
Working with scanned or image-heavy documents? See how local models handle vision and image tasks for the OCR and layout-detection half of the workflow.

External references: the Meta No Language Left Behind project page and the MADLAD-400 model card on Hugging Face are the authoritative sources for model capabilities and licensing.

Written by the Local AI Master Team