Automated Report Generation with Local AI: A Working Weekly KPI Pipeline
Published April 23, 2026 - 21 min read
Every Monday at 8:47am, my team receives a Slack message titled "Weekly Pulse - Week 16/2026." It is three paragraphs of narrative analysis, two embedded charts, and a bullet list of anomalies. It was written entirely by a 14-billion-parameter language model running on a $900 mini PC under my desk, which has not seen the public internet since February.
The whole pipeline took two weekends to build. The hardest part was not the LLM. It was the part everyone underestimates: turning the model's output from "interesting demo" into something a CFO is willing to read at 9am without a second cup of coffee.
This guide walks through the complete pipeline I run today - the same one I have replicated for three other small companies - end to end. If you have ever stared at a Power BI dashboard wondering why nobody opens it, this is for you.
Quick Start: Your First Automated Report in Under an Hour
If you already have Ollama running, the minimum viable pipeline is genuinely four files:
- `query.sql` - the SQL that pulls your KPIs.
- `prompt.tmpl` - the narrative template the model fills.
- `generate.py` - 80 lines of Python that runs the query, calls Ollama, and formats output.
- A `crontab` entry - "every Monday 08:00, run `generate.py`".
We will build all four below, then layer on the production-grade scaffolding (retries, validation, charts, delivery) that makes the difference between "neat" and "you can stop opening the dashboard."
Table of Contents
- Why Automate Reports With a Local Model
- The Anatomy of a Useful Automated Report
- The Stack: DuckDB, Pandas, Ollama, Quarto
- Step 1 - Pulling Reliable Data With DuckDB
- Step 2 - Prompt Templates That Stay Truthful
- Step 3 - Generating the Narrative With Ollama
- Step 4 - Charts and PDF Output With Quarto
- Step 5 - Delivery: Slack, Email, Notion
- Step 6 - Scheduling With Cron, Systemd, or Airflow
- Validation: How to Catch a Bad Report Before It Sends
- Benchmarks: Local Models for Report Generation
- Pitfalls
- FAQ
Why Automate Reports With a Local Model {#why-local}
A few honest reasons:
Your data is private. Revenue, churn, retention curves, account-level usage - sending this to a cloud API to "summarize the week" is exactly the kind of leak compliance teams now reject. Local models close the loop.
Reports run weekly, not daily. That is 52 runs a year. At cloud-API token rates for the kind of context a real business report needs (15-30K tokens of structured data plus prompt), a single weekly pipeline can hit $40-80/month. At the typical 4-7 reports per company, you are looking at $200-500/month just for narrative generation. That pays for the hardware in 3-6 months.
You control the model. When the model improves and your prompts break, you upgrade on your schedule. No "OpenAI deprecated GPT-4 Turbo on September 30th" surprise the day before a board meeting.
Latency is irrelevant. Reports run on Sunday night. Nobody cares whether the LLM took 4 seconds or 40. This is the perfect workload for a $900 box that does nothing else 95% of the week.
For the broader cost argument, our Ollama vs ChatGPT API cost breakdown goes deep on the math.
The Anatomy of a Useful Automated Report {#anatomy}
After watching dozens of "AI report" projects fail, I have noticed that the ones that succeed share a consistent pattern. They all have these five elements:
- A single primary metric at the top - the one number people came for.
- Three to five supporting metrics with week-over-week deltas.
- A narrative paragraph that explains why, not just what.
- An anomaly list - things that diverged more than expected.
- One actionable recommendation - even if it is just "investigate X."
Reports that try to be exhaustive get ignored. Reports that surface the five most important things every week get read. The model is responsible for the narrative paragraph, the anomaly list, and the recommendation. Everything else is deterministic SQL plus formatting.
That division of labor is the single most important architectural decision in this entire guide. The LLM does not generate numbers. It explains them.
The Stack: DuckDB, Pandas, Ollama, Quarto {#stack}
The components I converged on after trying many alternatives:
| Layer | Tool | Why |
|---|---|---|
| Query engine | DuckDB | Embedded, zero-config, reads Parquet/CSV/Postgres directly |
| Data shaping | Pandas | Universal, easy unit testing |
| LLM | Ollama + Qwen 2.5 14B | Best quality at this size, permissive Apache 2.0 license |
| Templating | Jinja2 | Battle-tested, makes prompts auditable |
| Output rendering | Quarto | One source -> HTML, PDF, Slack-friendly markdown |
| Scheduling | systemd timer | More reliable than cron, journal logs included |
| Delivery | Slack incoming webhook + SMTP | Both at once, fail-open if one is down |
You can swap any of these. I have run it with SQLite instead of DuckDB, with Llama 3.3 70B instead of Qwen, with Airflow instead of systemd. The pattern survives substitution. What matters is that every layer is auditable - if a chart looks wrong, you can find exactly which row of data and which line of prompt produced it.
Step 1 - Pulling Reliable Data With DuckDB {#data}
DuckDB is the secret weapon. It runs in-process (no server), reads almost everything (Parquet, CSV, Postgres via extension, Iceberg, even Excel), and produces Pandas DataFrames in one line. For a weekly report, you do not need a data warehouse. You need a query.
```python
import duckdb
import pandas as pd

con = duckdb.connect()
con.execute("INSTALL postgres_scanner; LOAD postgres_scanner;")
con.execute("ATTACH 'postgresql://user:pass@db:5432/prod' AS prod (TYPE POSTGRES);")

weekly = con.execute("""
    WITH base AS (
        SELECT
            date_trunc('week', signed_up_at) AS week,
            COUNT(*) AS signups,
            COUNT(*) FILTER (WHERE plan != 'free') AS paid_signups,
            SUM(monthly_value) AS new_mrr
        FROM prod.users
        WHERE signed_up_at >= CURRENT_DATE - INTERVAL '8 weeks'
        GROUP BY 1
    )
    SELECT
        week,
        signups,
        paid_signups,
        new_mrr,
        LAG(signups) OVER (ORDER BY week) AS signups_prev,
        ROUND(100.0 * (signups - LAG(signups) OVER (ORDER BY week))
              / NULLIF(LAG(signups) OVER (ORDER BY week), 0), 1) AS signups_wow_pct
    FROM base
    ORDER BY week DESC;
""").fetchdf()

print(weekly.head())
```
The output is a clean DataFrame. We have everything the LLM needs and not one row more. Do not ever pass raw transactional tables to the LLM. Aggregate first. The model sees totals, ratios, and deltas - never individual rows.
This is also the part where you should add unit tests. A weekly report that breaks because a column was renamed is a Monday morning fire I have lived through twice.
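A minimal schema guard is enough to catch that failure early. Here is a sketch, assuming you wrap the Step 1 query in a `run_weekly_query()` helper (a name I am inventing for illustration) and run the test under pytest:

```python
# test_weekly_query.py - run under pytest
import pandas as pd

from generate import run_weekly_query  # hypothetical: the Step 1 query wrapped in a function

EXPECTED_COLUMNS = {
    "week", "signups", "paid_signups", "new_mrr",
    "signups_prev", "signups_wow_pct",
}

def test_weekly_schema():
    weekly = run_weekly_query()
    # A renamed or dropped column fails here, not in Monday's report
    assert EXPECTED_COLUMNS <= set(weekly.columns)
    assert len(weekly) >= 1
    assert pd.api.types.is_numeric_dtype(weekly["signups"])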
Step 2 - Prompt Templates That Stay Truthful {#prompts}
The single biggest failure mode of automated reports is the model inventing numbers. The prompt template below is the one I have iterated to most reliably prevent this.
```
You are a senior business analyst at {{ company }}. Write the
narrative section of this week's KPI report. Your audience is the
executive team.

CRITICAL RULES:
- Use only numbers that appear in the data block below. Do not
  compute ratios or estimates that are not already there.
- If the data is ambiguous, write "data ambiguous" rather than guess.
- Do not use the words "synergy," "leverage," "robust," or "exciting."
- 3 paragraphs maximum. Each paragraph 2-4 sentences.

Paragraph 1: State the headline number and how it changed week over
week. Cite at most 2 supporting figures.
Paragraph 2: Explain the most likely driver. If multiple metrics
moved together, say so. If you cannot tell, say "driver unclear."
Paragraph 3: One specific question or check the team should run this
week.

DATA (week ending {{ week_end }}):
{{ data_block }}

PRIOR WEEK CONTEXT:
{{ prior_context }}
```
The "do not invent numbers" rule is enforced post-hoc in the validation step (next section). The prompt rule alone is not enough; models will sometimes still invent. But the combination of a clear prompt rule plus a numeric validator catches roughly 99% of the issues in my testing.
The "do not use these words" rule sounds petty. It is not. It is the difference between a report that sounds like a real analyst and one that sounds like marketing copy. Add the buzzwords your team most hates.
Step 3 - Generating the Narrative With Ollama {#generate}
The actual model call is the simplest part:
```python
from ollama import Client
from jinja2 import Template

client = Client(host="http://localhost:11434")
template_text = open("prompt.tmpl").read()

prompt = Template(template_text).render(
    company="Acme Co",
    week_end=weekly.iloc[0]["week"].strftime("%Y-%m-%d"),
    data_block=weekly.to_markdown(index=False),
    prior_context=load_prior_context()
)

response = client.generate(
    model="qwen2.5:14b-instruct-q4_K_M",
    prompt=prompt,
    options={
        "temperature": 0.3,   # Lower = more deterministic
        "top_p": 0.9,
        "num_predict": 600,   # ~3 paragraphs
        "seed": 42,           # Reproducibility
    }
)
narrative = response["response"]
```
The two non-obvious settings:
- `temperature: 0.3` - higher than 0.5 and the model starts editorializing; lower than 0.2 and it sounds robotic. 0.3 is the sweet spot for analyst tone.
- `seed: 42` - if you ever need to reproduce why last week's report said what it said, this lets you do it bit-exact. Combined with logging the model digest, you have a full reproducibility chain.
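One loose end: `load_prior_context()` is not defined above. Here is a minimal sketch, assuming each run archives its narrative to a dated directory (the pattern pitfall 5 below recommends); the archive path is an assumption:

```python
from pathlib import Path

ARCHIVE = Path("/opt/weekly-report/archive")  # assumed run-archive location

def load_prior_context() -> str:
    """Return the previous run's narrative so the model can reference trends."""
    runs = sorted(ARCHIVE.glob("*/narrative.md"))
    if not runs:
        return "No prior report available."
    return runs[-1].read_text()
```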
For why reproducibility matters in regulated environments, see our local AI audit trail guide.
Step 4 - Charts and PDF Output With Quarto {#charts}
Let the LLM write the narrative. Let matplotlib draw the charts. Glue them together with Quarto.
Create report.qmd:
````markdown
---
title: "Weekly Pulse - Week {{ week_num }}/{{ year }}"
format:
  html:
    embed-resources: true
  pdf: default
execute:
  echo: false
---

```{python}
import pandas as pd, matplotlib.pyplot as plt
weekly = pd.read_parquet("data/weekly.parquet")
narrative = open("narrative.md").read()
```

## Headline

**Total signups this week: `{python} weekly.iloc[0]["signups"]`**
(`{python} f"{weekly.iloc[0]['signups_wow_pct']:+.1f}%"` vs prior week)

```{python}
fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(weekly["week"].astype(str), weekly["signups"])
ax.set_title("Signups, last 8 weeks")
plt.xticks(rotation=45)
plt.tight_layout()
```

## Narrative

`{python} narrative`
````
Render with `quarto render report.qmd --to pdf` and you have a publication-quality PDF.
For Slack, render to markdown and post via webhook. For email, render to HTML. The same source produces all three. This is why Quarto wins over Jinja-to-HTML approaches - one document, three outputs.
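In `generate.py`, the render step is just one subprocess call per format. A minimal sketch, assuming the `report.qmd` layout above; `pdf`, `html`, and `gfm` (GitHub Flavored Markdown, the Slack-friendly one) are standard Quarto output formats:

```python
import subprocess

def render_report(formats=("pdf", "html", "gfm")):
    """Render report.qmd once per output format from the same source."""
    for fmt in formats:
        subprocess.run(["quarto", "render", "report.qmd", "--to", fmt], check=True)
```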
Step 5 - Delivery: Slack, Email, Notion {#delivery}
Make delivery boring. Use multiple channels so a single failure does not silently break the report.
```python
import requests, smtplib
from datetime import date
from email.mime.text import MIMEText

def post_slack(markdown_body: str):
    requests.post(
        "https://hooks.slack.com/services/T0.../B0.../...",
        json={"text": markdown_body},
        timeout=10
    )

def email_pdf(pdf_path: str, recipients: list[str]):
    msg = MIMEText("Weekly Pulse attached. Reply with questions.")
    msg["Subject"] = f"Weekly Pulse - {date.today()}"
    msg["From"] = "reports@example.com"
    msg["To"] = ", ".join(recipients)
    # ... attach PDF, send via SMTP
```
Wrap each delivery in a try/except that does not fail the whole pipeline if one channel breaks. Log the failure, send to the others, raise an alert separately.
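Concretely, that wrapper can look like this - a sketch, where `alert_oncall` is a hypothetical helper wrapping your paging webhook:

```python
def deliver_all(markdown_body: str, pdf_path: str, recipients: list[str]):
    """Attempt every channel; only fail the pipeline if all of them fail."""
    failures = []
    for name, send in [
        ("slack", lambda: post_slack(markdown_body)),
        ("email", lambda: email_pdf(pdf_path, recipients)),
    ]:
        try:
            send()
        except Exception as exc:
            failures.append((name, exc))
    for name, exc in failures:
        alert_oncall(f"{name} delivery failed: {exc}")  # hypothetical paging helper
    if len(failures) == 2:
        raise RuntimeError(f"All delivery channels failed: {failures}")
```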
For Notion, the official API supports posting blocks - I have a 30-line helper that turns the narrative + chart image into a Notion page. Useful if your team lives in Notion rather than Slack.
Step 6 - Scheduling With Cron, Systemd, or Airflow {#schedule}
For a single weekly pipeline, systemd timers beat cron:
```ini
# /etc/systemd/system/weekly-report.service
[Unit]
Description=Weekly KPI report

[Service]
Type=oneshot
User=reports
WorkingDirectory=/opt/weekly-report
ExecStart=/opt/weekly-report/venv/bin/python generate.py
```

```ini
# /etc/systemd/system/weekly-report.timer
[Unit]
Description=Run weekly report Monday 08:00

[Timer]
OnCalendar=Mon *-*-* 08:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
`systemctl enable --now weekly-report.timer` and you are done.
If you have more than three reports or need fan-out across teams, graduate to Apache Airflow or Dagster. The Airflow TaskFlow API is genuinely good for this kind of pipeline now. Use Airflow if you have it. Do not stand it up just for one weekly report.
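If you do graduate, the TaskFlow version is compact. A sketch, assuming Airflow 2.x, with task bodies elided - they would call the functions built in the steps above:

```python
from airflow.decorators import dag, task
import pendulum

@dag(schedule="0 8 * * MON",
     start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
     catchup=False)
def weekly_report():
    @task
    def pull_data() -> str:
        # Step 1: run the DuckDB query, persist to Parquet, return the path
        ...

    @task
    def generate_narrative(data_path: str) -> str:
        # Steps 2-3: render the prompt, call Ollama, return the narrative path
        ...

    @task
    def render_and_deliver(narrative_path: str):
        # Steps 4-5: quarto render, then deliver_all()
        ...

    render_and_deliver(generate_narrative(pull_data()))

weekly_report()
```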
Validation: How to Catch a Bad Report Before It Sends {#validation}
The single best engineering decision I have made in this pipeline: a numeric validator that runs between the LLM step and the delivery step.
```python
import re
import pandas as pd

def extract_numbers(text: str) -> list[float]:
    """Pull every number out of the LLM's narrative."""
    pattern = r"-?\$?[\d,]+\.?\d*%?"
    raw = re.findall(pattern, text)
    nums = []
    for r in raw:
        cleaned = r.replace("$", "").replace(",", "").replace("%", "")
        try:
            nums.append(float(cleaned))
        except ValueError:
            pass
    return nums

def validate_against_data(narrative: str, df: pd.DataFrame, tolerance: float = 0.05):
    allowed = set()
    for col in df.select_dtypes("number").columns:
        for v in df[col].dropna():
            allowed.add(round(float(v), 1))
    cited = extract_numbers(narrative)
    suspect = []
    for n in cited:
        # Skip small integers (likely week numbers etc)
        if abs(n) < 10:
            continue
        if not any(abs(n - a) <= max(0.5, abs(a) * tolerance) for a in allowed):
            suspect.append(n)
    return suspect
```
If suspect is non-empty, the report does not send. Instead, the pipeline emails the on-call engineer with the narrative and the list of suspect numbers. In production over 14 weeks, this caught two genuine model hallucinations and zero false positives.
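The gate itself is a few lines in `generate.py` - a sketch, reusing `deliver_all` and the hypothetical `alert_oncall` helper from Step 5; the recipient address is an assumption:

```python
suspect = validate_against_data(narrative, weekly)
if suspect:
    # Hold the report; hand the draft to a human instead of sending
    alert_oncall(f"Report held back. Suspect numbers: {suspect}\n\n{narrative}")
else:
    deliver_all(narrative, "report.pdf", recipients=["execs@example.com"])
```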
This single validator turned the pipeline from "demo" to "trustable." If you build nothing else in this guide, build this.
Benchmarks: Local Models for Report Generation {#benchmarks}
Same prompt, same data block (8 weeks of KPIs, 12 columns), tested April 2026:
| Model | Hardware | Latency | Accuracy* | Hallucinated Numbers |
|---|---|---|---|---|
| Phi-3 Medium 14B | Mac M2 32GB | 38s | 78% | 4/20 reports |
| Qwen 2.5 14B Instruct | Mac M2 32GB | 41s | 94% | 1/20 reports |
| Qwen 2.5 32B | RTX 4090 | 52s | 96% | 1/20 reports |
| Llama 3.3 70B Q4 | RTX 4090+3090 | 1m 34s | 97% | 0/20 reports |
| GPT-4o | Cloud | 8s | 98% | 0/20 reports |
*Accuracy = human grader rating "would I send this to the CEO" (yes/no/edit-required).
Qwen 2.5 14B is the price-performance sweet spot. The 32B and 70B variants are noticeably better but the 14B output, validated by the post-hoc numeric checker, is what I run in production.
Pitfalls {#pitfalls}
What goes wrong:
1. Letting the model see raw rows. Every time. Aggregate before the prompt. If you cannot aggregate to under 50 rows, your report is too detailed.
2. No reproducibility. Pin the model version (`qwen2.5:14b@sha256:...`), set a seed, log the exact prompt. Without these, you cannot debug a weird Monday.
3. Sending unvalidated output. Build the numeric validator before you build the Slack delivery. Reverse that order and you will, eventually, send a wrong number to your CEO.
4. Cron without alerting. If the pipeline silently fails on Sunday night, you find out Monday at 9:01am when someone asks where the report is. Wire systemd `OnFailure=` to a paging webhook.
5. Storing intermediates in `/tmp`. When debugging "why did last Monday's report say X," you need the prompt, the model output, and the data frame from that run. Keep them in a dated archive directory for at least 90 days (a sketch follows this list).
6. Trying to make the LLM compute deltas. It cannot reliably compute "X is 18.5% higher than Y." Compute the delta in SQL or Pandas, then ask the model to explain it.
7. Relying on a single delivery channel. Slack outages happen. Email goes to spam. Post to two channels minimum.
8. Not versioning the prompt. Your prompt template lives in git. Tag every release. When the report quality drifts, the diff tells you what changed.
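A minimal archive helper covering pitfalls 2 and 5 together - a sketch, with paths and filenames as assumptions; it writes the `narrative.md` files that `load_prior_context()` in Step 3 reads back:

```python
import json
from datetime import date
from pathlib import Path

import pandas as pd

def archive_run(prompt: str, narrative: str, weekly: pd.DataFrame,
                model: str, seed: int):
    """Persist everything needed to replay a run bit-exact."""
    run_dir = Path("/opt/weekly-report/archive") / date.today().isoformat()
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "prompt.txt").write_text(prompt)
    (run_dir / "narrative.md").write_text(narrative)
    weekly.to_parquet(run_dir / "weekly.parquet")
    (run_dir / "meta.json").write_text(
        json.dumps({"model": model, "seed": seed, "date": str(date.today())})
    )
```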
FAQ {#faq}
The single question I get most: "Can the model just connect to my Postgres directly?" It can, via tool-calling. But the deterministic SQL approach is more auditable, more reliable, and easier to debug. Use tool-calling for ad-hoc queries; use SQL for scheduled reports.
For the deeper rabbit hole on connecting LLMs to data sources, see our Ollama function calling and tool use guide.
Where to Take This Next
Three obvious extensions once the weekly pipeline is humming:
- Q&A on top of the report. Take last week's data, embed it, expose a Slack
/askcommand that answers questions about it. Our private AI knowledge base guide covers the RAG side. - Multi-team fan-out. Each team gets their own report driven by their own SQL. Use the same model and prompt template; vary the data block.
- Anomaly-driven alerts. Skip the weekly cadence; let the pipeline run hourly, generate a report only when an anomaly threshold is crossed. Pair this with the audit trail pattern so you can always trace which event triggered which alert.
For monitoring the pipeline itself - latency, token counts, model drift - our Ollama monitoring with Prometheus and Grafana guide is the natural follow-on.
Conclusion
The dashboards everyone built in 2018 stopped getting opened around 2021. Email digests came back, then died because nobody had time to write them. Local AI is what makes the digest viable again - because the analyst is now an unsleeping 14-billion-parameter assistant, and your data never leaves the building.
The pipeline I described in this guide is, end to end, fewer than 400 lines of Python plus a Quarto template and a systemd unit. I wrote the first version in a weekend; the production version took two more weekends to harden. The dollars saved versus a cloud-LLM equivalent paid for the hardware in five months. The hours of manual report-writing avoided are uncountable.
Build the simple version this weekend. Wire up the validator before you celebrate. The day it catches a wrong number before it sends - and you have to explain to no one what the LLM almost did - is the day you will be glad you built it.