Automated Report Generation with Local AI: A Working Weekly KPI Pipeline
Published April 23, 2026 - 21 min read
Every Monday at 8:47am, my team receives a Slack message titled "Weekly Pulse - Week 16/2026." It is three paragraphs of narrative analysis, two embedded charts, and a bullet list of anomalies. It was written entirely by a 14-billion-parameter language model running on a $900 mini PC under my desk, which has not seen the public internet since February.
The whole pipeline took two weekends to build. The hardest part was not the LLM. It was the part everyone underestimates: turning the model's output from "interesting demo" into something a CFO is willing to read at 9am without a second cup of coffee.
This guide walks through the complete pipeline I run today - the same one I have replicated for three other small companies - end to end. If you have ever stared at a Power BI dashboard wondering why nobody opens it, this is for you.
Quick Start: Your First Automated Report in Under an Hour
If you already have Ollama running, the minimum viable pipeline is genuinely four files:
- `query.sql` - the SQL that pulls your KPIs.
- `prompt.tmpl` - the narrative template the model fills.
- `generate.py` - 80 lines of Python that runs the query, calls Ollama, and formats output.
- A `crontab` entry - "every Monday 08:00, run `generate.py`".
We will build all four below, then layer on the production-grade scaffolding (retries, validation, charts, delivery) that makes the difference between "neat" and "you can stop opening the dashboard."
Table of Contents
- Why Automate Reports With a Local Model
- The Anatomy of a Useful Automated Report
- The Stack: DuckDB, Pandas, Ollama, Quarto
- Step 1 - Pulling Reliable Data With DuckDB
- Step 2 - Prompt Templates That Stay Truthful
- Step 3 - Generating the Narrative With Ollama
- Step 4 - Charts and PDF Output With Quarto
- Step 5 - Delivery: Slack, Email, Notion
- Step 6 - Scheduling With Cron, Systemd, or Airflow
- Validation: How to Catch a Bad Report Before It Sends
- Benchmarks: Local Models for Report Generation
- Pitfalls
- FAQ
Why Automate Reports With a Local Model {#why-local}
A few honest reasons:
Your data is private. Revenue, churn, retention curves, account-level usage - sending this to a cloud API to "summarize the week" is exactly the kind of leak compliance teams now reject. Local models close the loop.
Reports run weekly, not daily. That is 52 runs a year. At cloud-API token rates for the kind of context a real business report needs (15-30K tokens of structured data plus prompt), a single weekly pipeline can hit $40-80/month. At the typical 4-7 reports per company, you are looking at $200-500/month just for narrative generation. That pays for the hardware in 3-6 months.
You control the model. When the model improves and your prompts break, you upgrade on your schedule. No "OpenAI deprecated GPT-4 Turbo on September 30th" surprise the day before a board meeting.
Latency is irrelevant. Reports run on Sunday night. Nobody cares whether the LLM took 4 seconds or 40. This is the perfect workload for a $900 box that does nothing else 95% of the week.
For the broader cost argument, our Ollama vs ChatGPT API cost breakdown goes deep on the math.
The Anatomy of a Useful Automated Report {#anatomy}
After watching dozens of "AI report" projects fail, I have noticed that the ones that succeed share a consistent pattern. They all have these five elements:
- A single primary metric at the top - the one number people came for.
- Three to five supporting metrics with week-over-week deltas.
- A narrative paragraph that explains why, not just what.
- An anomaly list - things that diverged more than expected.
- One actionable recommendation - even if it is just "investigate X."
Reports that try to be exhaustive get ignored. Reports that surface the five most important things every week get read. The model is responsible for the narrative paragraph, the anomaly list, and the recommendation. Everything else is deterministic SQL plus formatting.
That division of labor is the single most important architectural decision in this entire guide. The LLM does not generate numbers. It explains them.
The Stack: DuckDB, Pandas, Ollama, Quarto {#stack}
The components I converged on after trying many alternatives:
| Layer | Tool | Why |
|---|---|---|
| Query engine | DuckDB | Embedded, zero-config, reads Parquet/CSV/Postgres directly |
| Data shaping | Pandas | Universal, easy unit testing |
| LLM | Ollama + Qwen 2.5 14B | Best quality at this size, permissive Apache 2.0 license |
| Templating | Jinja2 | Battle-tested, makes prompts auditable |
| Output rendering | Quarto | One source -> HTML, PDF, Slack-friendly markdown |
| Scheduling | systemd timer | More reliable than cron, journal logs included |
| Delivery | Slack incoming webhook + SMTP | Both at once, fail-open if one is down |
You can swap any of these. I have run it with SQLite instead of DuckDB, with Llama 3.3 70B instead of Qwen, with Airflow instead of systemd. The pattern survives substitution. What matters is that every layer is auditable - if a chart looks wrong, you can find exactly which row of data and which line of prompt produced it.
Step 1 - Pulling Reliable Data With DuckDB {#data}
DuckDB is the secret weapon. It runs in-process (no server), reads almost everything (Parquet, CSV, Postgres via extension, Iceberg, even Excel), and produces Pandas DataFrames in one line. For a weekly report, you do not need a data warehouse. You need a query.
```python
import duckdb
import pandas as pd

con = duckdb.connect()
con.execute("INSTALL postgres_scanner; LOAD postgres_scanner;")
con.execute("ATTACH 'postgresql://user:pass@db:5432/prod' AS prod (TYPE POSTGRES);")

weekly = con.execute("""
    WITH base AS (
        SELECT
            date_trunc('week', signed_up_at) AS week,
            COUNT(*) AS signups,
            COUNT(*) FILTER (WHERE plan != 'free') AS paid_signups,
            SUM(monthly_value) AS new_mrr
        FROM prod.users
        WHERE signed_up_at >= CURRENT_DATE - INTERVAL '8 weeks'
        GROUP BY 1
    )
    SELECT
        week,
        signups,
        paid_signups,
        new_mrr,
        LAG(signups) OVER (ORDER BY week) AS signups_prev,
        ROUND(100.0 * (signups - LAG(signups) OVER (ORDER BY week))
              / NULLIF(LAG(signups) OVER (ORDER BY week), 0), 1) AS signups_wow_pct
    FROM base
    ORDER BY week DESC;
""").fetchdf()

print(weekly.head())
```
The output is a clean DataFrame. We have everything the LLM needs and not one row more. Do not ever pass raw transactional tables to the LLM. Aggregate first. The model sees totals, ratios, and deltas - never individual rows.
This is also the part where you should add unit tests. A weekly report that breaks because a column was renamed is a Monday morning fire I have lived through twice.
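A minimal schema guard is enough to catch that failure early. Here is a sketch, assuming you wrap the Step 1 query in a `run_weekly_query()` helper (a name I am inventing for illustration) and run the test under pytest:

```python
# test_weekly_query.py - run under pytest
import pandas as pd

from generate import run_weekly_query  # hypothetical: the Step 1 query wrapped in a function

EXPECTED_COLUMNS = {
    "week", "signups", "paid_signups", "new_mrr",
    "signups_prev", "signups_wow_pct",
}

def test_weekly_schema():
    weekly = run_weekly_query()
    # A renamed or dropped column fails here, not in Monday's report
    assert EXPECTED_COLUMNS <= set(weekly.columns)
    assert len(weekly) >= 1
    assert pd.api.types.is_numeric_dtype(weekly["signups"])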
Step 2 - Prompt Templates That Stay Truthful {#prompts}
The single biggest failure mode of automated reports is the model inventing numbers. The prompt template below is the one I have iterated to most reliably prevent this.
```
You are a senior business analyst at {{ company }}. Write the
narrative section of this week's KPI report. Your audience is the
executive team.

CRITICAL RULES:
- Use only numbers that appear in the data block below. Do not
  compute ratios or estimates that are not already there.
- If the data is ambiguous, write "data ambiguous" rather than guess.
- Do not use the words "synergy," "leverage," "robust," or "exciting."
- 3 paragraphs maximum. Each paragraph 2-4 sentences.

Paragraph 1: State the headline number and how it changed week over
week. Cite at most 2 supporting figures.
Paragraph 2: Explain the most likely driver. If multiple metrics
moved together, say so. If you cannot tell, say "driver unclear."
Paragraph 3: One specific question or check the team should run this
week.

DATA (week ending {{ week_end }}):
{{ data_block }}

PRIOR WEEK CONTEXT:
{{ prior_context }}
```
The "do not invent numbers" rule is enforced post-hoc in the validation step (next section). The prompt rule alone is not enough; models will sometimes still invent. But the combination of a clear prompt rule plus a numeric validator catches roughly 99% of the issues in my testing.
The "do not use these words" rule sounds petty. It is not. It is the difference between a report that sounds like a real analyst and one that sounds like marketing copy. Add the buzzwords your team most hates.
Step 3 - Generating the Narrative With Ollama {#generate}
The actual model call is the simplest part:
```python
from ollama import Client
from jinja2 import Template

client = Client(host="http://localhost:11434")
template_text = open("prompt.tmpl").read()

prompt = Template(template_text).render(
    company="Acme Co",
    week_end=weekly.iloc[0]["week"].strftime("%Y-%m-%d"),
    data_block=weekly.to_markdown(index=False),
    prior_context=load_prior_context()
)

response = client.generate(
    model="qwen2.5:14b-instruct-q4_K_M",
    prompt=prompt,
    options={
        "temperature": 0.3,   # Lower = more deterministic
        "top_p": 0.9,
        "num_predict": 600,   # ~3 paragraphs
        "seed": 42,           # Reproducibility
    }
)
narrative = response["response"]
```
The two non-obvious settings:
- `temperature: 0.3` - higher than 0.5 and the model starts editorializing; lower than 0.2 and it sounds robotic. 0.3 is the sweet spot for analyst tone.
- `seed: 42` - if you ever need to reproduce why last week's report said what it said, this lets you do it bit-exact. Combined with logging the model digest, you have a full reproducibility chain.
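One loose end: `load_prior_context()` is not defined above. Here is a minimal sketch, assuming each run archives its narrative to a dated directory (the pattern pitfall 5 below recommends); the archive path is an assumption:

```python
from pathlib import Path

ARCHIVE = Path("/opt/weekly-report/archive")  # assumed run-archive location

def load_prior_context() -> str:
    """Return the previous run's narrative so the model can reference trends."""
    runs = sorted(ARCHIVE.glob("*/narrative.md"))
    if not runs:
        return "No prior report available."
    return runs[-1].read_text()
```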
For why reproducibility matters in regulated environments, see our local AI audit trail guide.
Step 4 - Charts and PDF Output With Quarto {#charts}
Let the LLM write the narrative. Let matplotlib draw the charts. Glue them together with Quarto.
Create report.qmd:
````markdown
---
title: "Weekly Pulse - Week {{ week_num }}/{{ year }}"
format:
  html:
    embed-resources: true
  pdf: default
execute:
  echo: false
---

```{python}
import pandas as pd, matplotlib.pyplot as plt
weekly = pd.read_parquet("data/weekly.parquet")
narrative = open("narrative.md").read()
```

## Headline

**Total signups this week: `{python} weekly.iloc[0]["signups"]`**
(`{python} f"{weekly.iloc[0]['signups_wow_pct']:+.1f}%"` vs prior week)

```{python}
fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(weekly["week"].astype(str), weekly["signups"])
ax.set_title("Signups, last 8 weeks")
plt.xticks(rotation=45)
plt.tight_layout()
```

## Narrative

`{python} narrative`
````
Render with `quarto render report.qmd --to pdf` and you have a publication-quality PDF.
For Slack, render to markdown and post via webhook. For email, render to HTML. The same source produces all three. This is why Quarto wins over Jinja-to-HTML approaches - one document, three outputs.
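In `generate.py`, the render step is just one subprocess call per format. A minimal sketch, assuming the `report.qmd` layout above; `pdf`, `html`, and `gfm` (GitHub Flavored Markdown, the Slack-friendly one) are standard Quarto output formats:

```python
import subprocess

def render_report(formats=("pdf", "html", "gfm")):
    """Render report.qmd once per output format from the same source."""
    for fmt in formats:
        subprocess.run(["quarto", "render", "report.qmd", "--to", fmt], check=True)
```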
Step 5 - Delivery: Slack, Email, Notion {#delivery}
Make delivery boring. Use multiple channels so a single failure does not silently break the report.
```python
import requests, smtplib
from datetime import date
from email.mime.text import MIMEText

def post_slack(markdown_body: str):
    requests.post(
        "https://hooks.slack.com/services/T0.../B0.../...",
        json={"text": markdown_body},
        timeout=10
    )

def email_pdf(pdf_path: str, recipients: list[str]):
    msg = MIMEText("Weekly Pulse attached. Reply with questions.")
    msg["Subject"] = f"Weekly Pulse - {date.today()}"
    msg["From"] = "reports@example.com"
    msg["To"] = ", ".join(recipients)
    # ... attach PDF, send via SMTP
```
Wrap each delivery in a try/except that does not fail the whole pipeline if one channel breaks. Log the failure, send to the others, raise an alert separately.
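Concretely, that wrapper can look like this - a sketch, where `alert_oncall` is a hypothetical helper wrapping your paging webhook:

```python
def deliver_all(markdown_body: str, pdf_path: str, recipients: list[str]):
    """Attempt every channel; only fail the pipeline if all of them fail."""
    failures = []
    for name, send in [
        ("slack", lambda: post_slack(markdown_body)),
        ("email", lambda: email_pdf(pdf_path, recipients)),
    ]:
        try:
            send()
        except Exception as exc:
            failures.append((name, exc))
    for name, exc in failures:
        alert_oncall(f"{name} delivery failed: {exc}")  # hypothetical paging helper
    if len(failures) == 2:
        raise RuntimeError(f"All delivery channels failed: {failures}")
```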
For Notion, the official API supports posting blocks - I have a 30-line helper that turns the narrative + chart image into a Notion page. Useful if your team lives in Notion rather than Slack.
Step 6 - Scheduling With Cron, Systemd, or Airflow {#schedule}
For a single weekly pipeline, systemd timers beat cron:
```ini
# /etc/systemd/system/weekly-report.service
[Unit]
Description=Weekly KPI report

[Service]
Type=oneshot
User=reports
WorkingDirectory=/opt/weekly-report
ExecStart=/opt/weekly-report/venv/bin/python generate.py
```

```ini
# /etc/systemd/system/weekly-report.timer
[Unit]
Description=Run weekly report Monday 08:00

[Timer]
OnCalendar=Mon *-*-* 08:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
`systemctl enable --now weekly-report.timer` and you are done.
If you have more than three reports or need fan-out across teams, graduate to Apache Airflow or Dagster. The Airflow TaskFlow API is genuinely good for this kind of pipeline now. Use Airflow if you have it. Do not stand it up just for one weekly report.
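If you do graduate, the TaskFlow version is compact. A sketch, assuming Airflow 2.x, with task bodies elided - they would call the functions built in the steps above:

```python
from airflow.decorators import dag, task
import pendulum

@dag(schedule="0 8 * * MON",
     start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
     catchup=False)
def weekly_report():
    @task
    def pull_data() -> str:
        # Step 1: run the DuckDB query, persist to Parquet, return the path
        ...

    @task
    def generate_narrative(data_path: str) -> str:
        # Steps 2-3: render the prompt, call Ollama, return the narrative path
        ...

    @task
    def render_and_deliver(narrative_path: str):
        # Steps 4-5: quarto render, then deliver_all()
        ...

    render_and_deliver(generate_narrative(pull_data()))

weekly_report()
```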
Validation: How to Catch a Bad Report Before It Sends {#validation}
The single best engineering decision I have made in this pipeline: a numeric validator that runs between the LLM step and the delivery step.
```python
import re
import pandas as pd

def extract_numbers(text: str) -> list[float]:
    """Pull every number out of the LLM's narrative."""
    pattern = r"-?\$?[\d,]+\.?\d*%?"
    raw = re.findall(pattern, text)
    nums = []
    for r in raw:
        cleaned = r.replace("$", "").replace(",", "").replace("%", "")
        try:
            nums.append(float(cleaned))
        except ValueError:
            pass
    return nums

def validate_against_data(narrative: str, df: pd.DataFrame, tolerance: float = 0.05):
    allowed = set()
    for col in df.select_dtypes("number").columns:
        for v in df[col].dropna():
            allowed.add(round(float(v), 1))
    cited = extract_numbers(narrative)
    suspect = []
    for n in cited:
        # Skip small integers (likely week numbers etc)
        if abs(n) < 10:
            continue
        if not any(abs(n - a) <= max(0.5, abs(a) * tolerance) for a in allowed):
            suspect.append(n)
    return suspect
```
If suspect is non-empty, the report does not send. Instead, the pipeline emails the on-call engineer with the narrative and the list of suspect numbers. In production over 14 weeks, this caught two genuine model hallucinations and zero false positives.
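The gate itself is a few lines in `generate.py` - a sketch, reusing `deliver_all` and the hypothetical `alert_oncall` helper from Step 5; the recipient address is an assumption:

```python
suspect = validate_against_data(narrative, weekly)
if suspect:
    # Hold the report; hand the draft to a human instead of sending
    alert_oncall(f"Report held back. Suspect numbers: {suspect}\n\n{narrative}")
else:
    deliver_all(narrative, "report.pdf", recipients=["execs@example.com"])
```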
This single validator turned the pipeline from "demo" to "trustable." If you build nothing else in this guide, build this.
Benchmarks: Local Models for Report Generation {#benchmarks}
Same prompt, same data block (8 weeks of KPIs, 12 columns), tested April 2026:
| Model | Hardware | Latency | Accuracy* | Hallucinated Numbers |
|---|---|---|---|---|
| Phi-3 Medium 14B | Mac M2 32GB | 38s | 78% | 4/20 reports |
| Qwen 2.5 14B Instruct | Mac M2 32GB | 41s | 94% | 1/20 reports |
| Qwen 2.5 32B | RTX 4090 | 52s | 96% | 1/20 reports |
| Llama 3.3 70B Q4 | RTX 4090+3090 | 1m 34s | 97% | 0/20 reports |
| GPT-4o | Cloud | 8s | 98% | 0/20 reports |
*Accuracy = human grader rating "would I send this to the CEO" (yes/no/edit-required).
Qwen 2.5 14B is the price-performance sweet spot. The 32B and 70B variants are noticeably better but the 14B output, validated by the post-hoc numeric checker, is what I run in production.
Pitfalls {#pitfalls}
What goes wrong:
1. Letting the model see raw rows. Every time. Aggregate before the prompt. If you cannot aggregate to under 50 rows, your report is too detailed.
2. No reproducibility. Pin the model version (`qwen2.5:14b@sha256:...`), set a seed, log the exact prompt. Without these, you cannot debug a weird Monday.
3. Sending unvalidated output. Build the numeric validator before you build the Slack delivery. Reverse that order and you will, eventually, send a wrong number to your CEO.
4. Cron without alerting. If the pipeline silently fails on Sunday night, you find out Monday at 9:01am when someone asks where the report is. Wire systemd `OnFailure=` to a paging webhook.
5. Storing intermediates in `/tmp`. When debugging "why did last Monday's report say X," you need the prompt, the model output, and the data frame from that run. Keep them in a dated archive directory for at least 90 days (a sketch follows this list).
6. Trying to make the LLM compute deltas. It cannot reliably compute "X is 18.5% higher than Y." Compute the delta in SQL or Pandas, then ask the model to explain it.
7. Relying on a single delivery channel. Slack outages happen. Email goes to spam. Post to two channels minimum.
8. Not versioning the prompt. Your prompt template lives in git. Tag every release. When the report quality drifts, the diff tells you what changed.
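A minimal archive helper covering pitfalls 2 and 5 together - a sketch, with paths and filenames as assumptions; it writes the `narrative.md` files that `load_prior_context()` in Step 3 reads back:

```python
import json
from datetime import date
from pathlib import Path

import pandas as pd

def archive_run(prompt: str, narrative: str, weekly: pd.DataFrame,
                model: str, seed: int):
    """Persist everything needed to replay a run bit-exact."""
    run_dir = Path("/opt/weekly-report/archive") / date.today().isoformat()
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "prompt.txt").write_text(prompt)
    (run_dir / "narrative.md").write_text(narrative)
    weekly.to_parquet(run_dir / "weekly.parquet")
    (run_dir / "meta.json").write_text(
        json.dumps({"model": model, "seed": seed, "date": str(date.today())})
    )
```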
FAQ {#faq}
The single question I get most: "Can the model just connect to my Postgres directly?" It can, via tool-calling. But the deterministic SQL approach is more auditable, more reliable, and easier to debug. Use tool-calling for ad-hoc queries; use SQL for scheduled reports.
For the deeper rabbit hole on connecting LLMs to data sources, see our Ollama function calling and tool use guide.
Where to Take This Next
Three obvious extensions once the weekly pipeline is humming:
- Q&A on top of the report. Take last week's data, embed it, expose a Slack
/askcommand that answers questions about it. Our private AI knowledge base guide covers the RAG side. - Multi-team fan-out. Each team gets their own report driven by their own SQL. Use the same model and prompt template; vary the data block.
- Anomaly-driven alerts. Skip the weekly cadence; let the pipeline run hourly, generate a report only when an anomaly threshold is crossed. Pair this with the audit trail pattern so you can always trace which event triggered which alert.
For monitoring the pipeline itself - latency, token counts, model drift - our Ollama monitoring with Prometheus and Grafana guide is the natural follow-on.
Conclusion
The dashboards everyone built in 2018 stopped getting opened around 2021. Email digests came back, then died because nobody had time to write them. Local AI is what makes the digest viable again - because the analyst is now an unsleeping 14-billion-parameter assistant, and your data never leaves the building.
The pipeline I described in this guide is, end to end, fewer than 400 lines of Python plus a Quarto template and a systemd unit. I wrote the first version in a weekend; the production version took two more weekends to harden. The dollars saved versus a cloud-LLM equivalent paid for the hardware in five months. The hours of manual report-writing avoided are uncountable.
Build the simple version this weekend. Wire up the validator before you celebrate. The day it catches a wrong number before it sends - and you have to explain to no one what the LLM almost did - is the day you will be glad you built it.