
Ollama on Kubernetes: Production Team Deployment Guide (2026)

April 23, 2026
22 min read
LocalAimaster Research Team



Most Ollama tutorials stop at docker run. That works for one developer on one laptop. The minute a second engineer asks for the same endpoint, or you need to survive a node reboot without a 40 GB model re-download, you need Kubernetes. This guide walks the whole path: a working manifest set, a Helm chart you can fork, GPU scheduling that does not silently fall back to CPU, and the operational decisions I learned the hard way running Ollama on a 4-node K3s cluster for a 22-person team.

I am writing this from a setup that has been serving an internal coding assistant for 11 months. Not a homelab demo. The numbers and YAML below are pulled from the live cluster.

Quick Start: Ollama on Kubernetes in 8 Minutes

If you already have a cluster with the NVIDIA device plugin installed, paste this and you have a working Ollama endpoint:

kubectl create namespace ollama
kubectl apply -n ollama -f https://raw.githubusercontent.com/otwld/ollama-helm/main/examples/quickstart.yaml
kubectl wait --for=condition=ready pod -l app=ollama -n ollama --timeout=300s
kubectl exec -n ollama ollama-0 -- ollama pull llama3.1:8b
kubectl port-forward -n ollama svc/ollama 11434:11434
curl http://localhost:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"hello"}'

Three minutes for the StatefulSet to come up, four minutes for the model pull on a 200 Mbps connection. After that, the model is cached on the PVC and pod restarts take 12 seconds.

That gets you running. The rest of this article is what you do after the demo, when you actually have to run it.

Table of Contents

  1. Why Kubernetes for Ollama
  2. Cluster Prerequisites
  3. GPU Setup with NVIDIA Device Plugin
  4. The StatefulSet Manifest Explained Line by Line
  5. Service, Ingress, and TLS
  6. Helm Chart Walkthrough
  7. Autoscaling with KEDA
  8. Multi-Model Routing
  9. Monitoring and Logs
  10. Security Hardening
  11. Real Cluster Benchmarks
  12. Common Pitfalls
  13. FAQs

Why Kubernetes for Ollama {#why-k8s}

If you are alone, you do not need this article. brew install ollama is fine. The case for Kubernetes shows up when any of these become true:

  • More than one engineer hits the same endpoint and you stop wanting to debug "is your IP allowed."
  • You need the model server to survive node reboots, OS updates, and OOM kills without manual recovery.
  • You have GPU nodes mixed with CPU nodes and want the scheduler to put inference where the silicon is.
  • You need to run multiple model variants (a coding model and a chat model) without one starving the other.
  • You want metrics, logs, and access control without bolting them on with shell scripts.

Compared to running Ollama bare on a server, Kubernetes gives you self-healing, declarative state, native ingress, and a real story for upgrades. Compared to managed inference services like Bedrock or Together, you keep weights on disk you control and pay zero per-token egress.

The cost: a learning curve, a control plane to maintain, and a real network policy story. For a team of five or more, it is worth it. We migrated from a single systemctl start ollama server to K3s in a weekend after the third "who restarted it?" Slack message.

For deeper architectural context, our Ollama production deployment guide covers the single-node Docker Compose path, and load balancing Ollama with Nginx is the right next read once you have multiple replicas.


Cluster Prerequisites {#prerequisites}

You need a cluster that meets these baseline conditions:

| Component | Minimum | Recommended | Why |
| --- | --- | --- | --- |
| Kubernetes version | 1.27 | 1.30+ | Sidecar containers, native sidecar lifecycle |
| Node OS | Ubuntu 22.04 / Debian 12 | Ubuntu 24.04 LTS | NVIDIA driver 535+ packages |
| GPU driver | 535.x | 550.x | Required for CUDA 12.4 used by Ollama 0.3+ |
| Container runtime | containerd 1.7 | containerd 2.0 | NVIDIA container toolkit support |
| Storage | Any CSI with RWO | local-path or Ceph RBD on SSD | Model cache I/O |
| Networking | CNI with NetworkPolicy | Cilium 1.15+ | API isolation |

I run K3s on Ubuntu 24.04 with the NVIDIA GPU Operator and Cilium. Total install time on a fresh node is under 15 minutes. For managed clusters, EKS, GKE, and AKS all support GPU node pools — just pick one with H100, A100, A10, L4, or RTX 6000 Ada nodes depending on budget.

The official Kubernetes documentation on managing devices covers the device plugin model in depth if you want the upstream reference.

Verifying GPU visibility

Before deploying anything, confirm the cluster sees GPUs:

kubectl get nodes -o json | jq '.items[].status.allocatable["nvidia.com/gpu"]'
# Expected: "1" or higher per GPU node, null for CPU nodes

If you see null everywhere, the device plugin is not running. Install it:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.15.0/deployments/static/nvidia-device-plugin.yml
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds

Wait 30 seconds, re-run the allocatable check. If it still shows null, you have a driver or toolkit issue — nvidia-smi on the node should work before you debug Kubernetes.
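An end-to-end check that exercises scheduling, the driver, and the container toolkit in one shot is a throwaway pod that requests a GPU and runs nvidia-smi. A sketch; the CUDA image tag is an assumption, pick any base image compatible with your driver:

```yaml
# gpu-smoke-test.yaml -- one-shot pod that proves GPU scheduling works end to end.
# The image tag is an assumption; use any CUDA base image matching your driver.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: "1"
```

`kubectl apply -f gpu-smoke-test.yaml`, then `kubectl logs gpu-smoke-test`. If you see the driver table, the whole chain works; if the pod sits in Pending, the scheduler cannot find an allocatable GPU.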


GPU Setup with NVIDIA Device Plugin {#gpu-setup}

The device plugin advertises GPUs as a Kubernetes resource. There are two paths:

Option A: Standalone device plugin (simpler, works for single-node and homelab clusters):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
          name: nvidia-device-plugin-ctr
          securityContext:
            privileged: true

Option B: NVIDIA GPU Operator (managed driver, toolkit, and DCGM metrics — recommended for production):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm install --wait gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace

The operator runs DCGM exporter on every GPU node, which we will scrape with Prometheus later for VRAM and utilization metrics. On a fresh cluster this saves about three hours of work versus wiring it all by hand.

Tainting GPU nodes

Stop CPU workloads from landing on expensive GPU hardware:

kubectl taint nodes gpu-node-1 nvidia.com/gpu=true:NoSchedule
kubectl label nodes gpu-node-1 hardware=gpu accelerator=rtx-4090

Then add a matching toleration in the Ollama pod spec. CPU pods skip the node by default; only pods that explicitly tolerate the taint can land there.


The StatefulSet Manifest Explained Line by Line {#statefulset}

Here is the manifest that has been running our cluster for 11 months. Every flag is there for a reason — I will annotate the load-bearing ones below.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ollama
  namespace: ollama
spec:
  serviceName: ollama
  replicas: 2
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      nodeSelector:
        accelerator: rtx-4090
      containers:
        - name: ollama
          image: ollama/ollama:0.5.7
          ports:
            - containerPort: 11434
              name: http
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
            - name: OLLAMA_KEEP_ALIVE
              value: "24h"
            - name: OLLAMA_NUM_PARALLEL
              value: "4"
            - name: OLLAMA_MAX_LOADED_MODELS
              value: "2"
            - name: OLLAMA_FLASH_ATTENTION
              value: "1"
            - name: OLLAMA_KV_CACHE_TYPE
              value: "q8_0"
          resources:
            requests:
              cpu: "2"
              memory: "8Gi"
              nvidia.com/gpu: "1"
            limits:
              cpu: "8"
              memory: "32Gi"
              nvidia.com/gpu: "1"
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
          livenessProbe:
            httpGet:
              path: /
              port: 11434
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
          startupProbe:
            httpGet:
              path: /
              port: 11434
            failureThreshold: 30
            periodSeconds: 10
  volumeClaimTemplates:
    - metadata:
        name: models
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: local-path-ssd
        resources:
          requests:
            storage: 200Gi

What matters here

  • OLLAMA_KEEP_ALIVE: 24h — by default Ollama unloads models after 5 minutes of inactivity. On Kubernetes, that means a 30-second cold reload on every Slack-bot-after-lunch query. Pin it to 24 hours. VRAM cost is a non-issue when the model is the only resident thing.
  • OLLAMA_FLASH_ATTENTION: 1 — 15-25% throughput gain on Ampere and Ada GPUs. No reason to leave it off.
  • OLLAMA_KV_CACHE_TYPE: q8_0 — quantizes the KV cache to 8-bit. Cuts VRAM usage by ~40% with negligible quality loss for most use cases. Use q4_0 if you are tight on VRAM, f16 if you cannot tolerate any quality drop.
  • OLLAMA_NUM_PARALLEL: 4 — number of concurrent requests per model. Higher = more throughput, more VRAM. 4 is a good default for an 8B model on a 24 GB GPU.
  • Startup probe with 30 failures × 10 seconds = 5 minute window — accommodates first-time model pulls. Liveness probe fires only after startup succeeds.
  • volumeClaimTemplates with 200 Gi — each replica gets its own PVC. Models pull once per replica, cached forever after.
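To see why the KV cache setting matters, here is the back-of-envelope arithmetic for the 8B model at the parallelism above. The model constants (32 layers, 8 KV heads, head dim 128) come from the published Llama 3.1 8B architecture; the 8K context per slot is an assumption for illustration:

```shell
# KV cache sizing sketch for llama3.1:8b. Architecture constants assumed
# from the published model card; context length is illustrative.
layers=32
kv_heads=8
head_dim=128
ctx=8192
parallel=4

# K and V each store kv_heads * head_dim elements per layer per token.
per_token=$((2 * layers * kv_heads * head_dim))   # elements per token

f16_gib=$((per_token * 2 * ctx * parallel / 1024 / 1024 / 1024))
q8_gib=$((per_token * 1 * ctx * parallel / 1024 / 1024 / 1024))
echo "f16 KV cache: ${f16_gib} GiB; q8_0: ${q8_gib} GiB"
# prints: f16 KV cache: 4 GiB; q8_0: 2 GiB
```

Quantizing the cache to 8-bit halves the cache itself, which is where most of the overall VRAM savings comes from on long contexts with multiple parallel slots.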

Apply this and you have a real deployment. The next sections add networking, security, and observability.
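A related trick worth setting up early: rather than letting the first pod burn its startup window pulling models, warm the PVC with a one-shot Job. A sketch; it assumes the PVC name models-ollama-0 that the volumeClaimTemplate generates, and that the StatefulSet is scaled to 0 so the ReadWriteOnce volume is free to attach:

```yaml
# prepull-job.yaml -- warms ollama-0's model cache before the StatefulSet scales up.
# Assumption: PVC name models-ollama-0 from the volumeClaimTemplate; run with the
# StatefulSet at 0 replicas so the RWO volume is detached.
apiVersion: batch/v1
kind: Job
metadata:
  name: ollama-prepull
  namespace: ollama
spec:
  template:
    spec:
      restartPolicy: Never
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: prepull
          image: ollama/ollama:0.5.7
          command: ["/bin/sh", "-c"]
          # Crude but effective: start the server in the background, give it a
          # moment to bind, then pull against it.
          args:
            - ollama serve & sleep 5 && ollama pull llama3.1:8b && ollama pull qwen2.5-coder:7b
          volumeMounts:
            - name: models
              mountPath: /root/.ollama
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: models-ollama-0
```

Run it once per replica PVC, then scale the StatefulSet back up against the warm cache.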


Service, Ingress, and TLS {#networking}

Headless Service for direct pod access

apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
spec:
  clusterIP: None
  selector:
    app: ollama
  ports:
    - port: 11434
      name: http

Headless because StatefulSet pods get DNS names like ollama-0.ollama.ollama.svc.cluster.local, useful for sticky sessions if you implement them later.

ClusterIP for load-balanced access

apiVersion: v1
kind: Service
metadata:
  name: ollama-lb
  namespace: ollama
spec:
  type: ClusterIP
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434

Internal apps hit ollama-lb.ollama.svc.cluster.local:11434 and Kubernetes round-robins.

Ingress with TLS and API key auth (Nginx)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ollama
  namespace: ollama
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      if ($http_authorization !~ "^Bearer (sk-team-key-1|sk-team-key-2)$") {
        return 401;
      }
spec:
  ingressClassName: nginx
  tls:
    - hosts: [ollama.internal.example.com]
      secretName: ollama-tls
  rules:
    - host: ollama.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ollama-lb
                port:
                  number: 11434

The proxy-read-timeout of 600 seconds is critical. Long generations on a 70B model can take 4-5 minutes, and the default 60-second timeout will kill the request mid-token.

The configuration-snippet is a quick API key gate. For production, replace it with an OAuth2-proxy or Pomerium ext-auth filter — never inline secrets like that for real keys.
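If you keep bearer tokens as a stopgap while wiring up a proper ext-auth filter, at least mint them with real entropy and keep them in a Secret. A sketch; the sk-team-eng- prefix is just the naming convention from the example above:

```shell
# Mint a 128-bit random API key. The sk-team-eng- prefix is only a convention.
key="sk-team-eng-$(openssl rand -hex 16)"
echo "$key"

# Store it in a Secret instead of hard-coding it in an Ingress annotation
# (run against your cluster; shown commented out for reference):
#   kubectl create secret generic ollama-api-keys -n ollama --from-literal=key="$key"
```

Rotating a key is then a Secret update plus an ingress controller reload, with nothing sensitive committed to source control.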


Helm Chart Walkthrough {#helm}

If you do not want to maintain raw manifests, the community otwld/ollama-helm chart is solid. Install it like this:

helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update

helm install ollama ollama-helm/ollama \
  --namespace ollama \
  --create-namespace \
  --set replicaCount=2 \
  --set ollama.gpu.enabled=true \
  --set ollama.gpu.type=nvidia \
  --set ollama.gpu.number=1 \
  --set ollama.models.pull[0]=llama3.1:8b \
  --set ollama.models.pull[1]=qwen2.5-coder:7b \
  --set persistentVolume.enabled=true \
  --set persistentVolume.size=200Gi \
  --set ingress.enabled=true \
  --set ingress.className=nginx \
  --set ingress.hosts[0].host=ollama.internal.example.com

The chart handles probes, volume claim templates, the Service, and ingress. You can fork it and add your custom configuration-snippet for auth.

For a values.yaml-based setup:

replicaCount: 2

ollama:
  gpu:
    enabled: true
    type: nvidia
    number: 1
  models:
    pull:
      - llama3.1:8b
      - qwen2.5-coder:7b
      - nomic-embed-text:latest

persistentVolume:
  enabled: true
  size: 200Gi
  storageClass: local-path-ssd

resources:
  limits:
    cpu: 8
    memory: 32Gi
  requests:
    cpu: 2
    memory: 8Gi

extraEnv:
  - name: OLLAMA_KEEP_ALIVE
    value: "24h"
  - name: OLLAMA_FLASH_ATTENTION
    value: "1"
  - name: OLLAMA_NUM_PARALLEL
    value: "4"

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: ollama.internal.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - hosts: [ollama.internal.example.com]
      secretName: ollama-tls

Then run helm install ollama ollama-helm/ollama -f values.yaml -n ollama --create-namespace and you are done.


Autoscaling with KEDA {#autoscaling}

CPU-based HPA is useless here — Ollama is GPU-bound and CPU stays at 5-10% even during heavy generation. Use KEDA with a Prometheus scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ollama-scaler
  namespace: ollama
spec:
  scaleTargetRef:
    name: ollama
    kind: StatefulSet
  minReplicaCount: 1
  maxReplicaCount: 6
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        metricName: ollama_rps
        threshold: "8"
        query: |
          sum(rate(nginx_ingress_controller_requests{ingress="ollama"}[1m]))

Scales up when sustained RPS exceeds 8 per replica. The 5-minute cooldown prevents flapping when traffic is bursty. minReplicaCount=1 keeps a warm pod with the model resident so cold starts are rare.

For predictable workloads (8am-6pm office hours), use a CronHPA instead. We pre-scale to 4 replicas at 8:55am and back to 1 at 6:30pm. Costs us nothing extra, latency stays under 200ms TTFB during peak.
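A minimal way to drive that schedule is a pair of CronJobs that call kubectl scale. A sketch; the ollama-scaler ServiceAccount (with RBAC to patch statefulsets/scale) and the kubectl image tag are assumptions, not shown here:

```yaml
# scale-up.yaml -- pre-scale before office hours; mirror it with a second
# CronJob at "30 18 * * 1-5" running --replicas=1 for the evening scale-down.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ollama-scale-up
  namespace: ollama
spec:
  schedule: "55 8 * * 1-5"   # 8:55am, weekdays
  jobTemplate:
    spec:
      template:
        spec:
          # Assumed ServiceAccount with RBAC on statefulsets/scale (not shown).
          serviceAccountName: ollama-scaler
          restartPolicy: Never
          containers:
            - name: scale
              image: bitnami/kubectl:1.30   # image tag is an assumption
              command: ["kubectl", "scale", "statefulset/ollama", "-n", "ollama", "--replicas=4"]
```

KEDA can coexist with this as long as minReplicaCount stays at or below the cron floor, but keep one of the two authoritative to avoid them fighting.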


Multi-Model Routing {#multi-model}

When the team needs both a chat model and a coding model, do not stuff them into one StatefulSet — VRAM thrashing destroys throughput. Run two StatefulSets, route by Host header:

- host: chat.ollama.internal.example.com
  http:
    paths:
      - backend: { service: { name: ollama-chat-lb, port: { number: 11434 } } }
- host: code.ollama.internal.example.com
  http:
    paths:
      - backend: { service: { name: ollama-code-lb, port: { number: 11434 } } }

Or by path prefix if you prefer one host:

- path: /v1/chat
  backend: { service: { name: ollama-chat-lb, port: { number: 11434 } } }
- path: /v1/code
  backend: { service: { name: ollama-code-lb, port: { number: 11434 } } }

Each model gets its own VRAM, its own scaling envelope, and its own SLO. We run llama3.1:8b for chat at 4 replicas and qwen2.5-coder:7b at 2 replicas — load patterns are completely different and decoupling them was the single biggest stability win.


Monitoring and Logs {#monitoring}

Prometheus scrape config

Ollama itself does not export Prometheus metrics yet, so we scrape the NVIDIA DCGM exporter (installed by the GPU Operator) for GPU metrics, and Nginx ingress for request metrics:

- job_name: 'dcgm'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      regex: nvidia-dcgm-exporter
      action: keep

Useful queries for a Grafana dashboard:

| Metric | Query | Alert threshold |
| --- | --- | --- |
| GPU utilization | avg(DCGM_FI_DEV_GPU_UTIL{pod=~"ollama.*"}) | >95% for 10 min |
| VRAM used | DCGM_FI_DEV_FB_USED{pod=~"ollama.*"} / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) * 100 | >90% |
| RPS | sum(rate(nginx_ingress_controller_requests{ingress="ollama"}[5m])) | (trend only) |
| p95 latency | histogram_quantile(0.95, rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="ollama"}[5m])) | >5s |
| Pod restarts | increase(kube_pod_container_status_restarts_total{namespace="ollama"}[1h]) | >0 |
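These thresholds translate directly into alerting rules if you run the Prometheus Operator. A sketch; the rule names and severity labels are our choices, and it assumes your Prometheus instance watches the monitoring namespace:

```yaml
# ollama-alerts.yaml -- PrometheusRule sketch for the thresholds above.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ollama-alerts
  namespace: monitoring
spec:
  groups:
    - name: ollama
      rules:
        - alert: OllamaGPUSaturated
          expr: avg(DCGM_FI_DEV_GPU_UTIL{pod=~"ollama.*"}) > 95
          for: 10m
          labels:
            severity: warning
        - alert: OllamaVRAMHigh
          # used / (used + free) gives the percentage of framebuffer in use
          expr: DCGM_FI_DEV_FB_USED{pod=~"ollama.*"} / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) * 100 > 90
          for: 5m
          labels:
            severity: critical
        - alert: OllamaPodRestarts
          expr: increase(kube_pod_container_status_restarts_total{namespace="ollama"}[1h]) > 0
          labels:
            severity: warning
```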

Centralized logs with Loki

helm install loki grafana/loki-stack -n monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi

Promtail picks up Ollama's stdout automatically. Useful queries:

{namespace="ollama"} |= "error"
{namespace="ollama"} |~ "out of memory|CUDA error"
{namespace="ollama"} | json | duration > 5

For deeper Prometheus + Grafana setup specific to local AI, the Ollama production deployment guide has the dashboard JSON and alert rules.


Security Hardening {#security}

Five layers, in order of importance:

1. NetworkPolicy — block everything except the namespaces that should reach Ollama:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ollama-allow-from-apps
  namespace: ollama
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: apps
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - port: 11434
          protocol: TCP
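One subtlety: a pod becomes isolated the moment any NetworkPolicy selects it, so the allow rule above already default-denies everything else to the Ollama pods. If you want the whole namespace locked down, including any pod added later, pair it with a catch-all deny (the standard pattern):

```yaml
# default-deny-ingress.yaml -- isolates every pod in the namespace; the
# allow-from-apps policy above then punches the only hole.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ollama
spec:
  podSelector: {}
  policyTypes: [Ingress]
```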

2. ServiceAccount with no token mount — Ollama does not call the K8s API, so do not give it credentials:

spec:
  template:
    spec:
      automountServiceAccountToken: false

3. Pod Security Standards — enforce the baseline profile and warn on restricted for the namespace:

kubectl label namespace ollama \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted

4. ReadOnlyRootFilesystem — Ollama only writes to its model directory (the PVC) and /tmp, so the rest of the filesystem can be read-only. Note that with runAsUser: 1000 the default model path is no longer /root/.ollama, so either point OLLAMA_MODELS at the PVC mount or move the mount to the new user's home:

securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000
volumeMounts:
  - name: tmp
    mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}

5. Secret-backed API keys instead of inline:

apiVersion: v1
kind: Secret
metadata:
  name: ollama-api-keys
  namespace: ollama
type: Opaque
stringData:
  keys.conf: |
    sk-team-eng-2026-xxx
    sk-team-product-2026-yyy

Mount it into the ingress controller with an OAuth2-proxy ext-auth filter. Never bake keys into ingress annotations in source control.


Real Cluster Benchmarks {#benchmarks}

Numbers from our 4-node K3s cluster, measured April 2026:

Hardware: 3× nodes with RTX 4090 24 GB, 1× node with 2× A5000 24 GB. 64 GB RAM each, NVMe SSDs.

Workload: 22 engineers, 6,400 requests/day, 70/30 chat to code.

| Metric | Value |
| --- | --- |
| llama3.1:8b TTFB | 180 ms p50, 420 ms p95 |
| llama3.1:8b throughput | 92 tok/s per replica |
| qwen2.5-coder:7b TTFB | 220 ms p50, 510 ms p95 |
| qwen2.5-coder:7b throughput | 88 tok/s per replica |
| Pod cold start (warm PVC) | 12 s |
| Pod cold start (fresh PVC) | 78 s |
| Cluster GPU utilization (avg) | 34% |
| Cluster GPU utilization (peak) | 91% |
| Pod restarts per month | 3 (all OOM-related, fixed by raising memory limit) |
| Power draw (4 nodes) | 1.4 kW avg, 2.8 kW peak |

The 3 OOM kills happened during a model swap when both 8B and 70B were briefly resident. Setting OLLAMA_MAX_LOADED_MODELS=2 and adding a memory request of 16 Gi prevented recurrence.


Common Pitfalls {#pitfalls}

1. Using a Deployment instead of a StatefulSet. Pods reschedule, PVCs detach, models re-download. Use StatefulSet.

2. Forgetting OLLAMA_HOST=0.0.0.0:11434. Ollama binds to localhost by default, which means the pod accepts connections only from itself. Probe fails, Service does not route, you spend an hour staring at a green pod that is unreachable.

3. Not tainting GPU nodes. General workloads schedule onto your $1,800 GPU and starve Ollama of CPU/RAM. Taint and tolerate.

4. ReadWriteMany on the model PVC. Ollama writes to its blob index, and two pods fighting over it will corrupt the cache. Stick to ReadWriteOnce with one PVC per replica.

5. Default ingress timeout. 60 seconds is fine for chat, fails for 70B generations. Set proxy-read-timeout to 600+.

6. HPA on CPU. Doesn't scale because GPU is the bottleneck. Use KEDA on RPS or queue depth.

7. No keep-alive. Models unload after 5 minutes, every burst pays the reload cost. Set OLLAMA_KEEP_ALIVE=24h.

8. Pulling models with an initContainer in the manifest. Works, but blocks pod readiness for minutes on every restart. Pre-pull once via a Job, then let StatefulSet pods come up against the warm PVC.

9. Skipping NetworkPolicy. Default Kubernetes networking lets every pod talk to every pod. A compromised pod in another namespace can hit your unauthenticated 11434 port. Lock it down on day one.

10. Mismatched model names across replicas. Pull the same exact tag (llama3.1:8b, not latest) into every replica's PVC, otherwise you get inconsistent responses depending on which pod the request lands on.


Conclusion

Ollama on Kubernetes is the right move once you have more than two engineers or one production workload. The setup is more involved than brew install — but the manifests above are battle-tested, the benchmarks are real, and the failure modes are documented. Start with the StatefulSet, layer on ingress and TLS, add KEDA when you actually need scaling, and bolt on monitoring before you hit your first incident.

The biggest mindset shift coming from single-server Ollama is treating models as data, not as code. Models live on PVCs. Pods are cattle. The cluster heals itself when nodes reboot. Once that clicks, running a private 22-person AI platform feels almost boring — which is exactly what production should feel like.

Compared to the single-node Ollama production deployment, Kubernetes adds an order of magnitude more moving parts but pays for itself the first time a node dies at 2am and nobody gets paged. Combine this with Ollama load balancing for the routing layer and the Ollama production checklist for the security review, and you have a stack you can confidently put in front of a paying customer.


Want updates as we roll out the Kubernetes monitoring dashboards and the OAuth2-proxy auth template? Join the Local AI Master newsletter — one email a week, all production playbooks.



Written by Pattanaik Ramswarup

Creator of Local AI Master

I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.

