
SOC 2 for Self-Hosted AI: What Auditors Actually Want to See

April 23, 2026
23 min read
LocalAimaster Research Team


I have sat through eight SOC 2 Type II audits over the last six years, three of them for organizations that ran self-hosted LLMs. The first time an auditor asked, "Where is your evidence that the inference server enforces least-privilege access?", we had no answer. The second time we did. The difference is what this guide is about.

The AICPA's SOC 2 framework never mentions "AI" or "LLM" anywhere. That is not a loophole - it means every Trust Service Criterion you are already required to meet now applies to the inference layer too. If your customers ask for a SOC 2 report, your self-hosted Ollama (or vLLM, TGI, llama.cpp) deployment must be in scope.

This is the audit-readiness checklist I wish someone had handed me in 2023. It maps Trust Service Criteria to concrete self-hosted AI controls, shows the evidence auditors actually accept, and flags the questions that come up in every Type II.

Quick Start: The Auditor's First Five Questions

Within 20 minutes of opening the AI portion of an audit, every auditor asks:

  1. "Show me the inventory of every model you run and where the weights live."
  2. "Show me access logs for the inference endpoint over the last 90 days."
  3. "Walk me through how a user is provisioned and deprovisioned for AI access."
  4. "Show me the change management record for the most recent model version bump."
  5. "Show me the incident response runbook for a confirmed prompt injection."

If you can answer all five with a click, the rest of the audit is mechanical. If any one of them takes more than a few minutes to find, expect the engagement to drag for weeks.

Table of Contents

  1. SOC 2 Trust Service Criteria Mapped to AI
  2. Scoping the AI Boundary
  3. Security (CC1-CC9) for Inference Servers
  4. Availability for Self-Hosted Models
  5. Confidentiality and Prompt Data
  6. Processing Integrity for AI Outputs
  7. Privacy When Models Touch User Data
  8. Evidence Templates
  9. The 12-Week Readiness Plan
  10. Pitfalls That Fail Audits

SOC 2 Trust Service Criteria Mapped to AI {#tsc-mapping}

SOC 2 has five Trust Service Criteria. Every one applies to self-hosted AI:

TSC                           What It Means for Self-Hosted AI
Security (CC1-CC9, common)    Authenticated, encrypted, monitored access to the inference endpoint
Availability (A1)             SLAs on AI service uptime, capacity planning, DR plans
Confidentiality (C1)          Prompts and completions handled like other classified data
Processing Integrity (PI1)    Outputs are complete, valid, accurate, timely, authorized
Privacy (P1-P8)               If models process PII, full GDPR/CCPA-aligned controls

Most companies start with Security only (the Common Criteria). If you sell to enterprise, customers often request Confidentiality too. Healthcare and financial services demand all five.

For the GDPR-specific overlay, see our GDPR-compliant local AI guide. For HIPAA, see HIPAA-compliant local AI.


Scoping the AI Boundary {#scoping}

Auditors will not accept "the model is local so it's out of scope." If the model touches customer data or affects a customer-facing decision, it's in scope. Documenting scope is your first deliverable.

Sample scope statement (from a real readiness package):

In scope:
- Ollama 0.6.4 inference cluster (3 nodes, us-east-1 colo)
- nginx reverse proxy with mTLS and audit logging
- Model registry (S3 bucket with object lock)
- Embedding pipeline (workers + Postgres pgvector)
- Customer-facing AI features in app.example.com (5 endpoints)

Out of scope:
- Internal experimentation cluster (no customer data)
- Marketing demos (sandbox only, fake data)
- HuggingFace Hub (only used for one-time model downloads)

Pin this. The boundary is what every other control will be evaluated against.


Security (CC1-CC9) for Inference Servers {#security}

Security is the largest section and where most AI audits go sideways. Here is what each control number actually requires for a self-hosted AI box:

CC6.1 - Logical and Physical Access

Auditor asks: "How do you control who can access the inference server?"

Required evidence:

  • IAM policy file showing who can SSH, who can call the API, separated by role
  • mTLS certificates with documented issuance and revocation procedures
  • Quarterly access reviews with sign-off (screenshot of completed reviews)
  • Termination checklist that includes revoking AI bearer tokens within 24h

Sample script for the access review evidence:

#!/bin/bash
# /opt/ops/quarterly_access_review.sh
# Snapshot of everyone who can reach the inference endpoint, generated each quarter for sign-off.
DATE=$(date +%Y-%m-%d)
{
  echo "# Q$(date +%q) $(date +%Y) Inference Access Review"
  echo
  echo "## Active mTLS clients"
  for crt in /etc/nginx/clients/*.crt; do
    openssl x509 -in "$crt" -noout -subject -enddate
  done
  echo
  echo "## Active bearer tokens (hashed)"
  jq '.tokens[] | {id, owner, last_used, expires}' /etc/ollama-proxy/tokens.json
} > "/var/audit/reviews/$DATE-access-review.md"

CC6.6 - Encryption in Transit

TLS 1.2+ on every API call. Auditors run testssl.sh against your endpoint. If they get a B grade, that's a finding.

docker run --rm drwetter/testssl.sh https://ai.example.com

Aim for A+. The full nginx hardening config is covered in Securing Ollama: Auth, TLS, Network Isolation.
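
For a lightweight spot check between testssl.sh runs, a small Python sketch like the one below works. It assumes the ai.example.com endpoint from the scope statement and uses only the standard library:

#!/usr/bin/env python3
# tls_spot_check.py - quick TLS sanity check between testssl.sh runs (endpoint is a placeholder)
import socket
import ssl
from datetime import datetime, timezone

HOST, PORT = "ai.example.com", 443   # the inference endpoint from the scope statement

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse anything older than TLS 1.2

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()
        expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc)
        days_left = (expires - datetime.now(timezone.utc)).days
        print(f"protocol={tls.version()} cipher={tls.cipher()[0]} cert_days_left={days_left}")
        if days_left < 30:
            raise SystemExit("certificate expires within 30 days")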

CC6.7 - Encryption at Rest

Model weights, prompt logs, and embedding databases must all be encrypted at rest. Acceptable evidence:

  • LUKS or dm-crypt for the model storage volume (cryptsetup status output)
  • S3 SSE-KMS for any cloud-stored model artifacts
  • Postgres TDE or full-disk for pgvector
  • Documented key rotation procedure (typically 90 days for KMS keys)
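
Collecting that evidence can be scripted too. A minimal sketch, assuming the model volume is mapped under the hypothetical dm-crypt name "models" and evidence lands in the same /var/audit tree as the access reviews:

#!/usr/bin/env python3
# encryption_evidence.py - archive cryptsetup status output as dated evidence
# The mapping name and paths are assumptions; adjust them to your environment.
import subprocess
from datetime import date
from pathlib import Path

EVIDENCE_DIR = Path("/var/audit/evidence")
MAPPED_VOLUME = "models"    # hypothetical dm-crypt mapping name for the model storage volume

EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)
status = subprocess.run(
    ["cryptsetup", "status", MAPPED_VOLUME],
    capture_output=True, text=True, check=True,
)
out_file = EVIDENCE_DIR / f"{date.today()}-encryption-at-rest.txt"
out_file.write_text(status.stdout)
print(f"wrote {out_file}")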

CC7.2 - System Monitoring

Every API request must be logged with user identity, request size, response size, latency, and outcome. Bare logs are not enough; you need alerts:

# /etc/prometheus/alerts/ai.yml
groups:
- name: ai_security
  rules:
  - alert: HighAuthFailureRate
    expr: rate(nginx_http_requests_total{status="401"}[5m]) > 0.5
    for: 5m
    annotations:
      summary: "Sustained auth failures on AI endpoint"
  - alert: AnomalousPromptVolume
    expr: rate(ai_requests_total[10m]) > 3 * avg_over_time(rate(ai_requests_total[1d])[7d:1h])
    for: 10m
    annotations:
      summary: "Prompt volume anomaly - possible scraping"
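
The exact pipeline matters less than emitting one consistent record per request. A minimal Python sketch of the record writer, with field names matching the audit log sample later in this guide; how you hook it into your proxy or application is up to you:

#!/usr/bin/env python3
# audit_log.py - append one JSON line per inference request
import json
import time
from datetime import datetime, timezone

AUDIT_LOG = "/var/log/ai/audit.jsonl"   # forwarded to an append-only sink, never edited in place

def log_request(user: str, ip: str, model: str, input_tokens: int,
                output_tokens: int, started_monotonic: float,
                status: int, guardrail_pass: bool) -> None:
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user": user,
        "ip": ip,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": int((time.monotonic() - started_monotonic) * 1000),
        "status": status,
        "guardrail_pass": guardrail_pass,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")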

CC7.3 - Incident Response

You need a runbook specifically for AI incidents. Generic infosec runbooks will not pass. The categories auditors expect:

  1. Confirmed prompt injection escalating privilege
  2. Model output disclosing training data verbatim
  3. Inference service unavailable beyond SLA
  4. Tampering with model weights detected (SHA mismatch)
  5. Data exfiltration via long prompts or tool use

For each, document detection signals, containment, eradication, recovery, and post-mortem template. Test annually with tabletop exercises and keep notes.

CC8.1 - Change Management

Every model swap is a change. The artifact every auditor wants is a ticket showing: who requested, what changed (model name + SHA), peer review, test results, rollback plan, deployment timestamp.

Sample CHANGE-1284:

Title:    Bump primary chat model from llama3.1:70b to llama3.3:70b
Author:   alice@example.com
Reviewer: bob@example.com (approved 2026-04-12)
Tested:   Eval suite v3.2 - 87.3% pass (vs 84.1% prev), 14 regressions reviewed
Risk:     Medium - new model, similar architecture, identical context length
SHA:      b8c7e4...d2a1f9
Deployed: 2026-04-15 14:22 UTC by ops
Rollback: ollama pull llama3.1:70b && systemctl reload ai-router
Outcome:  No incidents in 7-day post-deployment window

Availability for Self-Hosted Models {#availability}

A1 controls require:

  • Documented SLA (e.g., 99.5% monthly availability for AI features)
  • Capacity model (tokens/second supported, peak headroom)
  • DR plan with RTO/RPO (typical: RTO 4h, RPO 1h)
  • Quarterly DR test with evidence

Capacity model template:

Component       Capacity         Peak observed     Headroom
Ollama node 1   220 tok/s        140 tok/s         36%
Ollama node 2   220 tok/s        135 tok/s         39%
Ollama node 3   220 tok/s        128 tok/s         42%
Cluster total   660 tok/s        403 tok/s         39%

If headroom drops below 20%, a capacity expansion ticket auto-fires. Auditors love this.
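
A sketch of that headroom check, assuming per-node throughput is exported to Prometheus under illustrative metric names (swap in whatever your exporter actually publishes):

#!/usr/bin/env python3
# capacity_headroom.py - flag capacity expansion when cluster headroom drops below 20%
import requests

PROM = "http://prometheus.internal:9090"   # hypothetical internal Prometheus endpoint

def instant(query: str) -> float:
    # Run an instant query and return the single scalar result.
    resp = requests.get(f"{PROM}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    return float(resp.json()["data"]["result"][0]["value"][1])

capacity = instant("sum(ollama_node_capacity_tokens_per_second)")            # illustrative metric
peak = instant("max_over_time(sum(ollama_node_tokens_per_second)[30d:5m])")  # illustrative metric
headroom = 1 - peak / capacity

print(f"capacity={capacity:.0f} tok/s  peak={peak:.0f} tok/s  headroom={headroom:.0%}")
if headroom < 0.20:
    print("headroom below 20% - open a capacity expansion ticket")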

For DR, document model artifact backups, configuration backups, and a tested restore procedure. Most production teams use S3 with object lock for model weights and Velero or restic for system state.


Confidentiality and Prompt Data {#confidentiality}

This is where AI is genuinely different from traditional SaaS. Prompts and completions can contain:

  • Customer credentials they accidentally pasted
  • Source code with proprietary algorithms
  • PHI, PII, financial data, legal data
  • Model outputs that themselves are derivative IP

Confidentiality controls auditors expect:

  • DLP scanning on inbound prompts (regex + ML classifier; a redaction sketch follows this list)
  • Field-level redaction before persistence in audit logs
  • Tiered retention: 30 days for full prompts, 365 days for redacted metadata
  • Clear data classification policy that includes "AI prompt content" as a class
  • Customer disclosure: "Your prompts are logged for X days, retained in Y location, accessed only by Z roles"
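
The DLP bullet is where most teams need a starting point. Below is a minimal, regex-only redaction sketch that runs before prompts are persisted; a production deployment would add an ML classifier and a much larger pattern set:

#!/usr/bin/env python3
# prompt_dlp.py - redact obvious secrets/PII from prompts before they reach audit logs
import re

PATTERNS = {
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "EMAIL":          re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_SSN":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "BEARER_TOKEN":   re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}\b"),
}

def redact(prompt: str) -> tuple[str, list[str]]:
    # Return the redacted prompt plus the list of pattern names that fired.
    hits = []
    for name, pattern in PATTERNS.items():
        if pattern.search(prompt):
            hits.append(name)
            prompt = pattern.sub(f"[REDACTED:{name}]", prompt)
    return prompt, hits

if __name__ == "__main__":
    clean, hits = redact("my key is AKIAABCDEFGHIJKLMNOP, email me at alice@example.com")
    print(hits)    # ['AWS_ACCESS_KEY', 'EMAIL']
    print(clean)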

Sample retention policy excerpt:

AI prompt and completion data is classified as Confidential.
- Full prompt content is retained for 30 days in encrypted log storage.
- Redacted metadata (user, latency, tokens, model) is retained 365 days.
- Access requires "ai-auditor" role assignment.
- Quarterly review of access list by Security Officer.

Processing Integrity for AI Outputs {#processing-integrity}

PI1 is the criterion most teams fumble for AI. The question is: "How do you know your AI output is complete, valid, accurate, and authorized?"

You cannot promise the model is correct - it's a probabilistic system. You can promise:

  • Inputs are validated against a schema
  • Outputs go through guardrails (Llama Guard 3, regex denylists)
  • Tool calls are validated and authorized before execution
  • Results are stored with provenance (model version, timestamp, input hash)
  • Human-in-the-loop for high-stakes decisions

Sample provenance record persisted with every completion:

{
  "request_id": "01HW8K...",
  "user": "alice@example.com",
  "model": "llama3.3:70b",
  "model_sha": "b8c7e4...d2a1f9",
  "ts": "2026-04-23T14:01:33Z",
  "input_hash": "sha256:f3d2...",
  "guardrail_pass": true,
  "tool_calls": ["search.docs", "ticket.create"],
  "tool_authorization": "user_self_ticket_only"
}

This single JSON, persisted alongside outputs, satisfies PI1 evidence requests without breaking a sweat.
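
One way to produce that record, sketched with only the standard library. The log path and request-ID scheme are illustrative (the sample above uses a ULID), and the guardrail and tool-authorization values come from whatever your stack computes upstream:

#!/usr/bin/env python3
# provenance.py - persist a provenance record alongside every completion
import hashlib
import json
import uuid
from datetime import datetime, timezone

PROVENANCE_LOG = "/var/log/ai/provenance.jsonl"   # illustrative path

def record_completion(user: str, model: str, model_sha: str, prompt: str,
                      guardrail_pass: bool, tool_calls: list[str],
                      tool_authorization: str) -> dict:
    record = {
        "request_id": str(uuid.uuid4()),   # the sample above uses a ULID; a UUID keeps this stdlib-only
        "user": user,
        "model": model,
        "model_sha": model_sha,
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "input_hash": "sha256:" + hashlib.sha256(prompt.encode()).hexdigest(),
        "guardrail_pass": guardrail_pass,
        "tool_calls": tool_calls,
        "tool_authorization": tool_authorization,
    }
    with open(PROVENANCE_LOG, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record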


Privacy When Models Touch User Data {#privacy}

If your scope includes Privacy criterion (P1-P8), you need:

  • A signed Data Processing Agreement covering AI processing
  • Data subject access procedures that include AI-derived data
  • Documented purpose limitation (AI used only for stated purposes)
  • Mechanism to honor right-to-erasure across prompt logs and any fine-tuned weights
  • Cross-border transfer assessments if data leaves jurisdiction (none, ideally, for self-hosted)

The "right to erasure" for fine-tuned weights is particularly thorny. The defensible answer: maintain a documented fine-tune lineage with training data references. If a deletion request matches data used in training, retraining or pruning is required within a defined window (typically 30-60 days).


Evidence Templates {#evidence}

Auditors love templates. Here are the four they ask for most.

1. Model Inventory

Model name      Version   SHA256             Source              Approved by   Date
llama3.3:70b    v1.0      b8c7e4...d2a1f9    ollama.com/library  alice@        2026-03-12
nomic-embed     v1.5      3a2c1d...e7f8b2    ollama.com/library  alice@        2026-02-04
guard-3:8b      v1.0      9d8e7f...c1b4a3    ollama.com/library  bob@          2026-03-12
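
The inventory becomes a tested control when a script verifies it against what is actually on disk. A sketch, assuming the weights are plain files and the inventory is exported as a CSV with model, sha256, and path columns (both are assumptions about your setup):

#!/usr/bin/env python3
# verify_inventory.py - verify on-disk model weights against the approved inventory
import csv
import hashlib
from pathlib import Path

INVENTORY = Path("/var/audit/model_inventory.csv")   # illustrative: columns model,sha256,path
CHUNK = 1024 * 1024

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()

failures = 0
with INVENTORY.open(newline="") as fh:
    for row in csv.DictReader(fh):
        ok = sha256_of(Path(row["path"])) == row["sha256"]
        failures += not ok
        print(f"{row['model']}: {'OK' if ok else 'MISMATCH'}")

raise SystemExit(1 if failures else 0)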

2. Audit Log Sample

JSON Lines format, one event per line:

{"ts":"2026-04-23T14:01:33Z","user":"alice@example.com","ip":"10.20.30.10","model":"llama3.3:70b","input_tokens":450,"output_tokens":1240,"latency_ms":4823,"status":200,"guardrail_pass":true}

3. Access Review Snapshot (quarterly)

A markdown file generated by the script earlier, signed off by the security officer in your ticketing system.

4. Incident Post-Mortem (when applicable)

Use a standard template with sections for timeline, impact, root cause, remediation, prevention. AI incidents specifically should call out: prompt or input that triggered, model version, guardrail outcome, customer notification status.


The 12-Week Readiness Plan {#readiness}

This is the schedule I run when a customer says "we're going for SOC 2 Type II in 6 months and AI needs to be in scope."

Week   Deliverable
1      Scope definition, asset inventory, model registry
2      TLS hardening, mTLS rollout, network isolation
3      Authentication and authorization (SSO + token rotation)
4      Audit logging pipeline + retention policy
5      Encryption at rest verification + key rotation procedure
6      Change management workflow + first model bump as test case
7      Incident response runbooks + tabletop exercise
8      DR plan + first restore test
9      DLP and guardrails for prompts and outputs
10     Access reviews + termination procedures
11     Vendor management for any third-party tools
12     Internal audit + remediation

Then start the formal audit observation window. Most Type II audits cover a 3-month window minimum, 12 months ideal.


Pitfalls That Fail Audits {#pitfalls}

Mistakes I've watched cost real money:

1. Treating "self-hosted" as automatic compliance. It is not. Self-hosted shifts the responsibility from a vendor's SOC 2 report to yours. You absorb every control.

2. No model version pinning. "Latest" is not a version. Auditors will mark this as a change-management finding within minutes.

3. Audit logs without integrity protection. Logs that anyone can edit are worthless. Use append-only sinks (Loki + S3 object lock, or AWS CloudTrail equivalents); a hash-chain fallback is sketched after this list.

4. No defined retention for prompt data. Indefinite retention is a privacy violation. Zero retention loses you incident-response evidence. Pick a defined window and document it.

5. Sharing service accounts. "The team uses one Ollama bearer token" is a CC6.1 fail. Per-user tokens or SSO-mapped tokens, period.

6. Skipping the tabletop. Untested incident runbooks are paperwork, not controls. Run the exercise once a year and document the outcome.

7. Forgetting fine-tuning data. If you fine-tune on customer data, the fine-tuned weights inherit that data's classification. Treat them like a derivative database.

8. No DLP on prompts. A customer accidentally pastes their AWS root key. It lands in your logs. Now you have a data-handling incident. DLP at ingress prevents this.
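
If an append-only sink is not available everywhere, a hash chain at least makes tampering detectable, as referenced in pitfall 3. A minimal sketch:

#!/usr/bin/env python3
# log_chain.py - chain audit log lines with SHA-256 so any edit or deletion breaks verification
import hashlib
import json
import sys

def append(path: str, record: str) -> None:
    previous = "genesis"
    try:
        with open(path, encoding="utf-8") as fh:
            *_, last_line = fh                    # last existing entry, if any
        previous = json.loads(last_line)["chain"]
    except (FileNotFoundError, ValueError):
        pass                                       # missing or empty file starts a new chain
    chain = hashlib.sha256((previous + record).encode()).hexdigest()
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"record": record, "chain": chain}) + "\n")

def verify_chain(path: str) -> bool:
    previous = "genesis"
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            entry = json.loads(line)
            expected = hashlib.sha256((previous + entry["record"]).encode()).hexdigest()
            if entry["chain"] != expected:
                print(f"chain broken at line {line_no}")
                return False
            previous = entry["chain"]
    return True

if __name__ == "__main__":
    append("audit-chain.jsonl", sys.argv[1] if len(sys.argv) > 1 else "test event")
    print("chain intact:", verify_chain("audit-chain.jsonl"))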


Frequently Asked Questions

Does SOC 2 require self-hosted AI to be a separate certification?

No. Self-hosted AI is just another part of your existing SOC 2 scope. You add it to your asset inventory, map controls to it, and provide evidence like any other system. There is no separate AI certification, though some customers ask for AI-specific addendums to your standard report.

Can I claim SOC 2 if my AI runs on someone else's infrastructure?

If you're running Ollama on AWS or GCP, the cloud provider's SOC 2 covers the underlying compute, but your controls over the AI workload are still yours to evidence. Use the carve-out method: rely on the cloud provider's report for infrastructure, document your controls for everything you operate.

How do auditors test AI access controls?

They request a list of users with access, sample 5-10 of them, ask for evidence those users were properly provisioned (ticket with manager approval), and verify deprovisioned users no longer have tokens valid against the API. Expect them to actually try a revoked token.

What evidence works best for prompt-injection incident response?

A documented runbook plus a recorded tabletop exercise transcript. Auditors do not require you to have had a real incident - they require you to have a tested response capability.

How long should I retain AI audit logs for SOC 2?

Common practice is 365 days for security-relevant audit metadata and 30-90 days for full prompt content. Anything shorter requires explicit risk acceptance. HIPAA and SOX add 6-7 years of retention for any logs in their scope.

Do open-source models reduce SOC 2 burden?

Slightly - open weights mean no vendor SLA dependency for the model itself. You still own every infrastructure control. The savings are mostly in vendor management (CC9), where you can carve out the model provider.

How do I handle SOC 2 for fine-tuned models trained on customer data?

Document the lineage: which customer data was included, who authorized it, when. Treat fine-tuned weights as a controlled artifact. Honor deletion requests by retraining or pruning within your stated SLA. Auditors increasingly ask about this in 2026.

Are LLM benchmarks part of processing integrity evidence?

Yes. An evaluation harness with checked-in test cases, run before every model bump with results stored, satisfies PI1.1 nicely. We use LMEval and our own task-specific suite. Keep the test results forever; they are gold during audits.


Bottom Line

SOC 2 for self-hosted AI is not exotic; it's the same controls you already owe customers, applied to a new system. The hard part is recognizing that the AI server isn't out of scope just because the model file lives on your disk. Once you accept that, the readiness work is straightforward: inventory, isolate, encrypt, authenticate, log, review, document, repeat.

Auditors will not punish you for honest gaps. They will punish you for missing inventories, lazy access controls, or "AI is special, we don't apply normal controls" hand-waving. Treat your inference cluster like a database that occasionally writes English instead of SQL, and you'll pass.

Bookmark this page, walk it once a quarter, and your next audit becomes a paperwork exercise instead of a fire drill.
