SOC 2 (System and Organization Controls 2) is the trust report that matters most to enterprise buyers. If you sell a SaaS product or cloud infrastructure to enterprises, expect to be asked for a SOC 2 Type 2 report in almost every sales cycle.
SOC 2 is organized around five Trust Services Criteria categories: Security, Availability, Processing Integrity, Confidentiality, and Privacy. Security is mandatory; the other four are included when they are relevant to your services. An auditor reviews your controls, tests whether they actually work, and issues a report that your customers' risk and compliance teams review.
Over the past two years, SOC 2 auditors have started asking pointed questions about AI coding tools:
- "Do your developers use AI tools? Which ones?"
- "How is AI-generated code reviewed and tested differently from human code?"
- "What happens if a vulnerability is discovered in code an AI agent wrote?"
- "Do the dependencies you get from vendors contain AI-generated code?"
These are not yes/no questions; they probe your AI software supply chain risk. Here's how to answer them with evidence that satisfies auditors.
SOC 2 and Code Security
Two SOC 2 criteria, both in the Security category, are most relevant to AI-generated code:
CC8.1: Change Management
The criterion (AICPA 2017 TSC, with 2022 revisions):
The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives.
What auditors look for:
- Code review records (who reviewed this code?)
- Change logs (when was this deployed?)
- Testing results (what testing was performed?)
- Approval workflows (who authorized this change?)
The AI question: If a change was made by an AI agent, how do you prove it went through the same review and approval process as human code?
Evidence to provide:
```
Change ID: CHG-2026-04-29-001
Component: Auth service, login handler
Author: copilot-agent / version 2026-q1-v4
Human Author: [email protected] (prompt writer)
Reviewer: [email protected] (code review approval)
Review Date: 2026-04-29 10:15 UTC
Testing: SCA clean, SAST clean, 3 new unit tests, integration test passed
Approved By: [email protected] (change advisory board)
Deployed: 2026-04-29 14:30 UTC
```
This shows the auditor that even though an agent generated the code, a human reviewed it and a separate approver signed off on the change.
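A record like this is easy to check mechanically before it goes into the evidence library. Below is a minimal sketch, assuming each record is saved as plain `Key: value` lines exactly as above; the function name `check_change_record` and the required-field list are illustrative, not an official SOC 2 schema. It reports missing fields and fails if the deployment time is earlier than the review time.

```shell
# Minimal sketch: validate one change record read from stdin.
# Assumes the "Key: value" layout shown above; the required fields mirror
# that example and are not an official SOC 2 schema.
check_change_record() {
  awk '
    # Split on the first ": " so values containing colons (times) survive.
    {
      i = index($0, ": ")
      if (i > 0) field[substr($0, 1, i - 1)] = substr($0, i + 2)
    }
    END {
      n = split("Change ID|Author|Reviewer|Review Date|Testing|Approved By|Deployed", req, "|")
      status = 0
      for (k = 1; k <= n; k++) {
        if (!(req[k] in field) || field[req[k]] == "") {
          print "MISSING: " req[k]
          status = 1
        }
      }
      # Times are "YYYY-MM-DD HH:MM UTC", so string order equals time order.
      if (("Review Date" in field) && ("Deployed" in field) && field["Deployed"] < field["Review Date"]) {
        print "ORDER: deployed before review"
        status = 1
      }
      exit status
    }
  '
}
```

Because it exits non-zero on any problem, a call such as `check_change_record < CHG-2026-04-29-001.txt` (hypothetical file name) can gate the step that archives the record.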
CC6.6: External Access Boundary Protection
The criterion (AICPA 2017 TSC, with 2022 revisions):
The entity implements logical access security measures to protect against threats from sources outside its system boundaries.
What auditors look for:
- Who has access to production systems?
- How is access provisioned?
- What prevents unauthorized deployments?
The AI question: Can your AI agents access production systems? Can they deploy code directly? Can they modify critical configuration?
Evidence to provide:
| Control | Evidence |
|---|---|
| Agents cannot access production | Policy document + proof that agent service accounts are limited to dev/staging |
| Agents cannot modify secrets | Audit logs showing no secret updates from agent accounts |
| Agents cannot deploy directly | Approval workflow: agent submits PR, human reviews, human or CI/CD (not agent) deploys |
| Agents cannot access customer data | IAM policy showing agent accounts have zero PII permissions |
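The rows in this table can be checked mechanically rather than only asserted. Below is a minimal sketch, assuming a hypothetical IAM export with one `principal permission` pair per line, agent service accounts named `*-agent`, and forbidden permissions prefixed `prod:`, `secrets:`, `deploy:`, or `pii:`. All of these naming conventions are assumptions to adapt to your own identity provider.

```shell
# Minimal sketch: flag agent identities that hold forbidden permissions.
# Assumptions (hypothetical, adapt to your IAM export):
#   - input is one "principal permission" pair per line
#   - agent service accounts are named "*-agent"
#   - forbidden permissions start with prod:, secrets:, deploy:, or pii:
check_agent_permissions() {
  awk '
    $1 ~ /-agent$/ && $2 ~ /^(prod|secrets|deploy|pii):/ {
      print "VIOLATION: " $1 " has " $2
      found = 1
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Running this on every IAM change and archiving the clean output gives point-in-time evidence for the table; the audit logs remain the evidence that the controls operated over time.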
Broader SOC 2 Implications
Security (CC Criterion Class)
AI-relevant controls (AICPA 2017 TSC):
- CC6.1: Logical access security — The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events. Treat AI agents as identities subject to the same logical access controls as humans (no shared service accounts, no implicit "root").
- CC6.6: External boundary protection — Implements logical access security measures to protect against threats from sources outside its system boundaries. Agent runtimes that reach external models or tools sit on this boundary.
- CC7.2: Anomaly monitoring — Monitors system components for anomalies indicative of malicious acts, natural disasters, and errors. Add alerts for agents attempting unauthorized actions.
- CC8.1: Change management — Authorizes, tests, and implements changes to infrastructure, data, software, and procedures. AI-generated code is in scope: same review and approval evidence as human code.
- CC9.1: Risk mitigation for business disruption — Identifies, selects, and develops risk mitigation activities for risks arising from potential business disruptions. Failure modes for autonomous agents (runaway loops, accidental destructive actions) belong here.
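For the CC7.2 alerting item, and the runaway-loop failure mode under CC9.1, a burst of denied calls from one agent is a useful signal. Below is a minimal sketch, assuming a hypothetical CSV audit log of `timestamp,actor,action,result` lines and agent accounts named `*-agent`; `agent_denial_alerts` is an illustrative name, not an existing tool.

```shell
# Minimal sketch: alert when an agent identity keeps hitting access-denied
# errors (CC7.2); repeated denials can also indicate a runaway loop (CC9.1).
# Assumptions (hypothetical): CSV lines "timestamp,actor,action,result",
# agent actors end in "-agent", denied calls have result "denied".
# Usage: agent_denial_alerts THRESHOLD < audit.csv
agent_denial_alerts() {
  awk -F, -v threshold="$1" '
    $2 ~ /-agent$/ && $4 == "denied" { denied[$2]++ }
    END {
      alert = 0
      for (actor in denied) {
        if (denied[actor] >= threshold) {
          print "ALERT: " actor " had " denied[actor] " denied actions"
          alert = 1
        }
      }
      exit alert
    }
  '
}
```

A non-zero exit can be wired to whatever paging or ticketing tool you already use; the threshold is a tuning choice, not a SOC 2 requirement.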
Availability (A Criterion Class)
AI-relevant controls (AICPA 2017 TSC, Additional Criteria):
- A1.1: Capacity management — Maintains, monitors, and evaluates current processing capacity and use of system components to manage capacity demand. Track agent concurrency and rate limits as capacity inputs.
- A1.2: Environmental protections / recovery infrastructure — Authorizes, designs, develops, implements, operates, approves, maintains, and monitors environmental protections, software, data backup processes, and recovery infrastructure to meet availability objectives. Treat the agent platform itself as recoverable infrastructure.
- A1.3: Recovery plan testing — Tests recovery plan procedures to support system availability commitments and requirements. Include a "what if the agent platform is down" scenario.
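A1.1 asks you to treat agent concurrency as a capacity input. As a minimal sketch, assuming you can export agent sessions as hypothetical `start_epoch end_epoch` pairs, the peak number of concurrently running sessions falls out of a sweep over start and end events. `peak_agent_concurrency` is an illustrative name.

```shell
# Minimal sketch: peak number of concurrently running agent sessions.
# Assumption (hypothetical): input lines are "start_epoch end_epoch" per
# agent session; a session counts as running over [start, end).
peak_agent_concurrency() {
  # Turn each session into a +1 event at start and a -1 event at end.
  awk '{ print $1, 1; print $2, -1 }' |
    # Sort by time; at equal times -1 sorts before +1, so a session that
    # ends exactly when another starts is not counted as overlapping.
    sort -k1,1n -k2,2n |
    awk '{ running += $2; if (running > peak) peak = running } END { print peak + 0 }'
}
```

Comparing this peak over a month of session data with your model provider's concurrency or rate limits gives a concrete capacity figure to show the auditor.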
Processing Integrity (PI Criterion Class)
AI-relevant controls (AICPA 2017 TSC, Additional Criteria — PI1 family):
- PI1.1: Definitions and specifications — Obtains or generates, uses, and communicates relevant, quality information regarding the objectives related to processing, including definitions of data and product specifications. Document what AI-generated code is supposed to do.
- PI1.2: System inputs — Implements policies and procedures over system inputs to result in products, services, and reporting that meet objectives. Validate the prompts and context that drive agent code generation.
- PI1.3: System processing — Implements policies and procedures over system processing. Includes evidence that AI-generated code went through the documented review pipeline.
- PI1.4: Output delivery — Implements policies and procedures over output delivery. Track which releases contain AI-generated components.
- PI1.5: Data storage — Implements policies and procedures to store inputs, items in processing, and outputs completely, accurately, and timely. Retention applies to AI-generated artefacts and review records too.
What Auditors Actually Ask
Here's a real SOC 2 audit conversation (paraphrased):
Auditor: "I see from your change logs that commit abc123 was authored by 'copilot-agent' and merged by Alice on April 15. What is 'copilot-agent' and how is that reviewed?"
You (unprepared): "Oh, that's GitHub Copilot Coding Agent. It generates code. We... probably reviewed it?"
Auditor: "Probably? Can you show me the review record?"
You (worse): "It's in GitHub. Let me look... here's the PR. Bob left a comment. Does that count as a review?"
Auditor: "What did Bob review? Did he test the code? Did he understand what the AI generated? Did he specifically approve it as AI-generated code?"
You (in trouble): "Um... not specifically, no."
This is how your SOC 2 gets flagged. A finding written along these lines (illustrative — not an actual audit template) would read:
FINDING (illustrative): Organization lacks documented evidence of review for AI-generated code changes. While peer review processes exist for human-authored code, AI-generated code is treated identically without additional verification. Recommend organization implement explicit AI-code review checklist and evidence capture.
You (prepared): "We use GitHub Copilot Coding Agent, version 2026-q1-v4. Every agent PR is tagged with an 'ai-generated' label. Our code review process has an additional checklist for AI code: functional correctness, security testing, test coverage. Here are 10 examples of agent PRs from the past quarter, each with reviewer notes confirming they checked the AI-generated code specifically. And here's our policy document defining AI code review requirements."
Auditor: "Good. Shall we spot-check a few of those?"
This is how you pass.
SOC 2 Type 1 vs. Type 2 for AI Code
Type 1: Point-in-Time
Type 1 is a snapshot: "As of January 31, 2026, the organization's controls are suitably designed."
For AI-generated code: Show auditors your AI code review process exists and is documented.
Evidence needed:
- Policy document on AI tool usage
- Process flowchart showing how AI code is reviewed
- Example change records from recent releases
- Audit logs showing AI code is tracked
Effort: Moderate. You're proving controls exist, not that they've worked for a year.
Type 2: Operating Effectiveness Over Time
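Type 2 covers an observation period, most commonly 6–12 months (first-time reports sometimes use a period as short as 3 months): "The organization's controls operated effectively throughout the period."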
For AI-generated code: Show auditors that every agent-generated code change was reviewed before deployment for the entire audit period.
Evidence needed:
- 6–12 months of change records
- Each record shows: AI agent version, reviewer name, review date, approval
- Audit logs proving reviewed code was deployed and unreviewed code was rejected
- Metrics: X% of agent PRs reviewed before merge, Y% of those had sign-off
Effort: High. You're proving a pattern over time, not just describing a process.
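The two metrics in the evidence list above are simple ratios. Below is a minimal sketch, assuming a hypothetical PR export with one `pr_number,ai_generated,reviewed_before_merge,signed_off` CSV line per merged PR and `yes`/`no` flags; `review_coverage` is an illustrative name.

```shell
# Minimal sketch: the two Type 2 coverage metrics from a PR export.
# Assumption (hypothetical export): CSV lines
#   pr_number,ai_generated,reviewed_before_merge,signed_off
# with yes/no flags, one line per merged PR in the audit period.
review_coverage() {
  awk -F, '
    $2 == "yes" {
      agent++
      if ($3 == "yes") {
        reviewed++
        if ($4 == "yes") signed++
      }
    }
    END {
      # X: share of agent PRs reviewed before merge.
      x = agent ? 100 * reviewed / agent : 0
      # Y: share of those reviewed PRs that also carry sign-off.
      y = reviewed ? 100 * signed / reviewed : 0
      printf "agent_prs=%d reviewed_before_merge=%.1f%% signed_off=%.1f%%\n", agent, x, y
    }
  '
}
```

A header row is skipped automatically because its `ai_generated` column is not `yes`. Anything below 100% reviewed-before-merge is an exception the auditor will ask you to explain.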
Recommendation: Start AI code tracking now, even if your next SOC 2 is 6 months away. Type 2 auditors will ask for historical data.
Building an AI-Code Evidence Library
To pass SOC 2, you need machine-generated evidence showing every agent-generated code change was reviewed.
Automated Evidence Collection
Integrate this into your CI/CD pipeline:
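1. Tag agent-generated commits:
```shell
# --trailer (Git 2.32+) keeps all three lines in one trailer block; separate
# -m flags make each its own paragraph, so only the last is read as a trailer.
git commit -m "Fix: parse error in user validation" \
  --trailer "AI-Generated: copilot/coding-agent/2026-q1-v4" \
  --trailer "Reviewed-By: [email protected]" \
  --trailer "Review-Date: 2026-04-29T10:15:00Z"
```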
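2. Validate before merge:
```yaml
# .github/workflows/validate-ai-code.yml
on:
  pull_request:
    # Re-run when the PR description is edited, not only on new commits.
    types: [opened, edited, synchronize, reopened]
jobs:
  validate-ai-code:
    runs-on: ubuntu-latest
    steps:
      - name: Check AI-code review requirements
        shell: bash
        env:
          # Read the PR description from an env var rather than interpolating
          # the expression into the script, which would allow shell injection.
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Fail if the PR is marked AI-generated but records no reviewer.
          # This checks a self-declared line; pair it with branch protection
          # that requires an approving review.
          if [[ "$PR_BODY" == *"AI-Generated"* && "$PR_BODY" != *"Reviewed-By"* ]]; then
            echo "ERROR: AI-generated code requires explicit review approval"
            exit 1
          fi
```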
3. Collect evidence at release:
```shell
# At release time, generate an audit report
./tools/generate-evidence-report.sh v2.14.1 > evidence/release-v2.14.1.json
```
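4. Archive evidence: Store reports in a dedicated compliance-evidence repository with a retention policy. SOC 2 does not prescribe a fixed retention period, but your evidence has to cover every audit period under review, and many organizations keep it for three years or more.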
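Step 3 leaves the contents of `generate-evidence-report.sh` open. One way such a script could work is sketched below, under these assumptions: commits carry the trailers from step 1, the report covers the range between the previous release tag and the new one, and the output is JSON Lines. The function names `collect_commits` and `evidence_from_tsv` are hypothetical.

```shell
# Hypothetical sketch of what a tool like generate-evidence-report.sh could do.
# Assumes commits carry the AI-Generated / Reviewed-By / Review-Date trailers
# from step 1. Values are assumed to be emails, versions, and timestamps,
# so no JSON string escaping is done here.

# Emit "sha<TAB>ai<TAB>reviewer<TAB>review_date" for each commit in FROM..TO.
# The %(trailers:...) options need a reasonably recent Git.
collect_commits() {
  git log --format='%H%x09%(trailers:key=AI-Generated,valueonly,separator=%x2C)%x09%(trailers:key=Reviewed-By,valueonly,separator=%x2C)%x09%(trailers:key=Review-Date,valueonly,separator=%x2C)' "$1..$2"
}

# Convert that TSV into JSON Lines, keeping only AI-generated commits and
# marking whether a reviewer was recorded.
evidence_from_tsv() {
  awk 'BEGIN { FS = "\t" }
    $2 != "" {
      reviewed = ($3 != "") ? "true" : "false"
      printf "{\"commit\":\"%s\",\"ai_generated\":\"%s\",\"reviewed_by\":\"%s\",\"review_date\":\"%s\",\"reviewed\":%s}\n", $1, $2, $3, $4, reviewed
    }
  '
}
```

Assuming `v2.14.0` is the previous tag, `collect_commits v2.14.0 v2.14.1 | evidence_from_tsv > evidence/release-v2.14.1.jsonl` would list every AI-generated commit in the release, with `"reviewed":false` on any commit that lacks a `Reviewed-By` trailer. That is exactly the gap a Type 2 auditor looks for.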
SOC 2 Audit Checklist for AI Code
When auditors arrive, they'll ask for:
- List of all AI tools used (Copilot? Cursor? Custom agents?)
- Policy document on AI code review requirements
- Sample of 20 recent agent-generated commits showing review records
- Metrics on agent code merge rate and human review coverage
- Evidence that agent-generated code is tested the same way as human code (test coverage metrics)
- Description of how AI agents are limited (what can they access? what can they deploy?)
- Incident history: have any AI-generated code defects made it to production? (Answer: "yes, and here's how we detected and fixed it")
- SBOM showing which dependencies have AI-generated patches
- Policy on agent code in critical paths (auth, payments, data access)
Pro tip: Have this in one folder before the audit starts. Auditors are more efficient and confident if evidence is organized.
FAQ: Common Auditor Questions
Q: Do we need to review AI code differently than human code?
A: No, not differently — just explicitly. Same peer review, same testing, same approval. The difference is documenting that this specific code came from an AI and was intentionally reviewed as such.
Q: What if an AI-generated code change caused a production incident?
A: That's a story auditors want to hear. It proves your detection and response worked. Describe the incident, how it was detected, and what the fix was. If the response was good, auditors see maturity.
Q: Can we use automated testing as a substitute for human review on AI code?
A: In practice, no. CC8.1 (Change Management) expects changes to be authorized, tested, approved, and documented, not just to pass tests, and auditors generally expect an accountable person to give that approval. Automated testing and human review with approval are both expected.
Q: If we use AI agents, does that automatically fail us?
A: No. Auditors care about control, not tool choice. If you control and track AI code the same way you control human code, you pass.
References
- AICPA 2017 Trust Services Criteria (with revised points of focus, 2022) — Official framework
- AICPA-CIMA Trust Services criteria resources — Implementation guidance
- CISA 2025 SBOM Minimum Elements (Draft for Comment, comment period closed Oct 2025) — For evidence collection
- Vanta State of Trust report — industry estimates suggest a typical compliance burden of roughly twelve working weeks per engineer per year