SOC 2 Compliance and AI Coding Tools: What Auditors Are Asking

SOC 2 (System and Organization Controls 2) is the trust report that matters most to enterprise buyers. If you sell a SaaS product or cloud infrastructure to enterprises, expect to be asked for a SOC 2 Type 2 report in nearly every sales cycle.

SOC 2 is organized around five Trust Services categories: Security, Availability, Processing Integrity, Confidentiality, and Privacy. An auditor reviews your controls, tests whether they actually work, and issues a report that your customers pass to their risk and compliance teams.

Over the past two years, SOC 2 auditors have started asking pointed questions about AI coding tools:

"Do your developers use AI tools? Which ones?"
"How is AI-generated code reviewed and tested differently from human code?"
"What happens if a vulnerability is discovered in code an AI agent wrote?"
"Are your dependencies themselves products of AI-generated code at your vendors?"

These are not yes/no questions. They're questions that probe your AI software supply chain risk. Here's how to answer them with evidence that satisfies auditors.

SOC 2 and Code Security

Two areas of the SOC 2 Common Criteria are most relevant to AI-generated code: change management (CC8.1) and logical access (CC6.1 and CC6.3).

CC8.1: Change Management

The Control (AICPA 2017 TSC, with 2022 revisions):

The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives.

What auditors look for:

Code review records (who reviewed this code?)
Change logs (when was this deployed?)
Testing results (what testing was performed?)
Approval workflows (who authorized this change?)

The AI question: If a change was made by an AI agent, how do you prove it went through the same review and approval process as human code?

Evidence to provide:

Change ID:     CHG-2026-04-29-001
Component:     Auth service, login handler
Author:        copilot-agent / version 2026-q1-v4
Human Author:  [email protected] (prompt writer)
Reviewer:      [email protected] (code review approval)
Review Date:   2026-04-29 10:15 UTC
Testing:       SCA clean, SAST clean, 3 new unit tests, integration test passed
Approved By:   [email protected] (change advisory board)
Deployed:      2026-04-29 14:30 UTC

This shows the auditor that even though an agent generated the code, a human reviewed it and an approval authority blessed the change.
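A record like the one above can also be checked mechanically before it goes into the evidence folder. The following is a minimal sketch, not a prescribed format: the `ChangeRecord` fields mirror the example record, while the `example.com` addresses and the three gap rules (reviewer present, reviewer independent of the prompt writer, review dated before deployment) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    change_id: str
    agent: str           # agent name and version that generated the code
    human_author: str    # person who wrote the prompt
    reviewer: str
    approver: str
    reviewed_at: datetime
    deployed_at: datetime


def control_gaps(rec: ChangeRecord) -> list[str]:
    """Return the reasons this record would be weak CC8.1 evidence, if any."""
    gaps = []
    if not rec.reviewer or not rec.approver:
        gaps.append("missing reviewer or approver")
    if rec.reviewer == rec.human_author:
        gaps.append("reviewer is the prompt writer (no independent review)")
    if rec.reviewed_at >= rec.deployed_at:
        gaps.append("review is not dated before deployment")
    return gaps


record = ChangeRecord(
    change_id="CHG-2026-04-29-001",
    agent="copilot-agent / version 2026-q1-v4",
    human_author="dev@example.com",      # hypothetical address
    reviewer="reviewer@example.com",     # hypothetical address
    approver="cab@example.com",          # hypothetical address
    reviewed_at=datetime(2026, 4, 29, 10, 15, tzinfo=timezone.utc),
    deployed_at=datetime(2026, 4, 29, 14, 30, tzinfo=timezone.utc),
)
print(control_gaps(record))  # an empty list means no gaps were found
```

Running a check like this on every record at release time surfaces missing reviewers while the change is still fresh, rather than months later during fieldwork.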

CC6.1 / CC6.3: Logical Access and Least Privilege

The Controls (AICPA 2017 TSC, with 2022 revisions):

CC6.1 — The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives.

CC6.3 — The entity authorizes, modifies, or removes access to data, software, functions, and other protected information assets based on roles, responsibilities, or the system design and changes, giving consideration to the concepts of least privilege and segregation of duties, to meet the entity's objectives.

What auditors look for:

Who (or what) has access to production systems?
How is access provisioned, scoped, and removed?
What prevents unauthorized deployments?

The AI question: Can your AI agents access production systems? Can they deploy code directly? Can they modify critical configuration? Are their service accounts scoped to least privilege?

(Restricting what an agent identity can reach is a logical access / least privilege question — CC6.1 and CC6.3 — not CC6.6, which is specifically about protecting against threats from outside the system boundary.)

Evidence to provide:

Control	Evidence
Agents cannot access production	Policy document + proof that agent service accounts are limited to dev/staging
Agents cannot modify secrets	Audit logs showing no secret updates from agent accounts
Agents cannot deploy directly	Approval workflow: agent submits PR, human reviews, human or CI/CD (not agent) deploys
Agents cannot access customer data	IAM policy showing agent accounts have zero PII permissions
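
The four controls in this table lend themselves to an automated check over exported access grants. The sketch below assumes a deliberately simplified model in which each identity maps to a set of (environment, action) pairs. The identity names, action names, and forbidden sets are hypothetical, and a real check would read them from your IAM provider's export rather than a literal dictionary.

```python
# Hypothetical, simplified grants: identity -> set of (environment, action) pairs.
GRANTS = {
    "svc-coding-agent": {("dev", "read"), ("dev", "write"), ("staging", "read")},
    "svc-ci-deployer": {("staging", "deploy"), ("prod", "deploy")},
}

AGENT_IDENTITIES = {"svc-coding-agent"}                     # assumed agent accounts
FORBIDDEN_ENVIRONMENTS = {"prod"}                           # agents stay in dev/staging
FORBIDDEN_ACTIONS = {"deploy", "write-secrets", "read-pii"}


def agent_violations(grants, agents):
    """List (identity, environment, action) grants that break the agent policy."""
    violations = []
    for identity in sorted(agents):
        for env, action in sorted(grants.get(identity, set())):
            if env in FORBIDDEN_ENVIRONMENTS or action in FORBIDDEN_ACTIONS:
                violations.append((identity, env, action))
    return violations


print(agent_violations(GRANTS, AGENT_IDENTITIES))  # an empty list means the policy holds
```

Note that the CI deployer keeps its production grant: only identities listed as agents are checked, which matches the "human or CI/CD (not agent) deploys" rule.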

Broader SOC 2 Implications

Security (CC Criterion Class)

AI-relevant controls (AICPA 2017 TSC):

CC6.1: Logical access security — The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events. Treat AI agents as identities subject to the same logical access controls as humans (no shared service accounts, no implicit "root").
CC6.6: External boundary protection — Implements logical access security measures to protect against threats from sources outside its system boundaries. Agent runtimes that reach external models or tools sit on this boundary.
CC7.2: Anomaly monitoring — Monitors system components for anomalies indicative of malicious acts, natural disasters, and errors. Add alerts for agents attempting unauthorized actions.
CC8.1: Change management — Authorizes, tests, and implements changes to infrastructure, data, software, and procedures. AI-generated code is in scope: same review and approval evidence as human code.
CC9.1: Risk mitigation for business disruption — Identifies, selects, and develops risk mitigation activities for risks arising from potential business disruptions. Failure modes for autonomous agents (run-away loops, accidental destructive actions) belong here.
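
For the CC7.2 point about alerting on agents that attempt unauthorized actions, the core logic is a filter over audit events. This is a sketch under stated assumptions: the event shape (`actor`, `action`, `outcome`) and the `svc-agent-` naming convention are hypothetical, and in practice the rule would live in your SIEM or log pipeline rather than in a script.

```python
AGENT_PREFIX = "svc-agent-"  # assumed naming convention for agent identities

# Hypothetical audit events; field names are illustrative only.
events = [
    {"actor": "svc-agent-copilot", "action": "repo.push", "outcome": "allowed"},
    {"actor": "svc-agent-copilot", "action": "prod.deploy", "outcome": "denied"},
    {"actor": "alice", "action": "prod.deploy", "outcome": "allowed"},
]


def agent_alerts(log):
    """Return events where an agent identity attempted an action that was denied."""
    return [
        e for e in log
        if e["actor"].startswith(AGENT_PREFIX) and e["outcome"] == "denied"
    ]


for alert in agent_alerts(events):
    print(f"ALERT: {alert['actor']} was denied {alert['action']}")
```

A denied attempt is useful evidence in two directions: it shows the access control held, and it shows the monitoring control noticed.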

Availability (A Criterion Class)

AI-relevant controls (AICPA 2017 TSC, Additional Criteria):

A1.1: Capacity management — Maintains, monitors, and evaluates current processing capacity and use of system components to manage capacity demand. Track agent concurrency and rate limits as capacity inputs.
A1.2: Environmental protections / recovery infrastructure — Authorizes, designs, develops, implements, operates, approves, maintains, and monitors environmental protections, software, data backup processes, and recovery infrastructure to meet availability objectives. Treat the agent platform itself as recoverable infrastructure.
A1.3: Recovery plan testing — Tests recovery plan procedures to support system availability commitments and requirements. Include a "what if the agent platform is down" scenario.

Processing Integrity (PI Criterion Class)

AI-relevant controls (AICPA 2017 TSC, Additional Criteria — PI1 family):

PI1.1: Definitions and specifications — Obtains or generates, uses, and communicates relevant, quality information regarding the objectives related to processing, including definitions of data and product specifications. Document what AI-generated code is supposed to do.
PI1.2: System inputs — Implements policies and procedures over system inputs to result in products, services, and reporting that meet objectives. Validate the prompts and context that drive agent code generation.
PI1.3: System processing — Implements policies and procedures over system processing. Includes evidence that AI-generated code went through the documented review pipeline.
PI1.4: Output delivery — Implements policies and procedures over output delivery. Track which releases contain AI-generated components.
PI1.5: Data storage — Implements policies and procedures to store inputs, items in processing, and outputs completely, accurately, and timely. Retention applies to AI-generated artefacts and review records too.
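
Tracking which releases contain AI-generated components (PI1.4) can start as a simple grouping over commit metadata. In this sketch the commit dictionaries, SHAs, and component names are hypothetical, and the `ai_generated` flag is assumed to have been derived earlier, for example from a commit trailer or PR label.

```python
from collections import defaultdict

# Hypothetical commit metadata for one release.
release_commits = [
    {"sha": "a1b2c3d", "component": "auth", "ai_generated": True},
    {"sha": "d4e5f6a", "component": "billing", "ai_generated": False},
    {"sha": "0718293", "component": "auth", "ai_generated": True},
]


def ai_components(commits):
    """Map each component to the AI-generated commits it received in this release."""
    by_component = defaultdict(list)
    for commit in commits:
        if commit["ai_generated"]:
            by_component[commit["component"]].append(commit["sha"])
    return dict(by_component)


print(ai_components(release_commits))
```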

What Auditors Actually Ask

Here's how a SOC 2 audit conversation can go (illustrative, with an unprepared and a prepared version of your answers):

Auditor: "I see from your change logs that commit abc123 was authored by 'copilot-agent' and merged by Alice on April 15. What is 'copilot-agent' and how is that reviewed?"

You (unprepared): "Oh, that's GitHub Copilot Coding Agent. It generates code. We... probably reviewed it?"

Auditor: "Probably? Can you show me the review record?"

You (worse): "It's in GitHub. Let me look... here's the PR. Bob left a comment. Does that count as a review?"

Auditor: "What did Bob review? Did he test the code? Did he understand what the AI generated? Did he specifically approve it as AI-generated code?"

You (in trouble): "Um... not specifically, no."

This is how your SOC 2 gets flagged. A finding written along these lines (illustrative — not an actual audit template) would read:

FINDING (illustrative): Organization lacks documented evidence of review for AI-generated code changes. While peer review processes exist for human-authored code, AI-generated code is treated identically without additional verification. Recommend organization implement explicit AI-code review checklist and evidence capture.

You (prepared): "We use GitHub Copilot Coding Agent, version 2026-q1-v4. Every agent PR is tagged with an 'ai-generated' label. Our code review process has an additional checklist for AI code: functional correctness, security testing, test coverage. Here are 10 examples of agent PRs from the past quarter, each with reviewer notes confirming they checked the AI-generated code specifically. And here's our policy document defining AI code review requirements."

Auditor: "Good. Shall we spot-check a few of those?"

This is how you pass.

SOC 2 Type 1 vs. Type 2 for AI Code

Type 1: Point-in-Time

Type 1 is a snapshot: "As of January 31, 2026, the organization's controls are suitably designed."

For AI-generated code: Show auditors your AI code review process exists and is documented.

Evidence needed:

Policy document on AI tool usage
Process flowchart showing how AI code is reviewed
Example change records from recent releases
Audit logs showing AI code is tracked

Effort: Moderate. You're proving controls exist, not that they've worked for a year.

Type 2: Operating Effectiveness Over Time

Type 2 typically covers a period of 6–12 months: "The organization's controls operated effectively throughout the period."

For AI-generated code: Show auditors that every agent-generated code change was reviewed before deployment for the entire audit period.

Evidence needed:

6–12 months of change records
Each record shows: AI agent version, reviewer name, review date, approval
Audit logs proving reviewed code was deployed and unreviewed code was rejected
Metrics: X% of agent PRs reviewed before merge, Y% of those had sign-off
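
The coverage metric in the last item is a straightforward computation once agent PRs are exported with their approval and merge timestamps. The sketch below assumes a hypothetical export format with ISO 8601 timestamps, and it counts a PR as covered only when the approval is dated at or before the merge. The PR numbers and dates are made up for illustration.

```python
from datetime import datetime

# Hypothetical export of agent PRs for the audit period.
agent_prs = [
    {"number": 1, "approved_at": "2026-01-10T09:00:00+00:00", "merged_at": "2026-01-10T11:00:00+00:00"},
    {"number": 2, "approved_at": "2026-02-03T15:30:00+00:00", "merged_at": "2026-02-04T08:00:00+00:00"},
    {"number": 3, "approved_at": None, "merged_at": "2026-02-20T10:00:00+00:00"},
    {"number": 4, "approved_at": "2026-03-02T12:00:00+00:00", "merged_at": "2026-03-01T17:00:00+00:00"},
]


def reviewed_before_merge(pr):
    """True only if an approval exists and is dated at or before the merge."""
    if pr["approved_at"] is None:
        return False
    approved = datetime.fromisoformat(pr["approved_at"])
    merged = datetime.fromisoformat(pr["merged_at"])
    return approved <= merged


def review_coverage(prs):
    """Percentage of PRs approved before merge, or None for an empty period."""
    if not prs:
        return None
    covered = sum(1 for pr in prs if reviewed_before_merge(pr))
    return 100.0 * covered / len(prs)


def exceptions(prs):
    """PR numbers that need an explanation in the audit file."""
    return [pr["number"] for pr in prs if not reviewed_before_merge(pr)]


print(review_coverage(agent_prs), exceptions(agent_prs))
```

The exception list matters as much as the percentage: an auditor sampling the period will want each unreviewed or late-reviewed PR explained.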

Effort: High. You're proving a pattern over time, not just describing a process.

Recommendation: Start AI code tracking now, even if your next SOC 2 is 6 months away. Type 2 auditors will ask for historical data.

Building an AI-Code Evidence Library

To pass SOC 2, you need machine-generated evidence showing every agent-generated code change was reviewed.

Automated Evidence Collection

Integrate this into your CI/CD pipeline:

1. Tag agent-generated commits:

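# --trailer (Git 2.32+) writes all three lines into one trailer block at the
# end of the message; repeated -m flags would put each in its own paragraph.
git commit -m "Fix: parse error in user validation" \
  --trailer "AI-Generated: copilot/coding-agent/2026-q1-v4" \
  --trailer "Reviewed-By: [email protected]" \
  --trailer "Review-Date: 2026-04-29T10:15:00Z"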
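
Once commits carry these tags, evidence tooling needs to read them back. The sketch below is a simplified parser written for illustration and does not reproduce Git's own trailer rules (for production use, `git interpret-trailers --parse` is the safer route). It scans every line after the subject for `Key: value` pairs, so it works whether the tags share one paragraph or sit in separate ones. The reviewer address is a hypothetical placeholder.

```python
def parse_trailers(message):
    """Collect 'Key: value' lines that appear after the commit subject."""
    parts = message.strip().split("\n\n", 1)
    if len(parts) < 2:
        return {}  # a subject line alone carries no trailers
    trailers = {}
    for line in parts[1].splitlines():
        key, sep, value = line.partition(": ")
        # Keys are single tokens such as "Reviewed-By"; prose lines are skipped.
        if sep and key and " " not in key:
            trailers[key] = value.strip()
    return trailers


def missing_review(trailers):
    """True when a commit is tagged as AI-generated but names no reviewer."""
    return "AI-Generated" in trailers and "Reviewed-By" not in trailers


message = (
    "Fix: parse error in user validation\n"
    "\n"
    "AI-Generated: copilot/coding-agent/2026-q1-v4\n"
    "Reviewed-By: reviewer@example.com\n"   # hypothetical address
    "Review-Date: 2026-04-29T10:15:00Z\n"
)
tags = parse_trailers(message)
print(tags["AI-Generated"], missing_review(tags))
```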

2. Validate before merge:

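# .github/workflows/validate-ai-code.yml
on:
  pull_request:
    # "edited" re-runs the check when the PR description changes
    types: [opened, edited, synchronize, reopened]
jobs:
  validate-ai-code:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check AI-code review requirements
        env:
          # Pass the PR description through the environment rather than
          # interpolating it into the script, which would allow injection
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          # Fail if this is an AI-generated PR without a review approval
          if grep -q "AI-Generated" <<<"$PR_BODY" && ! grep -q "Reviewed-By" <<<"$PR_BODY"; then
            echo "ERROR: AI-generated code requires explicit review approval"
            exit 1
          fi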
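
A text match on the PR description is easy to satisfy by accident, so a stricter gate would look at the review records themselves. The following sketch shows only the decision logic. It assumes review objects shaped like `{"state": ..., "user": {"login": ...}}`, similar to what GitHub's pull request reviews API returns, and an `ai-generated` label as used elsewhere in this article. The logins are hypothetical, and the sketch ignores details such as a reviewer later dismissing or superseding their own approval.

```python
AI_LABEL = "ai-generated"


def merge_allowed(labels, reviews, author):
    """Allow merge of an agent PR only with an approval from someone else."""
    if AI_LABEL not in labels:
        return True  # this gate only covers agent PRs
    return any(
        review["state"] == "APPROVED" and review["user"]["login"] != author
        for review in reviews
    )


# A comment is not an approval, which is the gap in the "Bob left a comment" scenario.
comment_only = [{"state": "COMMENTED", "user": {"login": "bob"}}]
approved = [{"state": "APPROVED", "user": {"login": "bob"}}]

print(merge_allowed({"ai-generated"}, comment_only, author="copilot-agent"))
print(merge_allowed({"ai-generated"}, approved, author="copilot-agent"))
```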

3. Collect evidence at release:

# At release time, generate an audit report
./tools/generate-evidence-report.sh v2.14.1 > evidence/release-v2.14.1.json
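
The article does not show what `generate-evidence-report.sh` contains, so the following is only a guess at the kind of summary such a step might emit. The field names and the change-record shape are assumptions. The point is that the report should make unreviewed agent changes impossible to overlook.

```python
import json


def build_evidence_report(release, changes):
    """Summarize agent-generated changes and list any that lack a reviewer."""
    agent_changes = [c for c in changes if c.get("ai_generated")]
    unreviewed = [c["change_id"] for c in agent_changes if not c.get("reviewer")]
    return {
        "release": release,
        "total_changes": len(changes),
        "agent_changes": len(agent_changes),
        "unreviewed_agent_changes": unreviewed,
    }


# Hypothetical change records for one release.
changes = [
    {"change_id": "CHG-001", "ai_generated": True, "reviewer": "reviewer@example.com"},
    {"change_id": "CHG-002", "ai_generated": False, "reviewer": "reviewer@example.com"},
    {"change_id": "CHG-003", "ai_generated": True, "reviewer": None},
]

report = build_evidence_report("v2.14.1", changes)
print(json.dumps(report, indent=2))
```

A release pipeline could refuse to publish when `unreviewed_agent_changes` is non-empty, which turns the report from a record into a control.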

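4. Archive evidence: Store it in a compliance-evidence repository with a retention policy (3+ years is a conservative choice; SOC 2 itself does not prescribe a fixed period, so confirm with your auditor).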

SOC 2 Audit Checklist for AI Code

When auditors arrive, they'll ask for:

List of all AI tools used (Copilot? Cursor? Custom agents?)
Policy document on AI code review requirements
Sample of 20 recent agent-generated commits showing review records
Metrics on agent code merge rate and human review coverage
Evidence that agent code is tested the same way as human code (test coverage metrics)
Description of how AI agents are limited (what can they access? what can they deploy?)
Incident history: have any AI-generated code defects made it to production? (Answer: "yes, and here's how we detected and fixed it")
SBOM showing which dependencies have AI-generated patches
Policy on agent code in critical paths (auth, payments, data access)

Pro tip: Have this in one folder before the audit starts. Auditors are more efficient and confident if evidence is organized.

FAQ: Common Auditor Questions

Q: Do we need to review AI code differently than human code?

A: Not with a different process, but explicitly. The peer review, testing, and approval steps are the same. What changes is the record: it shows that this specific code came from an AI and that the reviewer checked it as such, for example against an AI-code checklist.

Q: What if an AI-generated code change caused a production incident?

A: That's a story auditors want to hear. It proves your detection and response worked. Describe the incident, how it was detected, and what the fix was. If the response was good, auditors see maturity.

Q: Can we use automated testing as a substitute for human review on AI code?

A: Per SOC 2, no. CC8.1 (Change Management) requires documented authorization, design, testing, and approval of changes — not just test passage. Automated testing is required; human review and approval are required. Both.

Q: If we use AI agents, does that automatically fail us?

A: No. Auditors care about control, not tool choice. If you control and track AI code the same way you control human code, you pass.

References

AICPA 2017 Trust Services Criteria (with revised points of focus, 2022) — Official framework
AICPA-CIMA Trust Services criteria resources — Implementation guidance
CISA 2025 SBOM Minimum Elements (Draft for Comment, comment period closed Oct 2025) — For evidence collection
Vanta State of Trust 2025 (press release) — per the 2025 report, security teams spend roughly twelve working weeks per year on compliance tasks such as policy reviews and evidence collection (up from eleven the prior year)
Vanta State of Trust report (overview) — report landing page (full report is gated)