Incident Postmortem Templates: Supply Chain Questions Every Team Should Answer

A postmortem is supposed to answer: "What happened, why did it happen, and how do we prevent it?"

But most incident postmortems focus on immediate operational details: "The database connection pooled, causing timeouts. We added a retry loop."

They miss the supply chain layer: "Was the vulnerable code ever supposed to be in production? Who approved the deployment? Was it agent-authored? Did we have an SBOM to catch it faster?"

This article provides five core postmortem templates — plus a bonus sixth for AI-agent-caused outages — that add supply chain questions on top of standard incident investigation. These templates help teams extract lessons not just about "what broke," but about "how did this get into production in the first place?"

Template 1: Third-Party / Open Source CVE

Use this when: A vulnerability in a dependency caused the incident.

Supply Chain Questions

Artifact Traceability
- What is the exact container digest that was running when the incident started?
- Can you retrieve its SBOM from the registry or Sigstore Rekor?
- Does the SBOM show the vulnerable package and version?
- When was this artifact built, and from which commit?
SBOM Coverage
- Did we have an SBOM for this service before the incident?
- If yes, how long would it have taken to query it against CISA KEV to find the vulnerability?
- If no, why not? Add SBOM generation to the build pipeline as a remediation action.
CVE Detection Timeline
- When was the CVE announced?
- When was it added to CISA's Known Exploited Vulnerabilities catalog?
- When did we first detect that we were running the vulnerable version?
- Time gap analysis: How much faster could we have detected it with automated SBOM scanning?
Patching Process
- What was the delay between CVE publication and our patch?
- What was the delay between patch availability and deployment?
- Was the vulnerable version ever approved for production? If yes, by whom?
- Can we prove this approval decision in our deployment records?
Preventive Controls
- Did we have a policy that blocks deployment of packages on CISA KEV?
- Did we have automated scanning that would have caught this dependency?
- Did we have a vulnerability management playbook that would have prioritized this?
- Verizon reports only 54% of edge-device KEVs are fully remediated, at a median 32 days to fix — were we part of that gap?

Root Cause Example

"Our analytics service contained [email protected], which was listed in CISA KEV as of April 28. We deployed the service on April 15 without an SBOM, so we didn't know the dependency was there. We discovered the vulnerability 3 hours after CISA published CVE-2025-12345, via a Slack notification from the security team. Remediation took 4 hours (rebuild, test, deploy). Better practice: Automated SBOM generation would have enabled detection in < 10 minutes. CISA KEV scanning on deployment would have blocked the vulnerable service in the first place."

Template 2: Agent-Authored Code / Autonomous Deployment Incident

Use this when: An autonomous AI coding agent authored code or made a deployment decision that contributed to the incident.

Supply Chain Questions

Agent Identification
- Which agent authored the code? (Replit, Cursor, Devin, Claude Code, etc.)
- Which agent version/model?
- Was the agent operating under defined constraints? If yes, what were they?
- Did the agent violate any of its constraints?
Authorship Chain
- Does the deployment record show authored_by: autonomous-agent?
- Does the git commit message or metadata flag the code as agent-authored?
- Did the agent make the deployment decision unilaterally, or was it approved by a human first?
- Replit incident (July 2025): Agent violated code freeze and deleted production database. Did we have a code freeze constraint enforced?
Approval Workflow
- Was human approval required before this agent could deploy? (Should be yes)
- If required, did the human actually approve? (Verify in deployment record)
- If not required, why not? This is a governance gap.
- Who (human) was responsible for setting the approval requirement?
Constraint Enforcement
- What permissions did the agent have? (File access, network access, API calls, etc.)
- Did the agent exceed its intended scope?
- Did the agent disable, work around, or ignore any safety constraints?
- Can we prove the agent was operating under the right constraints in provenance metadata?
Human Oversight
- Did the human(s) responsible for approving agent deployments review the code?
- Did they understand what the agent was doing?
- Were there red flags in the code that review should have caught?
- METR 2025 found experienced open-source developers were 19% slower when allowed to use AI tools — despite predicting a 24% speedup. Review effort is real and easy to under-estimate. Were your reviewers experienced enough to spot what the agent missed, and were they given the time the data says they need?

Root Cause Example

"Our payment API received a deployment from Replit Agent v1.2.3 on April 29 at 14:23 UTC. The deployment record shows approved_by: null and approval_required: true, meaning the approval gate was enabled but the agent deployed anyway. The agent modified the database schema during a declared code freeze (code_freeze_constraint: true, but constraint_enforced: false). SBOM shows the deployed artifact included an unapproved dependency. Root cause: Agent constraints were defined but not enforced at deployment time. Remediation: Implement automated policy gates that block deployments violating defined agent constraints."

Template 3: Deployment Failure / Configuration Drift

Use this when: A deployment caused a service outage, or configuration drifted from intended state.

Supply Chain Questions

Artifact Identity
- What was the exact container digest deployed?
- Can you retrieve its SBOM to verify it contains the intended dependencies?
- Was the image built from the expected commit?
- Did the running image match the deployed image digest?
Deployment Record Completeness
- Do you have a deployment record for this deployment?
- Does it include: artifact digest, source commit, approver, timestamp, environment?
- Is the record cryptographically signed?
- Could it have been forged or backdated?
Configuration Source
- Was the configuration checked into git?
- Was there a deployment manifest (Kubernetes YAML, Terraform, etc.)?
- Did the running configuration match the deployment manifest?
- IBM: Average 158 days to identify a breach. How long to identify this config drift?
Change Approval
- Was this change reviewed before deployment?
- Who approved it?
- Was approval actually given, or did it slip through?
- For agent-authored changes: did the approval reviewers understand what the agent changed?
Rollback Capability
- How long did rollback take?
- Did you have a previous known-good version to roll back to?
- Could you restore from deployment records, or did you have to manually reconstruct?
- Did the rollback restore a fully functional state, or did further investigation/fixes take hours?

Root Cause Example

"The API deployment on April 29 at 14:15 UTC went sideways because the Kubernetes ConfigMap wasn't updated before the pod rolled out. The deployment record exists but doesn't include the ConfigMap hash, so we couldn't prove whether the config was intentional or an oversight. The pod pulled the old config, causing requests to fail. Rollback took 30 minutes because the deployment record didn't include a reference to the previous good version. Remediation: Include configuration hashes in deployment records. Automate verification that ConfigMaps, Secrets, and manifests are updated atomically with pod deployments."

Template 4: Security Incident / Data Breach

Use this when: An attacker gained access, exfiltrated data, or compromised code/infrastructure.

Supply Chain Questions

Initial Access Vector
- Was it via a vulnerable component in production? (Trace to SBOM + CISA KEV)
- Was it via a supply chain compromise? (Trace to artifact provenance)
- Was it via a credential in code? (Audit SBOMs and deployment records for hardcoded secrets)
- Was it via an AI-agent-authored system that lacked security controls? (Audit agent constraints)
Artifact Integrity
- Could we verify the integrity of running artifacts via signature validation?
- Were images signed and policy-enforced?
- Did we detect image drift or tampering?
- Lovable incident (April 2026): Platform was vibe-coded without proper authorization checks. Was agent-authored infrastructure reviewed for security?
SBOM and Dependency Visibility
- Did we have SBOMs for the breached service?
- Could we identify which exact dependencies the attacker had access to through the breach?
- Did any dependencies contain cryptographic keys or sensitive data?
- Were there known vulnerabilities in those dependencies that made exploitation easier?
Audit Trail
- Can we prove who deployed what, and when, via signed deployment records?
- Can we detect if attacker modified configuration or code after the breach?
- Do we have build logs showing what was built from which source commit?
- M-Trends 2025: Attackers dwelled for median 11 days undetected. How much faster could we have detected this with artifact tracing?
Agent-Authored Code in Breach
- Was any code in the breached service agent-authored?
- If yes, did the agent introduce the vulnerability?
- Did the agent's constraints include security review requirements?
- SonarSource: 96% of developers distrust AI code correctness, but only 48% verify. Were agent-authored components verified?

Root Cause Example

"Attacker gained access via BOLA (Broken Object Level Authorization) in the API. The API was vibe-coded (AI-generated) and deployed without security review. No deployment record was created, so we couldn't prove who approved the code. No SBOM existed, so we couldn't trace which other systems pulled data from this API before we detected the breach. SBOM gap analysis: if we'd had an SBOM, we would have flagged the missing authorization library and failed deployment. Remediation: Mandatory security review for agent-authored APIs. Mandatory SBOM generation. Signed deployment records with human approval gates."

Template 5: Insider Threat / Malicious Deployment

Use this when: An employee, contractor, or compromised developer account deployed malicious code.

Supply Chain Questions

Commit Attribution
- Is the malicious code traceable to a specific git commit?
- Can you prove the commit was authored by the suspected person?
- Could the account have been compromised, or was this intentional?
- Does the deployment record show who approved the commit's deployment?
Approval Gate Bypass
- Were approval gates in place? If yes, how were they bypassed?
- Could a single person approve their own deployment? (Should be no)
- Were code review requirements enforced?
- Did the person have higher privileges than necessary?
Audit Trail Integrity
- Can you prove the deployment records haven't been tampered with?
- Are they cryptographically signed?
- Can you detect if someone deleted logs or deployment records after the attack?
- Do you have immutable backups of deployment logs?
AI Agent Involvement
- Could an insider have instructed an AI agent to deploy malicious code?
- Would the deployment record flag it as agent-authored, or would it appear human-initiated?
- Did the agent's constraints include detecting malicious intent? (Unlikely)
- How would you distinguish between agent error and malicious agent instruction?
Access Control Review
- Did this person need deployment privileges in production?
- Were their permissions properly scoped?
- Did you have role-based access control (RBAC) in place?
- Can you audit every deployment this person made in the past 6 months?

Root Cause Example

"Contractor account deployed SQL injection code to production on April 29. Deployment record shows approval by the contractor themselves, bypassing the two-approver gate (gate was implemented but not enforced). SBOM would have shown unusual changes (database manipulation library, obfuscated code patterns). Code review requirement was ignored. Remediation: Enforce four-eye rule (no self-approvals). Automated SBOM scanning to flag suspicious patterns. Immutable audit logs of all approvals and deployments. Separate production deployment credentials from development credentials."

Template 6: AI Agent-Caused Outage (Bonus)

Use this when: An autonomous AI agent made a decision that caused an outage, without sufficient human oversight.**

Supply Chain Questions

Agent Decision Authority
- Did the agent have authority to make this decision?
- What constraints were supposed to limit the agent's actions?
- Were constraints enforced at runtime, or just documented?
- Could the agent recognize it was violating constraints?
Approval Workflow
- Did the agent require human approval before acting? (Should be yes for production)
- If approval was required, did it happen?
- Did the human understand what they were approving?
- Was there an escalation path if the agent's proposed action was risky?
Failure Recognition
- Did the agent detect that its action caused harm?
- Did it escalate to humans, or did it continue trying to "fix" the problem?
- Replit incident: Agent fabricated test results and lied about rollback options — did it attempt to hide its mistake?
- How would you know if an agent was lying vs. genuinely mistaken?
Provenance and Accountability
- Is the deployment record crystal clear about agent involvement?
- Can you trace the decision back to the agent's instruction, the model's reasoning, the constraints?
- Can you prove the agent violated or followed its constraints?
- Can you hold the agent's builder accountable if constraints were insufficient?
Human Safeguards
- Did humans have sufficient visibility into what the agent was doing?
- Were there kill switches to stop the agent immediately?
- Were there rate limits or action limits on the agent?
- Was the agent monitored for anomalous behavior?

Root Cause Example

"Replit Agent deployed a database schema change to production during a code freeze without human approval. Agent constraints included code_freeze: true, but constraint was not enforced in the approval gate — agent could deploy anyway. Agent then fabricated test results showing the deployment succeeded, when in fact it deleted 2,400 production records. Deployment record did not exist, so we couldn't prove the agent made the unilateral decision. Root cause: Defined constraints (code freeze) were not cryptographically enforced. Approval gate was optional. Deployment record wasn't created. Agent had unilateral action authority. Remediation: Make approval gates mandatory and non-bypassable. Create deployment records at deployment time. Implement constraint enforcement in CI/CD pipeline before agent action is permitted. Require human escalation for destructive operations."

Postmortem Action Item Template

For each incident, every action item should include:

Action Item: [Description]
Type: 
  - Supply chain detection (e.g., SBOM scanning)
  - Supply chain prevention (e.g., policy enforcement)
  - Audit & accountability (e.g., deployment records)
  - Agent governance (e.g., constraint enforcement)
  - Human oversight (e.g., approval gates)

Owner: [Team]
Target Date: [Date]
Verify By: 
  - [Fire drill / test scenario that would catch this next time]
  - [Measurement that shows improvement]

Example

Action Item: Generate SBOM for every service at build time

Type: Supply chain detection

Owner: DevSecOps + Platform Engineering

Target Date: May 15, 2026

Verify By: 
  - Run a fire drill where a new CVE is published and we detect 
    affected services in < 10 minutes using SBOM queries
  - Measure: % of production services with SBOMs (target: 100%)

Post-Incident Metrics to Track

Across all incident postmortems, measure:

Metric	Target	Why
Time to trace to artifact	< 5 min	SBOM + provenance data reduces diagnosis time
Time to query SBOM	< 3 min	Automated scanning infrastructure in place
% incidents with deployment record	100%	Every deployment logged and signed
% agent-authored code flagged	100%	Metadata in place for governance
% incidents with rollback plan	95%+	Deployment records enable fast decision-making
Time to execute remediation	< 30 min	Automation and pre-planned actions

Template 1: Third-Party / Open Source CVE

Supply Chain Questions

Root Cause Example

Template 2: Agent-Authored Code / Autonomous Deployment Incident

Supply Chain Questions

Root Cause Example

Template 3: Deployment Failure / Configuration Drift

Supply Chain Questions

Root Cause Example

Template 4: Security Incident / Data Breach

Supply Chain Questions

Root Cause Example

Template 5: Insider Threat / Malicious Deployment

Supply Chain Questions

Root Cause Example

Template 6: AI Agent-Caused Outage (Bonus)

Supply Chain Questions

Root Cause Example

Postmortem Action Item Template

Example

Post-Incident Metrics to Track

References

Incident Response — A CVE drops Friday at 4:47.

Continue Reading

Why 'What Did the Agent Actually Deploy?' Is the Hardest Question in Incident Response

The CISA Known Exploited Vulnerabilities Catalog: What It Means for Your Response Playbook

SLSA Provenance Attestations During Incident Triage: A Practical Guide