SBOM Diff for Container Updates Authored by Coding Agents

A coding agent opens a PR titled "bump base image to fix CVE-2026-0421." The diff is six lines: one FROM line in the Dockerfile, four lines in package-lock.json, one line in the README. The CI build passes. A reviewer skims the PR and clicks merge.

What that PR actually changed: 47 transitive dependencies, 3 of them across major versions, including a swap of node-fetch for the rewritten undici HTTP stack. Two new Apache-2.0 licenses entered the dependency graph. One package picked up a critical CVE that did not exist when the agent ran its check.

The diff a human read had six lines. The diff an attacker would care about had hundreds. SBOM diffing closes that gap.

The Premise

Every container build emits a CycloneDX SBOM. Every PR that changes anything that could affect dependencies — Dockerfile, lockfiles, build scripts — should produce two SBOMs (before and after) and post the diff as a PR comment. The reviewer reads the diff, not the lockfile, because the lockfile is unreadable and the SBOM diff is exactly the audit-ready summary needed.

This is doubly important when the PR author is an agent. Agents are good at producing valid lockfiles. They are not good at predicting transitive consequences. The SBOM diff is the safety net.

Step 1: Generate SBOMs Before and After

Use Syft (or cdxgen) in CI. The trick is producing two SBOMs from the same commit graph: one from the PR's base, one from the PR's head.

# .github/workflows/sbom-diff.yml
name: SBOM diff
on:
  pull_request:
    paths:
      - 'Dockerfile*'
      - '**/package-lock.json'
      - '**/yarn.lock'
      - '**/pnpm-lock.yaml'
      - '**/requirements*.txt'
      - '**/Pipfile.lock'
      - '**/go.sum'
      - '**/Cargo.lock'

jobs:
  sbom-diff:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Build base image
        run: |
          git checkout ${{ github.event.pull_request.base.sha }}
          docker build -t app:base .

      - name: Build head image
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          docker build -t app:head .

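      - name: Install Syft
        run: |
          # Syft is not assumed to be preinstalled on the runner.
          mkdir -p "$RUNNER_TEMP/bin"
          curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh \
            | sh -s -- -b "$RUNNER_TEMP/bin"
          echo "$RUNNER_TEMP/bin" >> "$GITHUB_PATH"

      - name: Generate SBOMs
        run: |
          syft app:base -o cyclonedx-json > base.sbom.json
          syft app:head -o cyclonedx-json > head.sbom.json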

      - name: Diff SBOMs
        run: |
          python3 scripts/sbom-diff.py base.sbom.json head.sbom.json > diff.md

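      - name: Comment on PR
        # Post the comment even when the diff step exits non-zero on a hard
        # rule; otherwise the failure reaches the reviewer with no explanation.
        if: always() && hashFiles('diff.md') != ''
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          path: diff.md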

The build step is expensive — often 60 to 120 seconds per image. Cache aggressively (BuildKit layer cache, registry-backed cache exporters), and only run this workflow when the trigger paths change.

Step 2: A CycloneDX Diff That a Human Can Read

The shape of a useful diff comment:

Net new packages (with version, license, source)
Removed packages
Version changes (with semver delta)
License changes (highlighted: any new copyleft license is a hard stop)
New CVEs introduced (joined against an OSV or NVD lookup)
Total transitive count delta

The script:

#!/usr/bin/env python3
# scripts/sbom-diff.py
import json, sys, subprocess

def load(p):
    # Index components by purl; components without a purl are skipped.
    with open(p) as f:
        components = json.load(f).get("components", [])
    return {c["purl"]: c for c in components if c.get("purl")}

def osv_lookup(purl):
    # Minimal — production version uses the OSV batch API.
    # The purl already encodes the version (pkg:npm/<name>@<version>), so do
    # NOT pass a separate `version` field — the OSV /v1/query schema
    # accepts either {"package": {"purl": "..."}} OR
    # {"package": {"name": ..., "ecosystem": ...}, "version": ...},
    # not both. curl -d defaults to form-urlencoded; OSV requires JSON,
    # so set Content-Type explicitly.
    r = subprocess.run(
        ["curl", "-sS", "https://api.osv.dev/v1/query",
         "-H", "Content-Type: application/json",
         "-d", json.dumps({"package": {"purl": purl}})],
        capture_output=True, text=True)
    return json.loads(r.stdout).get("vulns", [])

base = load(sys.argv[1])
head = load(sys.argv[2])

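def pkg_key(purl):
    # A purl embeds the version (pkg:type/name@version?qualifiers), so two
    # versions of one package never share a purl. Key on the versionless part.
    core = purl.split("?", 1)[0].split("#", 1)[0]
    head_part, sep, tail = core.rpartition("@")
    return head_part if sep and "/" not in tail else core

# Note: if an image carries several versions of one package, the last one wins.
base_by_pkg = {pkg_key(p): p for p in base}
head_by_pkg = {pkg_key(p): p for p in head}

added   = {p: head[p] for k, p in head_by_pkg.items() if k not in base_by_pkg}
removed = {p: base[p] for k, p in base_by_pkg.items() if k not in head_by_pkg}
# Keyed by the head purl so the OSV lookup below queries the new version.
changed = {head_by_pkg[k]: (base[bp], head[head_by_pkg[k]])
           for k, bp in base_by_pkg.items()
           if k in head_by_pkg
           and base[bp].get("version") != head[head_by_pkg[k]].get("version")}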

print("## SBOM diff (CycloneDX)")
print(f"- Components before: **{len(base)}**, after: **{len(head)}**, "
      f"net change: **{len(head) - len(base):+d}**\n")

if added:
    print(f"### Added ({len(added)})")
    print("| Package | Version | License |")
    print("|---|---|---|")
    for p, c in sorted(added.items()):
        lic = ",".join(l.get("license", {}).get("id", "?")
                       for l in c.get("licenses", [])) or "?"
        print(f"| `{c['name']}` | `{c.get('version','?')}` | {lic} |")
    print()

if removed:
    print(f"### Removed ({len(removed)})")
    for p in sorted(removed):
        print(f"- `{removed[p]['name']}@{removed[p].get('version','?')}`")
    print()

if changed:
    print(f"### Version changes ({len(changed)})")
    print("| Package | Before | After |")
    print("|---|---|---|")
    for p, (b, h) in sorted(changed.items()):
        print(f"| `{b['name']}` | `{b.get('version')}` | `{h.get('version')}` |")
    print()

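# License hard stop
# Match by prefix so current SPDX ids such as GPL-3.0-only or
# AGPL-3.0-or-later are caught as well as the deprecated bare forms.
copyleft = ("GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0")
new_copyleft = []
for c in added.values():
    for l in c.get("licenses", []):
        lic_id = l.get("license", {}).get("id") or ""
        if lic_id.startswith(copyleft):
            new_copyleft.append((c["name"], lic_id))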

if new_copyleft:
    print("### License alert")
    print("New copyleft licenses introduced — review required:")
    for name, lic in new_copyleft:
        print(f"- `{name}` &rarr; **{lic}**")
    print()

# CVE check on added or changed
print("### Vulnerability scan (added or upgraded packages)")
flagged_vulns = []  # list of {name, version, id, severity} dicts;
                    # used by the hard-fail pass below

def osv_severity(v):
    # OSV's portable severity lives in the TOP-LEVEL `severity` array as
    # CVSS scoring objects: [{"type": "CVSS_V3", "score": "<vector>"}, ...]
    # (or per-affected-package severity arrays). It is a CVSS *vector
    # string*, not a "CRITICAL"/"HIGH" label. `database_specific.severity`
    # is database-defined and OPTIONAL — many records omit it, so reading
    # it as the source of truth silently mislabels most vulns "UNKNOWN".
    #
    # We derive a numeric base score from the CVSS vector and bucket it
    # per the FIRST.org CVSS v3.1 qualitative ranges. If no CVSS vector is
    # present we fall back to database_specific.severity, then UNKNOWN —
    # and the caller should treat UNKNOWN as "needs manual triage", not
    # "safe".
    try:
        from cvss import CVSS3  # pip install cvss
    except ImportError:
        CVSS3 = None
    for s in v.get("severity", []):
        score = s.get("score", "")
        if CVSS3 and score.startswith("CVSS:3"):
            base = CVSS3(score).scores()[0]
            if base == 0.0:   return "NONE", base
            if base < 4.0:    return "LOW", base
            if base < 7.0:    return "MEDIUM", base
            if base < 9.0:    return "HIGH", base
            return "CRITICAL", base
    # Fallbacks when no parseable CVSS vector is attached.
    label = v.get("database_specific", {}).get("severity")
    return (label or "UNKNOWN"), None

for purl, c in {**added, **{p: h for p, (_, h) in changed.items()}}.items():
    vulns = osv_lookup(purl)
    for v in vulns:
        # Use a distinct name: `base` is the base-image SBOM dict loaded above.
        sev, cvss_base = osv_severity(v)
        score_note = f" (CVSS {cvss_base})" if cvss_base is not None else ""
        print(f"- **{sev}**{score_note} `{c['name']}@{c.get('version', '?')}` &rarr; "
              f"[{v['id']}](https://osv.dev/vulnerability/{v['id']})")
        flagged_vulns.append({
            "name": c["name"],
            "version": c.get("version", "?"),
            "id": v["id"],
            "severity": sev,
        })
if not flagged_vulns:
    print("No known vulnerabilities in added or upgraded packages.")

The output is a markdown comment that fits in a PR scroll. A reviewer can see in five seconds: how many packages, which licenses, which CVEs.
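
The diff shape above lists a semver delta for version changes, which the script does not compute. Here is a minimal sketch of one way to classify it, assuming versions are roughly MAJOR.MINOR.PATCH. `semver_delta` is a hypothetical helper, not part of the original script, and distro-style versions that do not parse fall through to "other".

```python
import re

def semver_delta(before, after):
    """Classify a version change as major, minor, patch, downgrade, or other."""
    def parse(v):
        # Accept an optional leading "v"; ignore pre-release/build suffixes.
        m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", v or "")
        return tuple(int(x) for x in m.groups()) if m else None

    b, a = parse(before), parse(after)
    if b is None or a is None:
        return "other"       # not semver-shaped
    if a < b:
        return "downgrade"
    if a[0] != b[0]:
        return "major"
    if a[1] != b[1]:
        return "minor"
    if a[2] != b[2]:
        return "patch"
    return "other"           # same x.y.z; only the suffix differs
```

A fourth column in the version-changes table could print `semver_delta(b.get("version"), h.get("version"))`, and a "major" result is a natural candidate for a soft warning.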

Step 3: Tag the Diff with Agent Provenance

When the PR author is an agent, the diff comment should make that obvious. Adapt the comment header:

import os, subprocess

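def is_agent_pr():
    # On pull_request events GITHUB_SHA is the temporary merge commit, whose
    # message never carries the trailer. Read the PR head commit instead.
    # PR_HEAD_SHA is an assumed variable: export it in the workflow from
    # github.event.pull_request.head.sha; otherwise fall back to HEAD.
    head_sha = os.environ.get("PR_HEAD_SHA", "HEAD")
    msg = subprocess.run(
        ["git", "log", "-1", "--format=%B", head_sha],
        capture_output=True, text=True).stdout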
    return any(s in msg.lower() for s in
               ["co-authored-by: claude",
                "co-authored-by: copilot",
                "co-authored-by: cursor",
                "co-authored-by: codex"])

if is_agent_pr():
    print("> :robot: This PR was authored or co-authored by a coding agent. "
          "Review the diff below carefully before merging.")
    print()

The visual cue is small but powerful. Reviewers shift modes when they see the agent badge — and there is good reason to: an agent reasons about the change it intended, not the change the package manager actually performed, so agent-authored dependency PRs are a natural place for transitive surprises to slip in unnoticed.
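
An agent may co-author only some of the commits in a PR, so checking a single commit message can miss it. The sketch below scans every commit between the PR base and head. `agent_coauthors` and `pr_commit_messages` are hypothetical helpers. Unlike the check above, they match the agent name anywhere in the trailer value, which is looser and may need tightening.

```python
import re
import subprocess

AGENT_NAMES = ("claude", "copilot", "cursor", "codex")  # same names as above
TRAILER = re.compile(r"^co-authored-by:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)

def agent_coauthors(messages):
    """Return the Co-authored-by values that mention a known agent name."""
    hits = []
    for msg in messages:
        for value in TRAILER.findall(msg):
            if any(name in value.lower() for name in AGENT_NAMES):
                hits.append(value.strip())
    return hits

def pr_commit_messages(base_sha, head_sha):
    """Full message of every commit in base..head, one string per commit."""
    out = subprocess.run(
        # %x00 emits a NUL byte after each message so bodies split cleanly.
        ["git", "log", "--format=%B%x00", f"{base_sha}..{head_sha}"],
        capture_output=True, text=True, check=True).stdout
    return [m.strip() for m in out.split("\x00") if m.strip()]
```

With the two SHAs from the workflow, `bool(agent_coauthors(pr_commit_messages(base_sha, head_sha)))` could stand in for the single-commit check.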

Step 4: Block on Hard Rules, Warn on Soft Ones

Some classes of SBOM diff should never reach a human reviewer:

New GPL or AGPL dependencies in a service that ships proprietary code.
New dependencies whose names look typosquatted (reqeusts, lodash-utils, etc.).
Packages with zero stars, zero downloads, or first publication within 7 days.
Critical CVEs in any added or upgraded package.

These should fail the CI check, not just comment on the PR. Soft warnings — major version bumps, new transitive count over a threshold — should comment but not block.

# Continues from the diff script above — flagged_vulns is the list
# populated during the OSV lookup pass (each entry is a dict with
# keys: name, version, id, severity).
HARD_FAIL = False

if new_copyleft:
    HARD_FAIL = True
    print(":x: HARD FAIL: copyleft license introduced.\n")

# Critical CVE check. `severity` is the bucketed label derived from the
# CVSS base score in osv_severity() above. Records with no parseable CVSS
# vector surface as UNKNOWN — list them for manual triage rather than
# silently passing them.
critical_vulns = [v for v in flagged_vulns if v["severity"] == "CRITICAL"]
unknown_vulns = [v for v in flagged_vulns if v["severity"] == "UNKNOWN"]
if critical_vulns:
    HARD_FAIL = True
    print(":x: HARD FAIL: critical CVE in added or upgraded package.\n")
    for v in critical_vulns:
        print(f"  - `{v['name']}@{v['version']}` → {v['id']}")
if unknown_vulns:
    print(":warning: Vulnerabilities without a parseable CVSS score — "
          "triage manually, do not assume safe:\n")
    for v in unknown_vulns:
        print(f"  - `{v['name']}@{v['version']}` → {v['id']}")

sys.exit(1 if HARD_FAIL else 0)

A failed hard rule means the agent's PR cannot merge. The agent gets the failure in its next iteration and either chooses a different package, downgrades to a non-copyleft version, or escalates to a human. All three are correct outcomes; merging the bad change is not.
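
The snippet above covers only the license and CVE rules. For the typosquat and package-age rules, here is a rough sketch under stated assumptions: `KNOWN_GOOD` is a hypothetical allowlist, `looks_typosquatted` and `too_new` are illustrative helpers, and the first-publication timestamp is assumed to come from a registry lookup that is not shown.

```python
import difflib
from datetime import datetime, timedelta, timezone

# Hypothetical allowlist; in practice it could be seeded from the packages
# already present in the base SBOM.
KNOWN_GOOD = {"requests", "lodash", "express", "numpy", "urllib3"}

def looks_typosquatted(name, known=KNOWN_GOOD, cutoff=0.85):
    """Return the trusted name that `name` closely resembles, or None."""
    if name in known:
        return None
    match = difflib.get_close_matches(name, sorted(known), n=1, cutoff=cutoff)
    return match[0] if match else None

def too_new(first_published_iso, now=None, min_age_days=7):
    """True if the first publication is less than `min_age_days` old.

    Expects an ISO 8601 timestamp with a timezone (a trailing Z is accepted).
    """
    now = now or datetime.now(timezone.utc)
    published = datetime.fromisoformat(first_published_iso.replace("Z", "+00:00"))
    return now - published < timedelta(days=min_age_days)
```

Edit-distance matching catches misspellings such as reqeusts. Names that pad a trusted name, such as lodash-utils, score well below the cutoff and would need a separate rule.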

Step 5: Persist the SBOMs as Build Artifacts

The PR comment is for review-time. The signed SBOM is for audit-time. After the PR merges, the head image's SBOM should be:

Attached and signed in one step as a CycloneDX attestation (cosign attest --predicate sbom.json --type cyclonedx), which uploads it to the registry as a signed OCI artifact. (cosign attach sbom is deprecated — it attached an unsigned SBOM, so there was no way to verify authenticity; use cosign attest / cosign download attestation instead.)
Verifiable in one step via cosign verify-attestation --type cyclonedx.
Referenced in the SLSA Build provenance attestation.
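
The attest and verify steps in the list above can be wrapped in a small helper. This is a sketch, assuming key-based signing: the `cosign.key` and `cosign.pub` paths and both function names are placeholders, and keyless signing would use identity flags in place of `--key`.

```python
import subprocess

def attest_cmd(image_ref, sbom_path, key="cosign.key"):
    """Build the argv that attaches and signs the SBOM as an attestation."""
    return ["cosign", "attest", "--key", key, "--type", "cyclonedx",
            "--predicate", sbom_path, image_ref]

def verify_cmd(image_ref, key="cosign.pub"):
    """Build the argv that verifies the CycloneDX attestation on an image."""
    return ["cosign", "verify-attestation", "--key", key,
            "--type", "cyclonedx", image_ref]

def run(cmd):
    # check=True raises CalledProcessError if cosign exits non-zero.
    subprocess.run(cmd, check=True)
```

Referencing the image by digest rather than by tag keeps the attestation bound to the exact artifact that was built.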

That way, six months from now when CISA publishes a new KEV entry, you can rerun the SBOM query against every production image without rebuilding (see tracing CVE to production artifact).
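
Here is a minimal sketch of that later query, assuming the stored SBOMs have been downloaded into one directory as CycloneDX JSON files named after their images. The directory layout and both function names are hypothetical.

```python
import json
from pathlib import Path

def load_sboms(directory):
    """Map each SBOM file stem (assumed to identify an image) to its document."""
    return {p.stem: json.loads(p.read_text())
            for p in sorted(Path(directory).glob("*.json"))}

def affected_images(sboms, name, bad_versions):
    """Return (image, purl) pairs where `name` appears at a listed version."""
    hits = []
    for image, doc in sboms.items():
        for c in doc.get("components", []):
            if c.get("name") == name and c.get("version") in bad_versions:
                hits.append((image, c.get("purl", "?")))
    return sorted(hits)
```

Calling `affected_images(load_sboms("sboms/"), "<package>", {"<version>", ...})` then lists every production image that shipped an affected version, with no rebuild.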

Why This Matters Specifically for Agent PRs

Three observations from teams running this workflow:

Agents commit lockfile changes that humans would reject. Without a diff comment, no one notices.
Agents follow the path of least resistance to "build passes." They do not weigh license, maintainer health, or supply-chain risk. The SBOM diff makes those weights visible.
Agent PR volume grows without bound; review capacity does not. Automated hard-stop rules let humans focus on the diffs that actually need judgement.

The cost of running the SBOM diff workflow is a few extra CI minutes per relevant PR (two image builds plus the scan), less with a warm layer cache. The cost of not running it is the post-merge incident that the diff would have caught.

Rollout Sequence

Land the SBOM generation step first, with no diff and no PR comment. Verify the SBOMs are well-formed.
Add the diff script in comment-only mode. Watch a week of comments. Tune the thresholds.
Add the agent detection. Confirm the badge appears on the right PRs.
Introduce hard-stop rules one at a time, beginning with the lowest false-positive (copyleft license).
Wire the post-merge SBOM signing into the existing build provenance pipeline.

What you end up with is a build pipeline where every container update — agent or human — gets a transparent, signed, audit-ready record of exactly which dependencies entered, left, or changed. The agent gets faster feedback. The reviewer gets less surface to read. The auditor gets a complete trail. Nobody loses.
