Automatically run SAST tools on build

Use Chalk to automatically run SAST tools such as Semgrep on every build

Summary

Static Application Security Testing (SAST) is a type of security testing that runs on the source code, byte code, or binary code of an application without running the application itself. One of the most popular SAST tools is the open source tool Semgrep (https://semgrep.dev/docs/), which scans your source code for vulnerabilities, leaked secrets, and other issues according to a set of rules for each supported language, and outputs a list of results for the security team to address. Chalk supports Semgrep integration out of the box, and other SAST tools can be added via the tools plugin.

This how-to uses Chalk to automate running Semgrep on build in three steps:

1. Run containers to receive and browse chalk data.
2. Configure Chalk to run Semgrep on build.
3. Build software using Docker that automatically generates SAST reports.

Steps

Before you start

You should have a working installation of Chalk. The easiest way to get Chalk is to download a pre-built binary from our release page. It's a self-contained binary with no dependencies to install.

Additionally, you will need Docker, as the reporting web service is installed by running two Docker containers: one for collecting logs, and the other to provide a web frontend for browsing them.

Step 1: Run containers to receive and browse chalk data

The SAST report generated by Semgrep can be directly inserted into the artifact, but as SAST reports tend to be very large, we recommend capturing the data at build-time and sending it off to the Chalk API server.

For this guide, we will be running the server locally. If you do not already have it set up, follow the instructions in the how-to guide "How-to run containers to browse Chalk data locally".

Step 2: Configure Chalk to run Semgrep on build

Chalk can load remote modules to reconfigure functionality. For this guide, we will be loading the module compliance_sast from chalkdust, our module repository.

To load the module, run:

./chalk load https://chalkdust.io/compliance_sast.c4m

During the load operation, you will be prompted to enter the IP address for the server we set up in the previous step. The default will be your personal IP address. Generally, the default should work just fine for the local testing server.

After accepting the IP address, Chalk will prompt you one more time to finish the setup, and then you should see confirmation that the configuration has been successfully loaded, for example:

info:  https://chalkdust.io/compliance_sast.c4m: Validating configuration.
info:  https://chalkdust.io/compliance_sast.c4m: Configuration successfully validated.
info:  Configuration replaced in binary: /home/liming/workspace/chalk/chalk

The resulting binary will be fully configured, and can be moved or copied to other machines without losing the configuration. The binaries will continue posting reports to the API server as long as your server container stays up and the IP address remains correct.

There's nothing else you need to do to keep this new configuration -- Chalk rewrites data fields in its own binary when saving the configuration changes.

If you wish to additionally embed the full SAST report into artifacts, you can. This is not recommended as SAST reports can be very large, but may be useful in some situations such as testing.

If you'd like to add this capability, then run:

./chalk load https://chalkdust.io/embed_sast.c4m

As above, this will add to your current configuration. You can always check what configuration has been loaded by running:

./chalk dump

Step 3: Build software

Let's pick an off-the-shelf project and treat it like we're building part of it in a build pipeline. We'll use a sample Docker project called wordsmith.

To clone and build the wordsmith project, run:

git clone https://github.com/dockersamples/wordsmith
cd wordsmith/api
chalk docker build -t localhost:5000/wordsmith:latest .

You'll see Docker run normally (it'll take a minute or so). Once Docker is finished, you'll see some summary info from chalk on your command line in JSON format, including the contents of the Dockerfile used.

The terminal report (displayed after the Docker output) should look like this:

[
  {
    "_OPERATION": "build",
    "_DATETIME": "2023-11-15T21:03:30.575-05:00",
    "_CHALKS": [
      {
        "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
        "DOCKERFILE_PATH": "/home/liming/workspace/wordsmith/api/Dockerfile",
        "DOCKER_FILE": "# Build stage\nFROM --platform=${BUILDPLATFORM} maven:3-amazoncorretto-20 as build\nWORKDIR /usr/local/app\nCOPY pom.xml .\nRUN mvn verify -DskipTests --fail-never\nCOPY src ./src\nRUN mvn verify\n\n# Run stage\nFROM --platform=${TARGETPLATFORM} amazoncorretto:20\nWORKDIR /usr/local/app\nCOPY --from=build /usr/local/app/target .\nENTRYPOINT [\"java\", \"-Xmx8m\", \"-Xms8m\", \"-jar\", \"/usr/local/app/words.jar\"]\nEXPOSE 8080\n",
        "DOCKER_LABELS": {},
        "DOCKER_TAGS": [
          "localhost:5000/wordsmith:latest"
        ],
        "ORIGIN_URI": "https://github.com/dockersamples/wordsmith",
        "COMMIT_ID": "313073ac55c1d6f8ff5ac9efeeb93e6f03efdbea",
        "BRANCH": "main",
        "CHALK_VERSION": "0.2.2",
        "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN",
        "_VIRTUAL": false,
        "_IMAGE_ID": "e068119cdb8e2bc61d664672f5e97c7ce58e7f90b1b81d26289e4fb83e008437",
        "_REPO_TAGS": [
          "localhost:5000/wordsmith:latest"
        ],
        "_CURRENT_HASH": "e068119cdb8e2bc61d664672f5e97c7ce58e7f90b1b81d26289e4fb83e008437"
    [...]
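The terminal report is plain JSON, so a build pipeline can pick out the fields it cares about. The following is a minimal sketch in Python, assuming the report text has been captured to a string. `summarize_report` is a hypothetical helper name, and the embedded report is an abridged copy of the one above, not the full output.

```python
import json

# Abridged copy of the build report shown above; a real report has more keys.
BUILD_REPORT = """
[
  {
    "_OPERATION": "build",
    "_DATETIME": "2023-11-15T21:03:30.575-05:00",
    "_CHALKS": [
      {
        "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
        "DOCKER_TAGS": ["localhost:5000/wordsmith:latest"],
        "COMMIT_ID": "313073ac55c1d6f8ff5ac9efeeb93e6f03efdbea",
        "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN"
      }
    ]
  }
]
"""


def summarize_report(report_text):
    """Return one summary dict per chalk mark found in a terminal report."""
    summaries = []
    for report in json.loads(report_text):
        for mark in report.get("_CHALKS", []):
            summaries.append(
                {
                    "operation": report.get("_OPERATION"),
                    "chalk_id": mark.get("CHALK_ID"),
                    "tags": mark.get("DOCKER_TAGS", []),
                    "commit": mark.get("COMMIT_ID"),
                }
            )
    return summaries


summary = summarize_report(BUILD_REPORT)
print(summary)
```

Using `.get()` throughout keeps the sketch tolerant of reports that omit a key, which may happen depending on the loaded configuration.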

To check that the image we built has been successfully chalked, we can run:

chalk extract localhost:5000/wordsmith:latest

The terminal report for the extract operation should look like this:

[
  {
    "_OPERATION": "extract",
    "_DATETIME": "2023-11-15T21:04:28.514-05:00",
    "_CHALKS": [
      {
        "_OP_ARTIFACT_TYPE": "Docker Image",
        "_IMAGE_ID": "e068119cdb8e2bc61d664672f5e97c7ce58e7f90b1b81d26289e4fb83e008437",
        "_REPO_TAGS": [
          "localhost:5000/wordsmith:latest"
        ],
        "_CURRENT_HASH": "e068119cdb8e2bc61d664672f5e97c7ce58e7f90b1b81d26289e4fb83e008437",
        "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
        "CHALK_VERSION": "0.2.2",
        "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN"
      }
    [...]
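The build and extract reports can also be compared by script rather than by eye. The following is a minimal sketch in Python, assuming both reports have already been parsed from JSON. `chalk_ids` is a hypothetical helper name, and the two reports are trimmed to the fields needed for the comparison.

```python
def chalk_ids(report):
    """Collect the CHALK_ID of every chalk mark in a parsed terminal report."""
    return {
        mark["CHALK_ID"]
        for entry in report
        for mark in entry.get("_CHALKS", [])
        if "CHALK_ID" in mark
    }


# Trimmed versions of the build and extract reports shown above.
build_report = [
    {
        "_OPERATION": "build",
        "_CHALKS": [
            {
                "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
                "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN",
            }
        ],
    }
]
extract_report = [
    {
        "_OPERATION": "extract",
        "_CHALKS": [
            {
                "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
                "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN",
            }
        ],
    }
]

# IDs present in both reports identify the same artifact.
matched = chalk_ids(build_report) & chalk_ids(extract_report)
print(matched)
```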

In particular, note that the CHALK_ID is the same for the build and extract operations -- this ID is how we will track the container. Let's go to the SQLite database and look up the chalk mark with that CHALK_ID. We can do this by running the following query under the chalks tab:

SELECT *
FROM "chalks"
WHERE chalk_id = '7ZA86R-WCEA-3Q5R-A5Z3NX';

We should see something like this:

[Image: Output 1]
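The same lookup can be scripted. The following is a minimal sketch using Python's standard `sqlite3` module against an in-memory stand-in database. The table name `chalks` and column `chalk_id` come from the query above, but the rest of the schema (the `raw` column) and the helper name `find_chalk` are assumptions made for illustration, since the server's actual schema is not shown in this guide.

```python
import sqlite3

# Stand-in database. The real server schema is not shown in this guide,
# so this two-column table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chalks (chalk_id TEXT, raw TEXT)")
conn.execute(
    "INSERT INTO chalks VALUES (?, ?)",
    ("7ZA86R-WCEA-3Q5R-A5Z3NX", '{"CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX"}'),
)


def find_chalk(connection, chalk_id):
    """Return all rows of the chalks table matching the given CHALK_ID."""
    # The "?" placeholder binds the value instead of formatting it into
    # the SQL string.
    cursor = connection.execute(
        'SELECT * FROM "chalks" WHERE chalk_id = ?', (chalk_id,)
    )
    return cursor.fetchall()


rows = find_chalk(conn, "7ZA86R-WCEA-3Q5R-A5Z3NX")
print(rows)
```

Binding the ID through a placeholder, rather than concatenating it into the query, avoids the class of issue that Semgrep reports for the wordsmith project.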

Checking the raw chalk mark, we can see the SAST data has been embedded:

{
  "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
  [...]
  "SAST": {
    "semgrep": {
      "$schema": "https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/schemas/sarif-schema-2.1.0.json",
      "runs": [
        {
          "invocations": [{ "executionSuccessful": true, "toolExecutionNotifications": [] }],
          "results": [
            {
              "fingerprints": {
                "matchBasedId/v1": "23d567180068397303e8395a08b5a9dcd08bb7606d48ec550df13ac7e992afc60d17c99e8e24f3e5465b2ca0a525de4b1938a3527dd06d6a87623ccd565a9052_0"
              },
              "locations": [
                {
                  "physicalLocation": {
                    "artifactLocation": { "uri": "src/main/java/Main.java", "uriBaseId": "%SRCROOT%" },
                    "region": {
                      "endColumn": 120,
                      "endLine": 26,
                      "snippet": {
                        "text": " try (ResultSet set = statement.executeQuery(\"SELECT word FROM \" + table + \" ORDER BY random() LIMIT 1\")) {"
                      },
                      "startColumn": 38,
                      "startLine": 26
                    }
                  }
                }
              ],
              "message": {
                "text": "Detected a formatted string in a SQL statement. This could lead to SQL injection if variables in the SQL statement are not properly sanitized. Use a prepared statements (java.sql.PreparedStatement) instead. You can obtain a PreparedStatement using 'connection.prepareStatement'."
              },
              "properties": {},
              "ruleId": "java.lang.security.audit.formatted-sql-string.formatted-sql-string"
            }
          ],
  [...]
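Because the Semgrep data is stored in SARIF format, the findings can be flattened into a short list for triage. The following is a minimal sketch in Python, assuming the value of the `SAST` key has been loaded into a dictionary. `list_findings` is a hypothetical helper name, and the sample data is a shortened copy of the chalk mark above (the message text is cut down).

```python
# Shortened copy of the "SAST" value from the chalk mark shown above.
SAST = {
    "semgrep": {
        "runs": [
            {
                "results": [
                    {
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {
                                        "uri": "src/main/java/Main.java"
                                    },
                                    "region": {"startLine": 26, "startColumn": 38},
                                }
                            }
                        ],
                        "message": {
                            "text": "Detected a formatted string in a SQL statement."
                        },
                        "ruleId": "java.lang.security.audit.formatted-sql-string.formatted-sql-string",
                    }
                ]
            }
        ]
    }
}


def list_findings(sast):
    """Return (tool, rule ID, file, start line) for every reported location."""
    findings = []
    for tool, sarif in sast.items():
        for run in sarif.get("runs", []):
            for result in run.get("results", []):
                for location in result.get("locations", []):
                    physical = location.get("physicalLocation", {})
                    findings.append(
                        (
                            tool,
                            result.get("ruleId"),
                            physical.get("artifactLocation", {}).get("uri"),
                            physical.get("region", {}).get("startLine"),
                        )
                    )
    return findings


findings = list_findings(SAST)
for finding in findings:
    print(finding)
```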

If the image we have built here is run as a container, the chalk mark will be included in a chalk.json file in the root of the container file system. If the embed_sast.c4m configuration component has been loaded, the chalk.json file in the container should also have the Semgrep data embedded; if not, the file will only have a minimal chalk mark.
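Whether a given chalk mark carries the embedded report can be checked by looking for the `SAST` key. The following is a minimal sketch in Python. `has_embedded_sast` is a hypothetical helper name, and the exact set of fields in a minimal chalk mark is assumed here to match the extract output shown earlier.

```python
import json


def has_embedded_sast(chalk_mark_text):
    """Return True if the chalk mark JSON contains a non-empty SAST entry."""
    mark = json.loads(chalk_mark_text)
    return bool(mark.get("SAST"))


# Assumed shape of a minimal chalk mark (fields taken from the extract output).
minimal_mark = json.dumps(
    {
        "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
        "CHALK_VERSION": "0.2.2",
        "METADATA_ID": "46CKQJ-G8HX-1PW4-N6X1RN",
    }
)

# The same mark with a stub SAST section standing in for the full report.
embedded_mark = json.dumps(
    {
        "CHALK_ID": "7ZA86R-WCEA-3Q5R-A5Z3NX",
        "SAST": {"semgrep": {"runs": []}},
    }
)

print(has_embedded_sast(minimal_mark), has_embedded_sast(embedded_mark))
```

Inside a running container, the input would be the contents of `/chalk.json`.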

If Chalk ever hits a condition it cannot handle (for example, you upgrade Docker without updating Chalk and then use features that Chalk doesn't understand), Chalk falls back to running the original docker command whenever the wrapped command does not exit successfully. This ensures that adding Chalk to a build pipeline will not break any existing workflows.
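The fallback behavior can be pictured as a simple "try the wrapped command, then the original" pattern. The following Python sketch illustrates that pattern only; it is not Chalk's implementation. `run_with_fallback` is a hypothetical helper, and the two commands are stand-ins built from the current Python interpreter so the example runs anywhere.

```python
import subprocess
import sys


def run_with_fallback(wrapped_cmd, original_cmd):
    """Run wrapped_cmd; if it exits non-zero, run original_cmd instead.

    Returns a (which_command_counted, exit_code) tuple.
    """
    wrapped = subprocess.run(wrapped_cmd)
    if wrapped.returncode == 0:
        return "wrapped", 0
    original = subprocess.run(original_cmd)
    return "original", original.returncode


# Stand-in commands: one that fails with exit code 3, one that succeeds.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
passing = [sys.executable, "-c", "pass"]

outcome = run_with_fallback(failing, passing)
print(outcome)
```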
