Semgrep Configuration

Follow these steps to ingest Semgrep findings with Cartography.

  1. Create a token with Agent (CI) and Web API scopes (see "Creating a SEMGREP_APP_TOKEN" in the Semgrep documentation).

  2. Populate an environment variable with the secret value of the token.

  3. Pass the environment variable name to the --semgrep-app-token-env-var CLI arg.

To ingest Semgrep dependencies with Cartography, two additional steps are needed:

  1. Determine which language ecosystems you’d like to ingest. See the full list of supported ecosystems in source code at cartography.intel.semgrep.dependencies.

  2. Pass the list of ecosystems as a comma-separated string (e.g. gomod,npm) to the --semgrep-dependency-ecosystems CLI arg.
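
The two sets of steps above can be combined into a single invocation. The following is a minimal sketch, not part of Cartography: build_semgrep_args is a hypothetical helper, the variable name SEMGREP_APP_TOKEN is only an example of what you might call your environment variable, and the --neo4j-uri value is assumed to be a local instance.

```python
import os


def build_semgrep_args(token_env_var, ecosystems, neo4j_uri="bolt://localhost:7687"):
    """Assemble a cartography command line for a Semgrep sync.

    Hypothetical helper for illustration; only the CLI flags themselves
    come from the Cartography documentation.
    """
    # Cartography receives the *name* of the variable, so fail early
    # if the variable has not been populated with the token value.
    if not os.environ.get(token_env_var):
        raise RuntimeError(f"environment variable {token_env_var} is not set")

    args = [
        "cartography",
        "--neo4j-uri", neo4j_uri,
        "--selected-modules", "semgrep",
        "--semgrep-app-token-env-var", token_env_var,
    ]
    # Dependency ingestion is optional: pass ecosystems as one comma-separated string.
    if ecosystems:
        args += ["--semgrep-dependency-ecosystems", ",".join(ecosystems)]
    return args
```

With the token exported, the resulting list could be handed to subprocess.run(build_semgrep_args("SEMGREP_APP_TOKEN", ["gomod", "npm"]), check=True). Passing the arguments as a list avoids shell quoting issues, and the token value itself never appears on the command line.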

OSS Semgrep Configuration

Cartography can also ingest Semgrep OSS SAST findings from JSON reports generated by the Semgrep CLI.

Use --semgrep-oss-source to point Cartography at a repository mapping YAML file. This mapping file provides the repository metadata that Semgrep OSS JSON does not include, along with the explicit report artifact(s) associated with each repository.

The repository mapping file must:

  • Be valid UTF-8 YAML.

  • Contain a top-level repositories list.

  • Give each repository entry provider, owner, repo, url, branch, and a non-empty reports list.

  • Use reports entries that each point to exactly one Semgrep OSS JSON artifact for the repository they are nested under.

For sharded or monorepo scans, list each generated JSON artifact separately under reports; a reports entry must not be a directory or object-store prefix containing multiple JSON files.

Cartography treats one repository entry as the intended snapshot for that repository in the current run. If all listed reports for a repository are successfully processed, Cartography runs cleanup for stale OSS findings scoped to that repository URL. If any listed report for that repository fails to resolve, fails to parse, or is not Semgrep-shaped, Cartography skips cleanup for that repository to avoid deleting findings from an incomplete snapshot.
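
The cleanup rule described above is all-or-nothing per repository. The sketch below illustrates that decision only; it is not Cartography's implementation, and the function name and the shape of its input are assumptions made for the example.

```python
def repos_eligible_for_cleanup(report_outcomes):
    """Return repository URLs whose listed reports all processed successfully.

    report_outcomes (hypothetical structure) maps a repository URL to a list
    of booleans, one per listed report: True means the report resolved,
    parsed, and was Semgrep-shaped.
    """
    return [
        url
        for url, outcomes in report_outcomes.items()
        # A single failed report marks the snapshot as incomplete,
        # so cleanup is skipped for that repository.
        if outcomes and all(outcomes)
    ]


outcomes = {
    "https://github.com/simpsoncorp/sample_repo": [True],
    "https://github.com/different-org/different-repo": [True, False],
}
print(repos_eligible_for_cleanup(outcomes))
```

Here only the first repository qualifies: the second has one failed report, so its existing findings would be left untouched rather than risk deleting findings that the missing report would have confirmed.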

Example repository mapping file:

repositories:
  - provider: "github"
    owner: "simpsoncorp"
    repo: "sample_repo"
    url: "https://github.com/simpsoncorp/sample_repo"
    branch: "main"
    reports:
      - "/path/to/sample_repo-semgrep.json"
  - provider: "github"
    owner: "different-org"
    repo: "different-repo"
    url: "https://github.com/different-org/different-repo"
    branch: "main"
    reports:
      - "s3://security-artifacts/semgrep/different-repo/report-1.json"
      - "s3://security-artifacts/semgrep/different-repo/report-2.json"
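
A mapping file like the one above can be checked against the listed requirements before a sync. This is a rough sketch under stated assumptions: validate_mapping is a hypothetical helper, it operates on already-parsed YAML (for example the result of yaml.safe_load from PyYAML), and the trailing-slash test is only a heuristic for spotting directories or prefixes, not a rule Cartography is known to apply.

```python
REQUIRED_FIELDS = ("provider", "owner", "repo", "url", "branch")


def validate_mapping(data):
    """Return a list of human-readable problems found in a parsed mapping."""
    if not isinstance(data, dict) or not isinstance(data.get("repositories"), list):
        return ["top-level 'repositories' list is missing"]

    problems = []
    for index, entry in enumerate(data["repositories"]):
        where = f"repositories[{index}]"
        if not isinstance(entry, dict):
            problems.append(f"{where}: entry is not a mapping")
            continue
        # Every repository entry needs all of the metadata fields.
        for field in REQUIRED_FIELDS:
            if not entry.get(field):
                problems.append(f"{where}: missing '{field}'")
        reports = entry.get("reports")
        if not isinstance(reports, list) or not reports:
            problems.append(f"{where}: 'reports' must be a non-empty list")
            continue
        for report in reports:
            # Heuristic (assumption): a trailing slash suggests a directory
            # or object-store prefix rather than a single JSON artifact.
            if not isinstance(report, str) or not report or report.endswith("/"):
                problems.append(f"{where}: report {report!r} does not look like a single JSON file")
    return problems
```

An empty result means none of these structural checks failed; it does not confirm that the report files exist or contain valid Semgrep output.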

Example command:

cartography \
  --neo4j-uri bolt://localhost:7687 \
  --selected-modules semgrep \
  --semgrep-oss-source /path/to/repository_mappings.yaml

To create FOUND_IN relationships for OSS findings, matching GitHubRepository nodes must already exist in the graph with id equal to the repository url declared in the mapping file.
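
Because the match is on the node id, a mapping URL that differs even slightly from the stored id would leave findings without a FOUND_IN relationship. The sketch below flags such entries; urls_without_repo_node is a hypothetical helper, the set of existing GitHubRepository ids is assumed to have been fetched from the graph beforehand, and exact string equality is an assumption about how the comparison behaves.

```python
def urls_without_repo_node(mapping, existing_repo_ids):
    """List mapping URLs that have no GitHubRepository node with the same id.

    mapping is the parsed repository mapping; existing_repo_ids is a set of
    GitHubRepository ids already present in the graph (fetched separately).
    """
    return [
        entry["url"]
        for entry in mapping["repositories"]
        if entry["url"] not in existing_repo_ids
    ]


mapping = {
    "repositories": [
        {"url": "https://github.com/simpsoncorp/sample_repo"},
        {"url": "https://github.com/different-org/different-repo"},
    ]
}
existing = {"https://github.com/simpsoncorp/sample_repo"}
print(urls_without_repo_node(mapping, existing))
```

Any URL reported here points to a repository whose node should be ingested, for example by syncing the relevant GitHub data first, before the Semgrep OSS sync runs.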