GitLab Configuration

Follow these steps to configure Cartography to sync GitLab organization, group, project, and related data.

Prerequisites

  1. A GitLab instance (self-hosted or gitlab.com)

  2. A GitLab personal access token with the required scopes (see below)

  3. The numeric ID of the GitLab organization (top-level group) to sync

Creating a GitLab Personal Access Token

  1. Navigate to your GitLab instance (e.g., https://gitlab.com or https://gitlab.example.com)

  2. Go to User SettingsAccess Tokens (or directly to https://your-gitlab-instance/-/user_settings/personal_access_tokens)

  3. Click Add new token

  4. Configure your token:

    • Token name: cartography-sync

    • Scopes: Select read_user, read_repository, and read_api

    • Expiration date: Set according to your security policy

  5. Click Create personal access token

  6. Important: Copy the token immediately - you won’t be able to see it again

Required Token Permissions

The token requires the following scopes:

Scope

Purpose

read_user

Access user profile information for group/project membership

read_repository

Access repository metadata, branches, and file contents

read_api

Access groups, projects, dependency scanning artifacts, language statistics, and group/project-level CI/CD runners

These scopes provide read-only access to:

  • Organizations (top-level groups) and nested groups

  • Projects and their metadata

  • Branches and default branch information

  • Dependency files (package.json, requirements.txt, etc.)

  • Dependency files and dependencies extracted from GitLab dependency scanning artifacts

  • Project language statistics

  • Group-level and project-level CI/CD runners

Dependency scanning artifact access

Cartography ingests GitLab dependencies from CycloneDX SBOM artifacts produced by GitLab dependency scanning jobs, not from GitLab’s dependency list API. The token scopes above are required, and the token’s user must also be allowed to download CI job artifacts for each project.

GitLab projects can restrict artifact downloads with artifacts:access. If a dependency scanning job uses artifacts:access: developer or artifacts:access: maintainer, a token that belongs to a Reporter-level user can receive 403 Forbidden when Cartography downloads the job artifacts. Grant the token’s user a project role that satisfies the artifact access policy.

Dependency scanning jobs must produce CycloneDX SBOM artifacts, such as gl-sbom-*.cdx.json, gl-sbom.cdx.json, or gzipped equivalents. GitLab documents these SBOMs as job artifacts of the dependency scanning job. Cartography can only ingest artifacts that GitLab still serves, so expired or deleted job artifacts cannot be recovered during sync.

Optional: instance-level runners

Listing instance-level (shared) runners via GET /api/v4/runners/all requires the token to belong to a GitLab administrator. If the token does not have admin privileges, the sync logs a warning and skips instance-level runners; group-level and project-level runners continue to be ingested normally.

CI config (.gitlab-ci.yml) ingestion

The CI config sync first calls GET /api/v4/projects/:id/ci/lint?dry_run=true to obtain the merged YAML (with all include: references expanded). Tokens generated from a user without Maintainer access on the project may not be allowed to use this endpoint — in that case the sync falls back to the raw .gitlab-ci.yml from the repository, which only requires read_repository. If both calls fail (404 / 403), the project is skipped (a warning is logged before the skip).

Finding Your Organization ID

The organization ID is the numeric ID of the top-level GitLab group you want to sync. To find it:

  1. Navigate to your group’s page on GitLab (e.g., https://gitlab.com/your-organization).

  2. Click the (three dots) menu in the top right of the group header and select Copy group ID.

  3. Alternatively, fetch it via the API:

    curl -H "PRIVATE-TOKEN: your-token" "https://gitlab.com/api/v4/groups/your-organization"
    

    The id field in the response is your organization ID.

Configuration

  1. Set your GitLab token in an environment variable:

    export GITLAB_TOKEN="glpat-your-token-here"
    
  2. Run Cartography with GitLab module:

    cartography \
      --neo4j-uri bolt://localhost:7687 \
      --selected-modules gitlab \
      --gitlab-organization-id 12345678 \
      --gitlab-token-env-var "GITLAB_TOKEN"
    

Configuration Options

Parameter

CLI Argument

Environment Variable

Required

Default

Description

GitLab URL

--gitlab-url

N/A

No

https://gitlab.com

The GitLab instance URL. Only set for self-hosted instances.

GitLab Token

--gitlab-token-env-var

Set by you

Yes

N/A

Name of the environment variable containing your GitLab personal access token

Organization ID

--gitlab-organization-id

N/A

Yes

N/A

The numeric ID of the top-level GitLab group (organization) to sync

Performance Considerations

  • Language detection: Fetches programming language statistics for all projects using parallel async requests (10 concurrent by default). Languages are stored as a JSON property on each project.

  • Large instances: For ~3000 projects, language fetching takes approximately 5-7 minutes

  • API rate limits: GitLab.com has rate limits (2000 requests/minute for authenticated users). Self-hosted instances may have different limits

Multi-Instance Support

Cartography supports syncing from multiple GitLab instances simultaneously. Repository and group IDs are prefixed with the GitLab instance URL to prevent collisions:

https://gitlab.com/projects/12345
https://gitlab.example.com/projects/12345

Both can exist in the same Neo4j database without conflicts.

Example: Self-Hosted GitLab

export GITLAB_TOKEN="glpat-abc123xyz"

cartography \
  --neo4j-uri bolt://localhost:7687 \
  --selected-modules gitlab \
  --gitlab-url "https://gitlab.example.com" \
  --gitlab-organization-id 12345678 \
  --gitlab-token-env-var "GITLAB_TOKEN"

Troubleshooting

Connection timeout:

  • Default timeout is 60 seconds

  • For slow GitLab instances, the sync may take longer during language detection

  • Check GitLab instance health if repeated timeouts occur

Missing language data:

  • Some projects may not have language statistics available (empty repos, binary-only repos)

  • Errors fetching languages for individual projects are logged as warnings but don’t stop the sync

Missing dependency data:

  • Dependency scanning requires projects to have supported manifest files (package.json, requirements.txt, etc.)

  • The GitLab Dependency Scanning feature must be enabled for the project

Permission errors:

  • Ensure your token has all required scopes: read_user, read_repository, read_api

  • Verify the token hasn’t expired

  • Check that the GitLab user has access to the organization and projects you want to sync

Organization not found:

  • Verify the --gitlab-organization-id is the correct numeric ID (not the group path)

  • Ensure the token’s user has access to the organization