GitLab Configuration

Follow these steps to configure Cartography to sync GitLab organization, group, project, and related data.

Prerequisites

  1. A GitLab instance (self-hosted or gitlab.com)

  2. A GitLab personal access token with the required scopes (see below)

  3. The numeric ID of the GitLab organization (top-level group) to sync

Creating a GitLab Personal Access Token

  1. Navigate to your GitLab instance (e.g., https://gitlab.com or https://gitlab.example.com)

  2. Go to User SettingsAccess Tokens (or directly to https://your-gitlab-instance/-/user_settings/personal_access_tokens)

  3. Click Add new token

  4. Configure your token:

    • Token name: cartography-sync

    • Scopes: Select read_user, read_repository, and read_api

    • Expiration date: Set according to your security policy

  5. Click Create personal access token

  6. Important: Copy the token immediately - you won’t be able to see it again

Required Token Permissions

The token requires the following scopes:

Scope

Purpose

read_user

Access user profile information for group/project membership

read_repository

Access repository metadata, branches, and file contents

read_api

Access groups, projects, dependencies, language statistics, and group/project-level CI/CD runners

These scopes provide read-only access to:

  • Organizations (top-level groups) and nested groups

  • Projects and their metadata

  • Branches and default branch information

  • Dependency files (package.json, requirements.txt, etc.)

  • Dependencies extracted from dependency files

  • Project language statistics

  • Group-level and project-level CI/CD runners

Optional: instance-level runners

Listing instance-level (shared) runners via GET /api/v4/runners/all requires the token to belong to a GitLab administrator. If the token does not have admin privileges, the sync logs a warning and skips instance-level runners; group-level and project-level runners continue to be ingested normally.

CI config (.gitlab-ci.yml) ingestion

The CI config sync first calls GET /api/v4/projects/:id/ci/lint?dry_run=true to obtain the merged YAML (with all include: references expanded). Tokens generated from a user without Maintainer access on the project may not be allowed to use this endpoint — in that case the sync falls back to the raw .gitlab-ci.yml from the repository, which only requires read_repository. If both calls fail (404 / 403), the project is skipped (a warning is logged before the skip).

Finding Your Organization ID

The organization ID is the numeric ID of the top-level GitLab group you want to sync. To find it:

  1. Navigate to your group’s page on GitLab (e.g., https://gitlab.com/your-organization).

  2. Click the (three dots) menu in the top right of the group header and select Copy group ID.

  3. Alternatively, fetch it via the API:

    curl -H "PRIVATE-TOKEN: your-token" "https://gitlab.com/api/v4/groups/your-organization"
    

    The id field in the response is your organization ID.

Configuration

  1. Set your GitLab token in an environment variable:

    export GITLAB_TOKEN="glpat-your-token-here"
    
  2. Run Cartography with GitLab module:

    cartography \
      --neo4j-uri bolt://localhost:7687 \
      --selected-modules gitlab \
      --gitlab-organization-id 12345678 \
      --gitlab-token-env-var "GITLAB_TOKEN"
    

Configuration Options

Parameter

CLI Argument

Environment Variable

Required

Default

Description

GitLab URL

--gitlab-url

N/A

No

https://gitlab.com

The GitLab instance URL. Only set for self-hosted instances.

GitLab Token

--gitlab-token-env-var

Set by you

Yes

N/A

Name of the environment variable containing your GitLab personal access token

Organization ID

--gitlab-organization-id

N/A

Yes

N/A

The numeric ID of the top-level GitLab group (organization) to sync

Performance Considerations

  • Language detection: Fetches programming language statistics for all projects using parallel async requests (10 concurrent by default). Languages are stored as a JSON property on each project.

  • Large instances: For ~3000 projects, language fetching takes approximately 5-7 minutes

  • API rate limits: GitLab.com has rate limits (2000 requests/minute for authenticated users). Self-hosted instances may have different limits

Multi-Instance Support

Cartography supports syncing from multiple GitLab instances simultaneously. Repository and group IDs are prefixed with the GitLab instance URL to prevent collisions:

https://gitlab.com/projects/12345
https://gitlab.example.com/projects/12345

Both can exist in the same Neo4j database without conflicts.

Example: Self-Hosted GitLab

export GITLAB_TOKEN="glpat-abc123xyz"

cartography \
  --neo4j-uri bolt://localhost:7687 \
  --selected-modules gitlab \
  --gitlab-url "https://gitlab.example.com" \
  --gitlab-organization-id 12345678 \
  --gitlab-token-env-var "GITLAB_TOKEN"

Troubleshooting

Connection timeout:

  • Default timeout is 60 seconds

  • For slow GitLab instances, the sync may take longer during language detection

  • Check GitLab instance health if repeated timeouts occur

Missing language data:

  • Some projects may not have language statistics available (empty repos, binary-only repos)

  • Errors fetching languages for individual projects are logged as warnings but don’t stop the sync

Missing dependency data:

  • Dependency scanning requires projects to have supported manifest files (package.json, requirements.txt, etc.)

  • The GitLab Dependency Scanning feature must be enabled for the project

Permission errors:

  • Ensure your token has all required scopes: read_user, read_repository, read_api

  • Verify the token hasn’t expired

  • Check that the GitLab user has access to the organization and projects you want to sync

Organization not found:

  • Verify the --gitlab-organization-id is the correct numeric ID (not the group path)

  • Ensure the token’s user has access to the organization