GitLab Configuration
Follow these steps to configure Cartography to sync GitLab organization, group, project, and related data.
Prerequisites
A GitLab instance (self-hosted or gitlab.com)
A GitLab personal access token with the required scopes (see below)
The numeric ID of the GitLab organization (top-level group) to sync
Creating a GitLab Personal Access Token
Navigate to your GitLab instance (e.g., https://gitlab.com or https://gitlab.example.com)
Go to User Settings → Access Tokens (or directly to https://your-gitlab-instance/-/user_settings/personal_access_tokens)
Click Add new token
Configure your token:
Token name: cartography-sync
Scopes: Select read_user, read_repository, and read_api
Expiration date: Set according to your security policy
Click Create personal access token
Important: Copy the token immediately - you won’t be able to see it again
Required Token Permissions
The token requires the following scopes:
| Scope | Purpose |
|---|---|
| read_user | Access user profile information for group/project membership |
| read_repository | Access repository metadata, branches, and file contents |
| read_api | Access groups, projects, dependencies, language statistics, and group/project-level CI/CD runners |
These scopes provide read-only access to:
Organizations (top-level groups) and nested groups
Projects and their metadata
Branches and default branch information
Dependency files (package.json, requirements.txt, etc.)
Dependencies extracted from dependency files
Project language statistics
Group-level and project-level CI/CD runners
Optional: instance-level runners
Listing instance-level (shared) runners via GET /api/v4/runners/all requires the token to belong to a GitLab administrator. If the token does not have admin privileges, the sync logs a warning and skips instance-level runners; group-level and project-level runners continue to be ingested normally.
CI config (.gitlab-ci.yml) ingestion
The CI config sync first calls GET /api/v4/projects/:id/ci/lint?dry_run=true to obtain the merged YAML (with all include: references expanded). Tokens belonging to a user without Maintainer access on the project may not be allowed to use this endpoint; in that case the sync falls back to the raw .gitlab-ci.yml from the repository, which only requires read_repository. If both calls fail (404 or 403), the sync logs a warning and skips the project.
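The fallback order described above can be sketched roughly as follows. This is an illustration only, not Cartography's actual implementation: `fetch_merged` and `fetch_raw` are hypothetical callables standing in for the lint-endpoint call and the raw-file read, `FetchError` is a made-up exception type, and re-raising on statuses other than 403/404 is an assumption of this sketch.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class FetchError(Exception):
    """Hypothetical error carrying the HTTP status of a failed API call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def get_ci_config(
    project_id: int,
    fetch_merged: Callable[[int], str],  # stands in for the ci/lint?dry_run=true call
    fetch_raw: Callable[[int], str],     # stands in for reading .gitlab-ci.yml from the repo
) -> Optional[str]:
    """Return the CI YAML for a project, or None if the project is skipped."""
    for source, fetch in (("merged", fetch_merged), ("raw", fetch_raw)):
        try:
            return fetch(project_id)
        except FetchError as err:
            if err.status not in (403, 404):
                raise  # assumption: only 403/404 trigger the fallback
            logger.debug(
                "%s CI config unavailable for project %s (HTTP %s)",
                source, project_id, err.status,
            )
    # Both attempts failed: warn, then skip the project.
    logger.warning("No CI config readable for project %s; skipping", project_id)
    return None
```

Passing the two fetchers in as arguments keeps the ordering logic separate from the HTTP client, which makes the fallback easy to exercise without a GitLab instance.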
Finding Your Organization ID
The organization ID is the numeric ID of the top-level GitLab group you want to sync. To find it:
Navigate to your group’s page on GitLab (e.g., https://gitlab.com/your-organization).
Click the ⋮ (three dots) menu in the top right of the group header and select Copy group ID.
Alternatively, fetch it via the API:
curl -H "PRIVATE-TOKEN: your-token" "https://gitlab.com/api/v4/groups/your-organization"
The id field in the response is your organization ID.
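For scripted setups, the same lookup can be done from Python with only the standard library. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the group path may be nested (for example `parent/child`), in which case the GitLab API expects the path to be URL-encoded.

```python
import json
import urllib.parse
import urllib.request


def build_group_request(base_url: str, group_path: str, token: str) -> urllib.request.Request:
    """Build a GET request for /api/v4/groups/<url-encoded group path>."""
    # Nested paths such as "parent/child" must be sent as "parent%2Fchild".
    encoded = urllib.parse.quote(group_path, safe="")
    url = f"{base_url.rstrip('/')}/api/v4/groups/{encoded}"
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


def extract_group_id(response_body: str) -> int:
    """Return the numeric id field from the JSON response body."""
    return int(json.loads(response_body)["id"])


def fetch_group_id(base_url: str, group_path: str, token: str) -> int:
    """Perform the request and return the organization ID (needs network access)."""
    request = build_group_request(base_url, group_path, token)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_group_id(response.read().decode("utf-8"))
```

Calling `fetch_group_id("https://gitlab.com", "your-organization", token)` would then return the value to pass to `--gitlab-organization-id`.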
Configuration
Set your GitLab token in an environment variable:
export GITLAB_TOKEN="glpat-your-token-here"
Run Cartography with the GitLab module:
cartography \
--neo4j-uri bolt://localhost:7687 \
--selected-modules gitlab \
--gitlab-organization-id 12345678 \
--gitlab-token-env-var "GITLAB_TOKEN"
Configuration Options
| Parameter | CLI Argument | Environment Variable | Required | Default | Description |
|---|---|---|---|---|---|
| GitLab URL | --gitlab-url | N/A | No | https://gitlab.com | The GitLab instance URL. Only set for self-hosted instances. |
| GitLab Token | --gitlab-token-env-var | Set by you | Yes | N/A | Name of the environment variable containing your GitLab personal access token |
| Organization ID | --gitlab-organization-id | N/A | Yes | N/A | The numeric ID of the top-level GitLab group (organization) to sync |
Performance Considerations
Language detection: Fetches programming language statistics for all projects using parallel async requests (10 concurrent by default). Languages are stored as a JSON property on each project.
Large instances: For ~3000 projects, language fetching takes approximately 5-7 minutes
API rate limits: GitLab.com has rate limits (2000 requests/minute for authenticated users). Self-hosted instances may have different limits
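The bounded-concurrency pattern behind the language fetch can be sketched with `asyncio.Semaphore`. This is a minimal illustration under stated assumptions, not the module's real code: `fetch_languages` is a hypothetical per-project coroutine, and the default of 10 simply mirrors the figure quoted above.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Iterable

logger = logging.getLogger(__name__)


async def fetch_all_languages(
    project_ids: Iterable[int],
    fetch_languages: Callable[[int], Awaitable[Dict[str, float]]],  # hypothetical fetcher
    max_concurrent: int = 10,  # mirrors the "10 concurrent" default mentioned above
) -> Dict[int, Dict[str, float]]:
    """Fetch language statistics for many projects, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results: Dict[int, Dict[str, float]] = {}

    async def worker(project_id: int) -> None:
        async with semaphore:  # blocks while max_concurrent fetches are in flight
            try:
                results[project_id] = await fetch_languages(project_id)
            except Exception as err:
                # A failure for one project is logged and does not stop the others.
                logger.warning("languages unavailable for project %s: %s", project_id, err)

    await asyncio.gather(*(worker(pid) for pid in project_ids))
    return results
```

As a rough sanity check, and assuming one request per project, 3000 projects in 5 to 7 minutes works out to about 430 to 600 requests per minute, which is below the 2000 requests/minute figure quoted for GitLab.com.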
Multi-Instance Support
Cartography supports syncing from multiple GitLab instances simultaneously. Repository and group IDs are prefixed with the GitLab instance URL to prevent collisions:
https://gitlab.com/projects/12345
https://gitlab.example.com/projects/12345
Both can exist in the same Neo4j database without conflicts.
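The effect of the URL prefix can be illustrated with a small helper. The function below is hypothetical and only reproduces the ID shape shown in the two examples above; how Cartography builds these IDs internally, and the exact path segment used for groups, are not confirmed here.

```python
def make_node_id(gitlab_url: str, resource: str, numeric_id: int) -> str:
    """Prefix a numeric GitLab ID with the instance URL (illustrative helper)."""
    # Strip any trailing slash so both URL spellings yield the same ID.
    return f"{gitlab_url.rstrip('/')}/{resource}/{numeric_id}"


# The same numeric project ID on two instances yields two distinct graph IDs.
saas_id = make_node_id("https://gitlab.com", "projects", 12345)
self_hosted_id = make_node_id("https://gitlab.example.com", "projects", 12345)
```

Because the instance URL is part of the identifier, the numeric ID only needs to be unique within a single instance.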
Example: Self-Hosted GitLab
export GITLAB_TOKEN="glpat-abc123xyz"
cartography \
--neo4j-uri bolt://localhost:7687 \
--selected-modules gitlab \
--gitlab-url "https://gitlab.example.com" \
--gitlab-organization-id 12345678 \
--gitlab-token-env-var "GITLAB_TOKEN"
Troubleshooting
Connection timeout:
Default timeout is 60 seconds
For slow GitLab instances, the sync may take longer during language detection
Check GitLab instance health if repeated timeouts occur
Missing language data:
Some projects may not have language statistics available (empty repos, binary-only repos)
Errors fetching languages for individual projects are logged as warnings but don’t stop the sync
Missing dependency data:
Dependency scanning requires projects to have supported manifest files (package.json, requirements.txt, etc.)
The GitLab Dependency Scanning feature must be enabled for the project
Permission errors:
Ensure your token has all required scopes: read_user, read_repository, read_api
Verify the token hasn’t expired
Check that the GitLab user has access to the organization and projects you want to sync
Organization not found:
Verify the --gitlab-organization-id is the correct numeric ID (not the group path)
Ensure the token’s user has access to the organization
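Several of the permission checks above can be automated before a sync. The sketch below assumes the token description comes from GitLab's `GET /api/v4/personal_access_tokens/self` endpoint and that its JSON carries `scopes` (a list) and `expires_at` (an ISO date or null); verify both against your GitLab version. The helper name is hypothetical, and treating the expiry date itself as already expired is an assumption based on GitLab expiring tokens at 00:00 UTC on that date.

```python
from datetime import date
from typing import Any, Dict, List, Optional

# Scopes listed as required earlier in this document.
REQUIRED_SCOPES = {"read_user", "read_repository", "read_api"}


def token_problems(token_info: Dict[str, Any], today: Optional[date] = None) -> List[str]:
    """Return human-readable problems found in a token description."""
    today = today or date.today()
    problems: List[str] = []

    missing = sorted(REQUIRED_SCOPES - set(token_info.get("scopes", [])))
    if missing:
        problems.append("missing scopes: " + ", ".join(missing))

    expires_at = token_info.get("expires_at")  # ISO date string, or None if no expiry
    # Assumption: the token is no longer valid on the expiry date itself.
    if expires_at and date.fromisoformat(expires_at) <= today:
        problems.append(f"token expired on {expires_at}")

    return problems
```

An empty list means the token passes these two checks; it does not prove that the token's user can see the target organization, which still has to be verified separately.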