Github Schema

GitHubRepository

Representation of a single GitHubRepository (repo) repository object. This node contains all data unique to the repo.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

The GitHub repo id. These are not unique across GitHub instances, so are prepended with the API URL the id applies to

createdat

GitHub timestamp from when the repo was created

name

Name of the repo

fullname

Name of the organization and repo together

description

Text describing the repo

primarylanguage

The primary language used in the repo

homepage

The website used as a homepage for project information

defaultbranch

The default branch used by the repo, typically master

defaultbranchid

The unique identifier of the default branch

private

True if repo is private

disabled

True if repo is disabled

archived

True if repo is archived

locked

True if repo is locked

giturl

URL used to access the repo from git commandline

url

Web URL for viewing the repo

sshurl

URL for access the repo via SSH

updatedat

GitHub timestamp for last time repo was modified

Relationships

  • GitHubUsers or GitHubOrganizations own GitHubRepositories.

    (GitHubUser)-[OWNER]->(GitHubRepository)
    (GitHubOrganization)-[OWNER]->(GitHubRepository)
    
  • GitHubRepositories in an organization can have outside collaborators with different permissions, including ADMIN, WRITE, MAINTAIN, TRIAGE, and READ (Reference).

    (GitHubUser)-[:OUTSIDE_COLLAB_{ACTION}]->(GitHubRepository)
    
  • GitHubRepositories use ProgrammingLanguages

    (GitHubRepository)-[:LANGUAGE]->(ProgrammingLanguage)
    
  • GitHubRepositories have GitHubBranches

    (GitHubRepository)-[:BRANCH]->(GitHubBranch)
    
  • GitHubTeams can have various levels of access to GitHubRepositories.

    (GitHubTeam)-[ADMIN|READ|WRITE|TRIAGE|MAINTAIN]->(GitHubRepository)
    

GitHubOrganization

Representation of a single GitHubOrganization organization object. This node contains minimal data for the GitHub Organization.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

The URL of the GitHub organization

username

Name of the organization

Relationships

  • GitHubOrganizations own GitHubRepositories.

    (GitHubOrganization)-[OWNER]->(GitHubRepository)
    
  • GitHubTeams are resources under GitHubOrganizations

    (GitHubOrganization)-[RESOURCE]->(GitHubTeam)
    
  • GitHubUsers are members of an organization. In some cases there may be a user who is “unaffiliated” with an org, for example if the user is an enterprise owner, but not member of, the org. Enterprise owners have complete control over the enterprise (i.e. they can manage all enterprise settings, members, and policies) yet may not show up on member lists of the GitHub org.

    (GitHubUser)-[MEMBER_OF|UNAFFILIATED]->(GitHubOrganization)
    

GitHubTeam

A GitHubTeam organization object.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

The URL of the GitHub Team

name

The name (a.k.a URL slug) of the GitHub Team

description

Description of the GitHub team

Relationships

  • GitHubTeams can have various levels of access to GitHubRepositories.

    (GitHubTeam)-[ADMIN|READ|WRITE|TRIAGE|MAINTAIN]->(GitHubRepository)
    
  • GitHubTeams are resources under GitHubOrganizations

    (GitHubOrganization)-[RESOURCE]->(GitHubTeam)
    

GitHubUser

Representation of a single GitHubUser user object. This node contains minimal data for the GitHub User.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

The URL of the GitHub user

username

Name of the user

fullname

The full name

has_2fa_enabled

Whether the user has 2-factor authentication enabled

role

Either ‘ADMIN’ (denoting that the user is an owner of a Github organization) or ‘MEMBER’

is_site_admin

Whether the user is a site admin

is_enterprise_owner

Whether the user is an enterprise owner

permission

Only present if the user is an outside collaborator of this repo. permission is either ADMIN, MAINTAIN, READ, TRIAGE, or WRITE (ref).

email

The user’s publicly visible profile email.

company

The user’s public profile company.

Relationships

  • GitHubUsers own GitHubRepositories.

    (GitHubUser)-[OWNER]->(GitHubRepository)
    
  • GitHubRepositories in an organization can have outside collaborators with different permissions, including ADMIN, WRITE, MAINTAIN, TRIAGE, and READ (Reference).

    (GitHubUser)-[:OUTSIDE_COLLAB_{ACTION}]->(GitHubRepository)
    
  • GitHubUsers are members of an organization. In some cases there may be a user who is “unaffiliated” with an org, for example if the user is an enterprise owner, but not member of, the org. Enterprise owners have complete control over the enterprise (i.e. they can manage all enterprise settings, members, and policies) yet may not show up on member lists of the GitHub org.

    (GitHubUser)-[MEMBER_OF|UNAFFILIATED]->(GitHubOrganization)
    

GitHubBranch

Representation of a single GitHubBranch ref object. This node contains minimal data for a repository branch.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

The GitHub branch id. These are not unique across GitHub instances, so are prepended with the API URL the id applies to

name

Name of the branch

Relationships

  • GitHubRepositories have GitHubBranches.

    (GitHubBranch)<-[BRANCH]-(GitHubRepository)
    

ProgrammingLanguage

Representation of a single Programming Language language object. This node contains programming language information.

Field

Description

firstseen

Timestamp of when a sync job first created this node

lastupdated

Timestamp of the last time the node was updated

id

Language ids need not be tracked across instances, so defaults to the name

name

Name of the language

Relationships

  • GitHubRepositories use ProgrammingLanguages.

    (ProgrammingLanguage)<-[LANGUAGE]-(GitHubRepository)
    

Dependency::PythonLibrary

Representation of a Python library as listed in a requirements.txt or setup.cfg file. Within a setup.cfg file, cartography will load everything from install_requires, setup_requires, and extras_require.

Field

Description

id

The canonicalized name of the library. If the library was pinned in a requirements file using the == operator, then id has the form {canonical name}|{pinned_version}.

name

The canonicalized name of the library.

version

The exact version of the library. This field is only present if the library was pinned in a requirements file using the == operator.

Relationships

  • Software on Github repos can import Python libraries by optionally specifying a version number.

    (GitHubRepository)-[:REQUIRES{specifier}]->(PythonLibrary)
    
    • specifier: A string describing this library’s version e.g. “<4.0,>=3.0” or “==1.0.2”. This field is only present on the :REQUIRES edge if the repo’s requirements file provided a version pin.

  • A Python Dependency is affected by a SemgrepSCAFinding (optional)

    (:SemgrepSCAFinding)-[:AFFECTS]->(:PythonLibrary)