Ontology in Cartography

What is an Ontology?

An ontology is a structured representation of concepts and relationships within a given domain. It enables semantic modeling of knowledge, making it easier to organize, analyze, and query data. In Cartography, the ontology defines entity (node) types and their relationships, using semantic labels and abstract nodes for better interoperability and extensibility.

A key benefit of this approach is that it enables cross-module queries and the export of data in a unified format. By providing a common semantic layer, different modules can interact seamlessly, and data from various sources can be normalized and exported consistently.

Implementation in Cartography

Cartography implements ontology using two main concepts:

  • Semantic Labels: Each graph node can have one or more semantic labels describing its nature (e.g., UserAccount, DNSRecord).

  • Abstract Nodes: Some nodes serve as abstract concepts that group similar entities or define common behaviors. This lets shared logic be factored out and keeps the model consistent.

How Ontology Works in Cartography

The intel.ontology module in Cartography manages ontology logic. It is responsible for:

  • Loading ontology definitions from JSON files

  • Validating the consistency of entities and relationships

  • Mapping collected data to the defined semantic model

The module provides functions to traverse, enrich, and leverage the ontology during data ingestion. It plays a key role in normalizing entities from heterogeneous sources.

By default, nodes are created in the ontology based on data observed in various modules. For some node types, such as User and Device, you can specify “source of truth” modules that will exclusively create those nodes. This gives a more controlled and accurate representation of those entities.

Example: User Nodes and Source of Truth

If you set the --ontology-users-source parameter to duo, then a User node will be created for every account found in Duo. In contrast, for other integrations like Tailscale, only existing User nodes (those created by the source of truth) will be linked to Tailscale accounts. No new User nodes will be created from Tailscale data alone.
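
The behavior described above can be sketched in plain Python. This is an illustrative model only, not Cartography's implementation: the `resolve_users` function, the `(module, email)` account shape, and matching accounts to users by email are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

# (module name, email): a hypothetical account shape used only for illustration
Account = Tuple[str, str]


def resolve_users(
    accounts: Iterable[Account],
    sources_of_truth: Set[str],
) -> Tuple[Set[str], List[Account]]:
    """Return the emails that get a User node and the accounts linked to one."""
    accounts = list(accounts)
    # Only source-of-truth modules may create User nodes.
    users = {email for module, email in accounts if module in sources_of_truth}
    # Accounts from any module are linked only if a matching User already exists.
    links = [(module, email) for module, email in accounts if email in users]
    return users, links


accounts = [
    ("duo", "alice@example.com"),
    ("tailscale", "alice@example.com"),
    ("tailscale", "bob@example.com"),
]
users, links = resolve_users(accounts, sources_of_truth={"duo"})
print(users)
print(links)
```

Here the Tailscale account for bob@example.com produces neither a User node nor a link, because no source-of-truth module reported that user.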

Ontology Field Mappings

Ontology mappings are defined in Python using the OntologyFieldMapping class, which maps fields from source nodes to ontology nodes. Each mapping specifies:

  • ontology_field: The field name in the ontology node (e.g., “email”, “hostname”)

  • node_field: The corresponding field name in the source node (e.g., “email_address”, “device_name”)

  • required: Whether this field is required for ontology node creation (defaults to False)

Example:

OntologyFieldMapping(
    ontology_field="email",
    node_field="email_address",
    required=True
)

Required Fields

The required flag serves two critical purposes:

  1. Data Quality Control: If a source node lacks a required field (i.e., the field is None or missing), no ontology node is created for that record. This ensures that only complete, usable data creates ontology nodes.

  2. Primary Identifier Validation: Fields that serve as primary identifiers must be marked as required. For example:

    • email should be required for User ontology nodes

    • hostname should be required for Device ontology nodes

This prevents creating ontology nodes that cannot be properly identified or matched across different data sources.

Example with Required Field:

# If a source DuoUser has no email, no User ontology node is created
OntologyFieldMapping(ontology_field="email", node_field="email", required=True)
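
To make the skipping rule concrete, here is a minimal, self-contained sketch. `FieldMapping` is a stand-in dataclass that mirrors the three parameters documented above, and `build_ontology_properties` is a hypothetical helper; neither is Cartography's own code. Omitting optional fields that are missing (rather than storing them as None) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class FieldMapping:
    """Stand-in for OntologyFieldMapping, for illustration only."""
    ontology_field: str
    node_field: str
    required: bool = False


def build_ontology_properties(
    source_node: Dict[str, Any],
    mappings: List[FieldMapping],
) -> Optional[Dict[str, Any]]:
    """Return ontology properties, or None if a required field is absent."""
    props: Dict[str, Any] = {}
    for mapping in mappings:
        value = source_node.get(mapping.node_field)
        if value is None:
            if mapping.required:
                # A missing required field skips the whole record.
                return None
            # Assumption: missing optional fields are simply left out.
            continue
        props[mapping.ontology_field] = value
    return props


mappings = [
    FieldMapping(ontology_field="email", node_field="email_address", required=True),
    FieldMapping(ontology_field="fullname", node_field="display_name"),
]
complete = build_ontology_properties({"email_address": "alice@example.com"}, mappings)
skipped = build_ontology_properties({"display_name": "Bob"}, mappings)
print(complete)
print(skipped)
```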

Special Field Handling

Ontology field mappings support special handling for complex data transformations using the special_handling parameter:

invert_boolean

Inverts a boolean value. Useful when a source field represents the opposite of the ontology field:

OntologyFieldMapping(
    ontology_field="inactive",
    node_field="account_enabled",
    special_handling="invert_boolean",
)
# account_enabled=True becomes inactive=False

to_boolean

Converts any non-null value to True and any null or missing value to False:

OntologyFieldMapping(
    ontology_field="has_mfa",
    node_field="multifactor",
    special_handling="to_boolean",
)
# Any non-null multifactor value becomes has_mfa=True

or_boolean

Combines multiple boolean fields using logical OR. Useful when a concept spans multiple source fields:

OntologyFieldMapping(
    ontology_field="inactive",
    node_field="suspended",
    special_handling="or_boolean",
    extra={"fields": ["archived"]},
)
# inactive = suspended OR archived

equal_boolean

Checks if the field value equals any of the specified values:

OntologyFieldMapping(
    ontology_field="inactive",
    node_field="status",
    special_handling="equal_boolean",
    extra={"values": ["disabled", "locked out", "pending deletion"]},
)
# inactive=True if status is "disabled", "locked out", or "pending deletion"
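
The four handlers can be summarized in one small function. This is a sketch of the semantics described above, not Cartography's code: `apply_special_handling` is a hypothetical helper that works on a plain dict standing in for a source node. Leaving a null value as None under invert_boolean is an assumption, since the text above does not say how nulls are treated in that case.

```python
from typing import Any, Dict, Optional


def apply_special_handling(
    node: Dict[str, Any],
    node_field: str,
    special_handling: Optional[str] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Any:
    """Compute an ontology field value from a source node (illustrative only)."""
    extra = extra or {}
    value = node.get(node_field)
    if special_handling is None:
        return value
    if special_handling == "invert_boolean":
        # Assumption: a null source value stays null instead of becoming True.
        return None if value is None else not value
    if special_handling == "to_boolean":
        return value is not None
    if special_handling == "or_boolean":
        fields = [node_field, *extra.get("fields", [])]
        return any(bool(node.get(field)) for field in fields)
    if special_handling == "equal_boolean":
        return value in extra.get("values", [])
    raise ValueError(f"Unknown special_handling: {special_handling}")


inverted = apply_special_handling({"account_enabled": True}, "account_enabled", "invert_boolean")
has_mfa = apply_special_handling({"multifactor": "totp"}, "multifactor", "to_boolean")
inactive_or = apply_special_handling(
    {"suspended": False, "archived": True},
    "suspended",
    "or_boolean",
    {"fields": ["archived"]},
)
inactive_eq = apply_special_handling(
    {"status": "locked out"},
    "status",
    "equal_boolean",
    {"values": ["disabled", "locked out", "pending deletion"]},
)
print(inverted, has_mfa, inactive_or, inactive_eq)
```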

Node Eligibility

The eligible_for_source parameter in OntologyNodeMapping controls whether a node mapping can create new ontology nodes (default: True).

Set eligible_for_source=False when:

  • A node type lacks sufficient data to create meaningful ontology nodes

  • The node serves only as a connection point to existing ontology nodes

  • Required fields are not available or reliable enough for primary node creation

Example:

OntologyNodeMapping(
    node_label="AWSUser",
    eligible_for_source=False,  # Cannot create User ontology nodes
    fields=[
        OntologyFieldMapping(ontology_field="username", node_field="name")
    ],
)

In this example, AWS IAM users don’t have email addresses (required for User ontology nodes), so they’re marked as ineligible for creating new User nodes. They can still be linked to existing User nodes through relationships.
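
As a rough illustration of the effect of this flag, the sketch below separates node labels that may create ontology nodes from those that can only link to existing ones. `NodeMapping` and `split_by_eligibility` are hypothetical stand-ins written for this example; `DuoUser` is used as a sample label.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeMapping:
    """Stand-in for OntologyNodeMapping, reduced to the fields needed here."""
    node_label: str
    eligible_for_source: bool = True


def split_by_eligibility(mappings: List[NodeMapping]) -> Tuple[List[str], List[str]]:
    """Return (labels that may create ontology nodes, labels that only link)."""
    creators = [m.node_label for m in mappings if m.eligible_for_source]
    link_only = [m.node_label for m in mappings if not m.eligible_for_source]
    return creators, link_only


creators, link_only = split_by_eligibility(
    [
        NodeMapping(node_label="DuoUser"),
        NodeMapping(node_label="AWSUser", eligible_for_source=False),
    ]
)
print(creators)
print(link_only)
```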

Ontology Mapping Structure

Ontology mappings are defined in cartography/models/ontology/mapping/data/ using Python dataclasses:

your_service_mapping = OntologyMapping(
    module_name="your_service",
    nodes=[
        OntologyNodeMapping(
            node_label="YourServiceUser",
            fields=[
                OntologyFieldMapping(ontology_field="email", node_field="email", required=True),
                OntologyFieldMapping(ontology_field="username", node_field="username"),
                OntologyFieldMapping(ontology_field="fullname", node_field="display_name"),
            ],
        ),
    ],
    rels=[
        OntologyRelMapping(
            __comment__="Link User to Device based on observed relationships",
            # Adjacent string literals are concatenated into a single Cypher query.
            query=(
                "MATCH (u:User)-[:HAS_ACCOUNT]->(:YourServiceUser)-[:OWNS]->(:YourServiceDevice)"
                "<-[:OBSERVED_AS]-(d:Device) "
                "MERGE (u)-[r:OWNS]->(d) "
                "ON CREATE SET r.firstseen = timestamp() "
                "SET r.lastupdated = $UPDATE_TAG"
            ),
            iterative=False,
        ),
    ],
)

This structure allows Cartography to flexibly describe how to map and relate entities from specific integrations into the unified ontology graph.