Enriching the Ontology¶
Related docs: Main AGENTS.md | Create Module | Add Node Type
This guide covers how to integrate your module with Cartography’s Ontology system to enable cross-module querying.
Table of Contents¶
Overview of Ontology System¶
Cartography includes an Ontology system that provides both semantic labels and canonical nodes to unify data from multiple sources. This enables cross-module querying and provides a normalized view of identity and device management across your infrastructure.
The Ontology system works in two ways:
Semantic Labels: Adds semantic labels (like
UserAccount) and prefixed properties (_ont_*) directly to source nodes for cross-module querying during ingestionCanonical Nodes: Creates canonical nodes (like
(:User:Ontology)) that represent unified entities
Types of Ontology Integration¶
Semantic Labels (Recommended)¶
Adds UserAccount labels and _ont_* properties to existing nodes:
Simpler implementation - no additional node creation
Direct querying of source nodes with normalized properties
Automatic property mapping with special handling for data transformations
Source tracking via
_ont_sourceproperty
Canonical Nodes¶
Creates separate abstract User/Device nodes:
More complex - requires separate node creation and relationship management
Additional storage overhead
Useful when you need to aggregate data from multiple sources into single entities
Ontology Execution Flow¶
Understanding when and how the ontology is applied:
Semantic Labels (Applied at Ingestion Time)¶
Semantic labels are applied automatically during data ingestion when you call load():
Your module calls
load(neo4j_session, YourSchema(), data, ...)Cartography checks if your schema has a semantic label (via
ExtraNodeLabels)If found, it looks up the mapping in
SEMANTIC_LABELS_MAPPINGThe
_ont_*properties are automatically added to your nodesThe
_ont_sourceproperty is set to track which module provided the data
Canonical Nodes (Applied as Separate Intel Module)¶
Canonical nodes are created by a dedicated intel module (cartography.intel.ontology) that runs after your module:
Your module ingests data with semantic labels
The ontology intel module runs (configured via CLI options)
It reads source nodes from the graph matching the configured sources of truth
Creates
(:User:Ontology)or(:Device:Ontology)canonical nodesExecutes
OntologyRelMappingqueries to link canonical nodes
Configuration:
# Configure sources of truth for ontology nodes
cartography --ontology-users-source "okta,entra,gsuite"
cartography --ontology-devices-source "crowdstrike,kandji,duo"
Available Semantic Labels and Fields¶
For the complete list of available semantic labels and their fields, see:
Schema documentation: Ontology Schema
Mapping source code:
cartography/models/ontology/mapping/data/
Step 1: Add Ontology Mapping Configuration¶
Create mapping configurations in cartography/models/ontology/mapping/data/.
Available special_handling Options¶
When mapping fields, you can use these special handling options to transform values:
Value |
Description |
Extra Parameters |
|---|---|---|
|
Inverts the boolean value ( |
None |
|
Converts to boolean, treating non-null as |
None |
|
Logical OR of multiple boolean fields |
|
|
Logical NOR of multiple boolean fields |
|
|
Returns |
|
|
Sets a static value, ignoring |
|
Examples:
# Invert a boolean (disabled → active)
OntologyFieldMapping(
ontology_field="active",
node_field="disabled",
special_handling="invert_boolean",
)
# OR multiple fields (suspended OR archived → inactive)
OntologyFieldMapping(
ontology_field="active",
node_field="suspended",
extra={"fields": ["archived"]},
special_handling="nor_boolean",
)
# Match against allowed values
OntologyFieldMapping(
ontology_field="active",
node_field="status",
extra={"values": ["active", "bypass"]},
special_handling="equal_boolean",
)
# Set a static value for database type
OntologyFieldMapping(
ontology_field="type",
node_field="",
special_handling="static_value",
extra={"value": "dynamodb"},
)
For Semantic Labels¶
# cartography/models/ontology/mapping/data/useraccounts.py
from cartography.models.ontology.mapping.specs import OntologyFieldMapping
from cartography.models.ontology.mapping.specs import OntologyMapping
from cartography.models.ontology.mapping.specs import OntologyNodeMapping
# Add your mapping to the file
your_service_mapping = OntologyMapping(
module_name="your_service",
nodes=[
OntologyNodeMapping(
node_label="YourServiceUser", # Your node label
fields=[
# Map your node fields to ontology fields with special handling
OntologyFieldMapping(ontology_field="email", node_field="email"),
OntologyFieldMapping(ontology_field="username", node_field="username"),
OntologyFieldMapping(ontology_field="fullname", node_field="display_name"),
OntologyFieldMapping(ontology_field="firstname", node_field="first_name"),
OntologyFieldMapping(ontology_field="lastname", node_field="last_name"),
# Special handling examples:
OntologyFieldMapping(
ontology_field="inactive",
node_field="account_enabled",
special_handling="invert_boolean",
),
OntologyFieldMapping(
ontology_field="has_mfa",
node_field="multifactor",
special_handling="to_boolean",
),
OntologyFieldMapping(
ontology_field="inactive",
node_field="suspended",
special_handling="or_boolean",
extra={"fields": ["archived"]},
),
],
),
],
)
For Canonical Nodes¶
# cartography/models/ontology/mapping/data/devices.py
from cartography.models.ontology.mapping.specs import OntologyFieldMapping
from cartography.models.ontology.mapping.specs import OntologyMapping
from cartography.models.ontology.mapping.specs import OntologyNodeMapping
from cartography.models.ontology.mapping.specs import OntologyRelMapping
# Add your mapping to the file
your_service_mapping = OntologyMapping(
module_name="your_service",
nodes=[
OntologyNodeMapping(
node_label="YourServiceDevice", # Your node label
fields=[
# Map your node fields to ontology fields
OntologyFieldMapping(ontology_field="hostname", node_field="device_name", required=True), # Required field
OntologyFieldMapping(ontology_field="os", node_field="operating_system"),
OntologyFieldMapping(ontology_field="os_version", node_field="os_version"),
OntologyFieldMapping(ontology_field="model", node_field="device_model"),
OntologyFieldMapping(ontology_field="platform", node_field="platform"),
OntologyFieldMapping(ontology_field="serial_number", node_field="serial"),
],
),
],
# Optional: Add relationship mappings to connect Users to Devices
rels=[
OntologyRelMapping(
__comment__="Link Device to User based on YourServiceUser-YourServiceDevice ownership",
query="""
MATCH (u:User)-[:HAS_ACCOUNT]->(:YourServiceUser)-[:OWNS]->(:YourServiceDevice)<-[:OBSERVED_AS]-(d:Device)
MERGE (u)-[r:OWNS]->(d)
ON CREATE SET r.firstseen = timestamp()
SET r.lastupdated = $UPDATE_TAG
""",
iterative=False,
),
],
)
Step 2: Register Your Mapping¶
After creating your mapping, you must register it so Cartography can discover it.
Add your mapping to the appropriate file in
cartography/models/ontology/mapping/data/Add your mapping to the dictionary at the bottom of the file:
# Example: At the bottom of useraccounts.py
USERACCOUNTS_ONTOLOGY_MAPPING: dict[str, OntologyMapping] = {
# ... existing mappings ...
"your_service": your_service_mapping, # Add your mapping here
}
The mappings are automatically imported via cartography/models/ontology/mapping/__init__.py.
Step 3: Add Ontology Configuration to Your Node Schema¶
Semantic Labels¶
Simply add the semantic label - the ontology system will automatically add _ont_* properties at the ingestion time.
from cartography.models.core.nodes import ExtraNodeLabels
@dataclass(frozen=True)
class YourServiceUserSchema(CartographyNodeSchema):
label: str = "YourServiceUser"
# Add UserAccount label for semantic ontology integration
extra_node_labels: ExtraNodeLabels = ExtraNodeLabels(["UserAccount"])
properties: YourServiceUserNodeProperties = YourServiceUserNodeProperties()
sub_resource_relationship: YourServiceTenantToUserRel = YourServiceTenantToUserRel()
That’s it! The ontology system will automatically:
Add
_ont_email,_ont_fullname, etc. properties to your nodesApply any special handling (boolean conversion, inversion, etc.)
Add
_ont_sourceproperty to track which module provided the data
Canonical Nodes¶
You need to define a Schema model for the canonical node and add a relationship to it (similar to regular intel nodes)
@dataclass(frozen=True)
class UserNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("email")
lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
email: PropertyRef = PropertyRef("email", extra_index=True)
fullname: PropertyRef = PropertyRef("fullname")
firstname: PropertyRef = PropertyRef("firstname")
lastname: PropertyRef = PropertyRef("lastname")
inactive: PropertyRef = PropertyRef("inactive")
@dataclass(frozen=True)
class UserToUserAccountRelProperties(CartographyRelProperties):
lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
# (:User)-[:HAS_ACCOUNT]->(:UserAccount)
# This is a relationship to a sementic label used by modules' users nodes
@dataclass(frozen=True)
class UserToUserAccountRel(CartographyRelSchema):
target_node_label: str = "UserAccount"
target_node_matcher: TargetNodeMatcher = make_target_node_matcher(
{"email": PropertyRef("email")},
)
direction: LinkDirection = LinkDirection.OUTWARD
rel_label: str = "HAS_ACCOUNT"
properties: UserToUserAccountRelProperties = UserToUserAccountRelProperties()
@dataclass(frozen=True)
class UserSchema(CartographyNodeSchema):
label: str = "User"
extra_node_labels: ExtraNodeLabels = ExtraNodeLabels(["Ontology"])
properties: UserNodeProperties = UserNodeProperties()
scoped_cleanup: bool = False
other_relationships: OtherRelationships = OtherRelationships(
rels=[UserToUserAccountRel()],
)
Step 4: Understanding Ontology Field Mappings¶
Required Fields¶
The required parameter in OntologyFieldMapping serves two critical purposes:
1. Data Quality Control: When required=True, source nodes that lack this field (i.e., the field is None or missing) will be completely excluded from ontology node creation. This ensures only complete, usable data creates ontology nodes.
2. Primary Identifier Validation: Fields used as primary identifiers must be marked as required to ensure ontology nodes can always be properly identified and matched across data sources.
# DO: Mark primary identifiers as required
OntologyFieldMapping(ontology_field="email", node_field="email", required=True), # Users
OntologyFieldMapping(ontology_field="hostname", node_field="device_name", required=True), # Devices
# DO: Mark optional fields as not required (default)
OntologyFieldMapping(ontology_field="firstname", node_field="first_name"), # Optional field
Example: If a DuoUser node has no email address and email is marked as required=True, no corresponding User ontology node will be created for that record.
Node Eligibility¶
The eligible_for_source parameter in OntologyNodeMapping controls whether this node mapping can create new ontology nodes (default: True).
When to set eligible_for_source=False:
Node type lacks sufficient data to create meaningful ontology nodes (e.g., no email for Users)
Node serves only as a connection point to existing ontology nodes
Required fields are not available or reliable enough for primary node creation
# Example: AWS IAM users don't have email addresses (required for User ontology nodes)
OntologyNodeMapping(
node_label="AWSUser",
eligible_for_source=False, # Cannot create new User ontology nodes
fields=[
OntologyFieldMapping(ontology_field="username", node_field="name")
],
),
In this example, AWS IAM users can be linked to existing User ontology nodes through relationships, but they cannot create new User nodes since they lack email addresses.
Step 5: Handle Complex Relationships¶
For services that have user-device relationships, add relationship mappings:
# In your device mapping
rels=[
OntologyRelMapping(
__comment__="Connect users to their devices",
query="""
MATCH (u:User)-[:HAS_ACCOUNT]->(:YourServiceUser)-[:OWNS]->(:YourServiceDevice)<-[:OBSERVED_AS]-(d:Device)
MERGE (u)-[r:OWNS]->(d)
ON CREATE SET r.firstseen = timestamp()
SET r.lastupdated = $UPDATE_TAG
""",
iterative=False,
),
]
Testing Ontology Integration¶
To verify your ontology integration works correctly:
For Semantic Labels¶
Check that _ont_* properties are added to your nodes:
def test_ontology_properties(neo4j_session):
# After running your sync function
result = neo4j_session.run(
"MATCH (n:YourServiceUser) RETURN n._ont_email, n._ont_source LIMIT 1"
).single()
assert result["n._ont_email"] is not None
assert result["n._ont_source"] == "your_service"
For Canonical Nodes¶
Check that canonical nodes and relationships are created:
def test_canonical_user_created(neo4j_session):
# After running the ontology intel module
result = neo4j_session.run(
"""
MATCH (u:User:Ontology)-[:HAS_ACCOUNT]->(ua:YourServiceUser)
RETURN count(u) as user_count
"""
).single()
assert result["user_count"] > 0
Documenting Ontology Integration¶
When your node has an ontology integration (semantic label or canonical node relationship), you must document it in the schema file (docs/root/modules/your_service/schema.md).
Standard Ontology Mapping Phrase¶
Add a blockquote immediately after the node description using this format:
### YourServiceUser
Represents a user in Your Service.
> **Ontology Mapping**: This node has the extra label `{SemanticLabel}` to enable cross-platform queries for {description} across different systems (e.g., {examples}).
Phrase Templates by Semantic Label¶
Semantic Label |
Standard Phrase |
|---|---|
|
|
|
|
|
|
|
|
|
|
Example: AWSAccount with Tenant Label¶
From docs/root/modules/aws/schema.md:
### AWSAccount
Representation of an AWS Account.
> **Ontology Mapping**: This node has the extra label `Tenant` to enable cross-platform queries for organizational tenants across different systems (e.g., OktaOrganization, AzureTenant, GCPOrganization).
| Field | Description |
|-------|-------------|
| firstseen | Timestamp of when a sync job discovered this node |
| name | The name of the account |
| lastupdated | Timestamp of the last time the node was updated |
| **id** | The AWS Account ID number |