Adding a New Node Type¶
Related docs: Main AGENTS.md | Create Module | Add Relationship
This guide covers advanced node schema properties, including extra labels, scoped cleanup, one-to-many relationships, and common mistakes to avoid.
Node Properties¶
Define the properties that will be stored on your node:
from dataclasses import dataclass
from cartography.models.core.common import PropertyRef
from cartography.models.core.nodes import CartographyNodeProperties
@dataclass(frozen=True)
class YourServiceUserNodeProperties(CartographyNodeProperties):
# Required unique identifier
id: PropertyRef = PropertyRef("id")
# Automatic fields (set by cartography)
lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
# Business fields from your API
email: PropertyRef = PropertyRef("email", extra_index=True) # Create index for queries
name: PropertyRef = PropertyRef("name")
created_at: PropertyRef = PropertyRef("created_at")
last_login: PropertyRef = PropertyRef("last_login")
is_admin: PropertyRef = PropertyRef("is_admin")
# Fields from kwargs (same for all records in a batch)
tenant_id: PropertyRef = PropertyRef("TENANT_ID", set_in_kwargs=True)
PropertyRef Parameters:
First parameter: Key in your data dict or kwarg name. Use keys when you are ingesting a list of records. Use kwargs when you want to set the same value for all records in the list of records.
extra_index=True: Create database index for better query performanceset_in_kwargs=True: Value comes from kwargs passed toload(), not from individual records
Node Schema¶
Define your complete node schema:
from cartography.models.core.nodes import CartographyNodeSchema
from cartography.models.core.relationships import OtherRelationships
@dataclass(frozen=True)
class YourServiceUserSchema(CartographyNodeSchema):
label: str = "YourServiceUser" # Neo4j node label
properties: YourServiceUserNodeProperties = YourServiceUserNodeProperties()
sub_resource_relationship: YourServiceTenantToUserRel = YourServiceTenantToUserRel()
# Optional: Additional relationships
other_relationships: OtherRelationships = OtherRelationships([
YourServiceUserToHumanRel(), # Connect to Human nodes
])
Advanced Node Schema Properties¶
Extra Node Labels¶
Add additional Neo4j labels to your nodes using extra_node_labels:
from cartography.models.core.nodes import ExtraNodeLabels
@dataclass(frozen=True)
class YourServiceUserSchema(CartographyNodeSchema):
label: str = "YourServiceUser"
properties: YourServiceUserNodeProperties = YourServiceUserNodeProperties()
sub_resource_relationship: YourServiceTenantToUserRel = YourServiceTenantToUserRel()
# Add extra labels like "Identity" and "Asset" to the node
extra_node_labels: ExtraNodeLabels = ExtraNodeLabels(["Identity", "Asset"])
This creates nodes with multiple labels: (:YourServiceUser:Identity:Asset).
Conditional Node Labels¶
Warning: Conditional labels are a specialized feature primarily used for ontology mapping scenarios where a single data source produces records that map to different semantic types. Most intel modules do not need this feature.
Sometimes you want to apply labels only when certain conditions are met. Use ConditionalNodeLabel for this:
from cartography.models.core.nodes import ConditionalNodeLabel, ExtraNodeLabels
@dataclass(frozen=True)
class ECRImageSchema(CartographyNodeSchema):
label: str = "ECRImage"
properties: ECRImageNodeProperties = ECRImageNodeProperties()
sub_resource_relationship: ECRImageToAccountRel = ECRImageToAccountRel()
# Apply different ontology labels based on the image type
extra_node_labels: ExtraNodeLabels = ExtraNodeLabels([
ConditionalNodeLabel(
label="Image",
conditions={"type": "IMAGE"}
),
ConditionalNodeLabel(
label="ImageAttestation",
conditions={"type": "IMAGE_ATTESTATION"}
),
ConditionalNodeLabel(
label="ImageManifestList",
conditions={"type": "IMAGE_MANIFEST_LIST"}
),
])
Primary use case: Container Registry Images
ECR (and other container registries) store different types of artifacts that share the same base schema but have fundamentally different semantic meanings:
|
Ontology Label |
Description |
|---|---|---|
|
|
Standard container image |
|
|
SLSA/Sigstore attestation |
|
|
Multi-arch manifest list |
Without conditional labels, we cannot accurately map these to distinct ontology types. An ECRImage node with type: "IMAGE_ATTESTATION" should be labeled as ImageAttestation in the ontology, not just generic Image.
How it works:
String labels (like
"SecurityFinding") are applied unconditionally to all nodes during ingestionConditionalNodeLabellabels are applied in a separate query after ingestion, only to nodes matching all specified conditionsConditions use exact string equality and are combined with AND logic
When conditions change, labels are automatically added or removed on subsequent syncs
Important notes:
Condition values must be strings (e.g.,
"true"notTrue)All conditions must match for the label to be applied (AND logic)
Indexes are automatically created for conditional labels and their condition fields
Scoped Cleanup¶
Control cleanup behavior with scoped_cleanup:
@dataclass(frozen=True)
class YourServiceUserSchema(CartographyNodeSchema):
label: str = "YourServiceUser"
properties: YourServiceUserNodeProperties = YourServiceUserNodeProperties()
sub_resource_relationship: YourServiceTenantToUserRel = YourServiceTenantToUserRel()
# Default behavior (scoped_cleanup=True): Only clean up users within the current tenant
# scoped_cleanup: bool = True # This is the default, no need to specify
When to Override scoped_cleanup:
Set scoped_cleanup=False ONLY for intel modules that don’t have a clear tenant-like entity:
@dataclass(frozen=True)
class VulnerabilitySchema(CartographyNodeSchema):
label: str = "Vulnerability"
properties: VulnerabilityNodeProperties = VulnerabilityNodeProperties()
sub_resource_relationship: None = None # No tenant relationship
# Vulnerabilities are global data, not scoped to a specific tenant
scoped_cleanup: bool = False
Examples where scoped_cleanup=False makes sense:
Vulnerability databases (CVE data is global)
Threat intelligence feeds (IOCs are not tenant-specific)
Public certificate transparency logs
Global DNS/domain information
Default behavior (scoped_cleanup=True) is correct for:
User accounts (scoped to organization/tenant)
Infrastructure resources (scoped to AWS account, Azure subscription, etc.)
Application assets (scoped to company/tenant)
One-to-Many Relationships¶
Sometimes you need to connect one node to many others. Example from AWS route tables:
Source Data¶
# Route table with multiple subnet associations
{
"RouteTableId": "rtb-123",
"Associations": [
{"SubnetId": "subnet-abc"},
{"SubnetId": "subnet-def"},
]
}
Transform for One-to-Many¶
def transform_route_tables(route_tables):
result = []
for rt in route_tables:
transformed = {
"id": rt["RouteTableId"],
# Extract list of subnet IDs
"subnet_ids": [assoc["SubnetId"] for assoc in rt.get("Associations", []) if "SubnetId" in assoc],
}
result.append(transformed)
return result
Define One-to-Many Relationship¶
@dataclass(frozen=True)
class RouteTableToSubnetRel(CartographyRelSchema):
target_node_label: str = "EC2Subnet"
target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
"subnet_id": PropertyRef("subnet_ids", one_to_many=True), # KEY: one_to_many=True
})
direction: LinkDirection = LinkDirection.OUTWARD
rel_label: str = "ASSOCIATED_WITH"
properties: RouteTableToSubnetRelProperties = RouteTableToSubnetRelProperties()
The Magic: one_to_many=True tells Cartography to create a relationship to each subnet whose subnet_id is in the subnet_ids list.
Common Schema Mistakes to Avoid¶
DO NOT add custom properties to CartographyRelSchema or CartographyNodeSchema subclasses: These dataclasses are processed by Cartography’s core loading system, which only recognizes the standard fields defined in the base classes. Any additional fields you add will be ignored and have no effect.
# DON'T do this - custom fields are ignored by the loading system
@dataclass(frozen=True)
class MyRelationship(CartographyRelSchema):
target_node_label: str = "SomeNode"
target_node_matcher: TargetNodeMatcher = make_target_node_matcher({"id": PropertyRef("some_id")})
direction: LinkDirection = LinkDirection.OUTWARD
rel_label: str = "CONNECTS_TO"
properties: MyRelProperties = MyRelProperties()
# These custom fields do nothing:
conditional_match_property: str = "some_id"
custom_flag: bool = True
extra_config: dict = {"key": "value"}
# DON'T do this either - custom fields on node schemas are also ignored
@dataclass(frozen=True)
class MyNodeSchema(CartographyNodeSchema):
label: str = "MyNode"
properties: MyNodeProperties = MyNodeProperties()
sub_resource_relationship: MyRel = MyRel()
# This custom field does nothing:
custom_setting: str = "ignored"
# DO this instead - stick to the standard schema fields only
@dataclass(frozen=True)
class MyRelationship(CartographyRelSchema):
target_node_label: str = "SomeNode"
target_node_matcher: TargetNodeMatcher = make_target_node_matcher({"id": PropertyRef("some_id")})
direction: LinkDirection = LinkDirection.OUTWARD
rel_label: str = "CONNECTS_TO"
properties: MyRelProperties = MyRelProperties()
@dataclass(frozen=True)
class MyNodeSchema(CartographyNodeSchema):
label: str = "MyNode"
properties: MyNodeProperties = MyNodeProperties()
sub_resource_relationship: MyRel = MyRel()
extra_node_labels: ExtraNodeLabels = ExtraNodeLabels(["AnotherLabel", ...]) # Optional
other_relationships: OtherRelationships = OtherRelationships([...]) # Optional
scoped_cleanup: bool = True # Optional, defaults to True
Standard fields for CartographyRelSchema:
target_node_label: strtarget_node_matcher: TargetNodeMatcherdirection: LinkDirectionrel_label: strproperties: CartographyRelProperties subclass
Standard fields for CartographyNodeSchema:
label: strproperties: CartographyNodeProperties subclasssub_resource_relationship: CartographyRelSchema subclassother_relationships: OtherRelationships (optional)extra_node_labels: ExtraNodeLabels (optional)scoped_cleanup: bool (optional, defaults to True, almost should never be overridden. This is only used for intel modules that don’t have a clear tenant-like entity.)
If you need conditional behavior, handle it in your transform function by setting field values to None when relationships shouldn’t be created, or by filtering your data before calling load().
Sub-Resource Relationships: Always Point to Tenant-Like Objects¶
The sub_resource_relationship should always refer to a tenant-like object that represents the ownership or organizational boundary of the resource. This is crucial for proper data organization and cleanup operations.
Correct Examples:
AWS Resources: Point to
AWSAccount(tenant = AWS account)Azure Resources: Point to
AzureSubscription(tenant = Azure subscription)GCP Resources: Point to
GCPProject(tenant = GCP project)SaaS Applications: Point to
YourServiceTenant(tenant = organization/company)GitHub Resources: Point to
GitHubOrganization(tenant = GitHub org)
Incorrect Examples:
Pointing to a parent resource that’s not tenant-like (e.g.,
ECSTaskDefinition->ECSTask)Pointing to infrastructure components (e.g.,
ECSContainer->ECSTask)Pointing to logical groupings that aren’t organizational boundaries
Why This Matters:
Cleanup Operations: Cartography uses the sub-resource relationship to determine which data to clean up during sync operations
Data Organization: Tenant-like objects provide natural boundaries for data organization
Access Control: Tenant relationships enable proper access control and data isolation
Consistency: Following this pattern ensures consistent data modeling across all modules
Loading Data¶
Use the load function with your schema:
from cartography.client.core.tx import load
def load_users(
neo4j_session: neo4j.Session,
data: list[dict[str, Any]],
tenant_id: str,
update_tag: int,
) -> None:
# Load tenant first (if it doesn't exist)
load(
neo4j_session,
YourServiceTenantSchema(),
[{"id": tenant_id}],
lastupdated=update_tag,
)
# Load users with relationships
load(
neo4j_session,
YourServiceUserSchema(),
data,
lastupdated=update_tag,
TENANT_ID=tenant_id, # This becomes available as PropertyRef("TENANT_ID", set_in_kwargs=True)
)
Required Node Properties¶
Every node must have these properties:
@dataclass(frozen=True)
class YourNodeProperties(CartographyNodeProperties):
id: PropertyRef = PropertyRef("id") # REQUIRED: Unique identifier
lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True) # REQUIRED: Auto-managed
# Your business properties here...
Date Handling¶
Neo4j 4+ supports native Python datetime objects and ISO 8601 formatted strings. Avoid manual datetime parsing:
# DON'T: Manually parse dates or convert to epoch timestamps
"created_at": int(dt_parse.parse(user_data["created_at"]).timestamp() * 1000)
"last_login": dict_date_to_epoch({"d": dt_parse.parse(data["last_login"])}, "d")
# DO: Pass datetime values directly
"created_at": user_data.get("created_at") # AWS/API returns ISO 8601 dates
"last_login": user_data.get("last_login") # Neo4j handles these natively