Adding a New Relationship

Related docs: Main AGENTS.md | Create Module | Add Node Type

This guide covers how to define relationships in Cartography, including standard relationships, MatchLinks for connecting existing nodes, and patterns for multiple modules modifying the same node type.

Standard Relationships

Define how your nodes connect to other nodes:

from dataclasses import dataclass

from cartography.models.core.common import PropertyRef
from cartography.models.core.relationships import (
    CartographyRelSchema, CartographyRelProperties, LinkDirection,
    make_target_node_matcher, TargetNodeMatcher
)

# Relationship properties (usually just lastupdated)
@dataclass(frozen=True)
class YourServiceTenantToUserRelProperties(CartographyRelProperties):
    lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)

# The relationship itself
@dataclass(frozen=True)
class YourServiceTenantToUserRel(CartographyRelSchema):
    target_node_label: str = "YourServiceTenant"                # What we're connecting to
    target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
        "id": PropertyRef("TENANT_ID", set_in_kwargs=True),     # Match on tenant.id = TENANT_ID kwarg
    })
    direction: LinkDirection = LinkDirection.OUTWARD            # Direction of relationship
    rel_label: str = "RESOURCE"                                 # Relationship label
    properties: YourServiceTenantToUserRelProperties = YourServiceTenantToUserRelProperties()

Relationship Directions

  • LinkDirection.OUTWARD: (:YourServiceUser)-[:RESOURCE]->(:YourServiceTenant)

  • LinkDirection.INWARD: (:YourServiceUser)<-[:RESOURCE]-(:YourServiceTenant)
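To make the two directions concrete, here is a minimal sketch that renders the pattern each direction implies. The `Direction` enum and `render_pattern` helper are hypothetical stand-ins written for this illustration; they are not part of Cartography, which uses `LinkDirection` on the relationship schema as shown above.

```python
from enum import Enum, auto


class Direction(Enum):
    """Local stand-in for Cartography's LinkDirection (illustration only)."""
    OUTWARD = auto()
    INWARD = auto()


def render_pattern(source_label: str, rel_label: str, target_label: str, direction: Direction) -> str:
    """Return the Cypher-style pattern implied by a relationship direction."""
    if direction is Direction.OUTWARD:
        # Arrow points from the node being loaded to the target node.
        return f"(:{source_label})-[:{rel_label}]->(:{target_label})"
    # Arrow points from the target node back to the node being loaded.
    return f"(:{source_label})<-[:{rel_label}]-(:{target_label})"


outward = render_pattern("YourServiceUser", "RESOURCE", "YourServiceTenant", Direction.OUTWARD)
inward = render_pattern("YourServiceUser", "RESOURCE", "YourServiceTenant", Direction.INWARD)
print(outward)
print(inward)
```

In both cases the source is the node whose schema holds the relationship, and the target is the node named by target_node_label.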

One-to-Many Relationships

When you need to connect one node to many others:

Source Data

# Route table with multiple subnet associations
{
    "RouteTableId": "rtb-123",
    "Associations": [
        {"SubnetId": "subnet-abc"},
        {"SubnetId": "subnet-def"},
    ]
}

Transform for One-to-Many

def transform_route_tables(route_tables):
    result = []
    for rt in route_tables:
        transformed = {
            "id": rt["RouteTableId"],
            # Extract list of subnet IDs
            "subnet_ids": [assoc["SubnetId"] for assoc in rt.get("Associations", []) if "SubnetId" in assoc],
        }
        result.append(transformed)
    return result
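As a quick check, the sketch below runs the same transform against a few sample payloads. The second and third payloads are hypothetical edge cases (an association without a SubnetId, and a route table with no Associations key) chosen to show why the transform uses rt.get(...) and the "SubnetId" in assoc filter.

```python
def transform_route_tables(route_tables):
    """Same transform as above: flatten associations into a list of subnet IDs."""
    result = []
    for rt in route_tables:
        result.append({
            "id": rt["RouteTableId"],
            "subnet_ids": [
                assoc["SubnetId"]
                for assoc in rt.get("Associations", [])
                if "SubnetId" in assoc
            ],
        })
    return result


# Hypothetical sample payloads, shaped like the source data above.
sample = [
    {"RouteTableId": "rtb-123", "Associations": [{"SubnetId": "subnet-abc"}, {"SubnetId": "subnet-def"}]},
    {"RouteTableId": "rtb-456", "Associations": [{"Main": True}]},  # association without a SubnetId
    {"RouteTableId": "rtb-789"},  # no "Associations" key at all
]

transformed = transform_route_tables(sample)
for row in transformed:
    print(row)
```

Route tables with nothing to link still produce a row with an empty subnet_ids list, so the node itself is loaded even though no relationships are created for it.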

Define One-to-Many Relationship

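# RouteTableToSubnetRelProperties is assumed to be defined like the rel properties class shown earlier
@dataclass(frozen=True)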
class RouteTableToSubnetRel(CartographyRelSchema):
    target_node_label: str = "EC2Subnet"
    target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
        "subnet_id": PropertyRef("subnet_ids", one_to_many=True),  # KEY: one_to_many=True
    })
    direction: LinkDirection = LinkDirection.OUTWARD
    rel_label: str = "ASSOCIATED_WITH"
    properties: RouteTableToSubnetRelProperties = RouteTableToSubnetRelProperties()

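The key: one_to_many=True tells Cartography to create one relationship per EC2Subnet node whose subnet_id appears in the subnet_ids list. The matcher only links to subnets that already exist in the graph; it does not create missing ones.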
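The effect of one_to_many=True can be approximated in plain Python. This is only a simulation of the matching behavior described above, not Cartography's implementation; `expand_one_to_many` and the sample IDs are hypothetical.

```python
def expand_one_to_many(item: dict, existing_subnet_ids: set) -> list:
    """Return (route_table_id, subnet_id) pairs for subnets that already exist."""
    return [
        (item["id"], subnet_id)
        for subnet_id in item["subnet_ids"]
        if subnet_id in existing_subnet_ids  # unmatched IDs produce no relationship
    ]


# One transformed route table and the subnets assumed to be in the graph already.
item = {"id": "rtb-123", "subnet_ids": ["subnet-abc", "subnet-def", "subnet-missing"]}
existing = {"subnet-abc", "subnet-def"}

rels = expand_one_to_many(item, existing)
print(rels)
```

One input row fans out into one relationship per matching subnet, and the ID with no matching node is silently skipped.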



Multiple Intel Modules Modifying the Same Node Type

It is possible (and encouraged) for more than one intel module to modify the same node type. There are two distinct patterns for this, depending on how much the referring module knows about the target:

Simple Relationship Pattern

When data type A refers to data type B only by ID, without providing additional properties about B, a relationship schema is enough. When A is loaded, the relationship schema performs a MATCH to find existing nodes of type B and connects A to them.

For example, when an RDS instance refers to EC2 security groups by ID, we create a relationship from the RDS instance to the security group nodes, since the RDS API doesn’t provide additional properties about the security groups beyond their IDs.

# RDS Instance refers to Security Groups by ID only
@dataclass(frozen=True)
class RDSInstanceToSecurityGroupRel(CartographyRelSchema):
    target_node_label: str = "EC2SecurityGroup"
    target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
        "id": PropertyRef("SecurityGroupId"),  # Just the ID, no additional properties
    })
    direction: LinkDirection = LinkDirection.OUTWARD
    rel_label: str = "MEMBER_OF_EC2_SECURITY_GROUP"
    properties: RDSInstanceToSecurityGroupRelProperties = RDSInstanceToSecurityGroupRelProperties()

Composite Node Pattern

When a data type A refers to another data type B and offers additional fields about B that B doesn’t have itself, we should define a composite node schema. This composite node would be named “BASchema” to denote that it’s a “B” object as known by an “A” object. When loaded, the composite node schema targets the same node label as the primary B schema, allowing the loading system to perform a MERGE operation that combines properties from both sources.

For example, in the AWS EC2 module, we have both EBSVolumeSchema (from the EBS API) and EBSVolumeInstanceSchema (from the EC2 Instance API). The EC2 Instance API provides additional properties about EBS volumes that the EBS API doesn’t have, such as deleteontermination. Both schemas target the same EBSVolume node label, allowing the node to accumulate properties from both sources.

from cartography.models.core.nodes import CartographyNodeProperties, CartographyNodeSchema

# EC2 Instance provides additional properties about EBS Volumes
@dataclass(frozen=True)
class EBSVolumeInstanceProperties(CartographyNodeProperties):
    id: PropertyRef = PropertyRef("VolumeId")
    arn: PropertyRef = PropertyRef("Arn", extra_index=True)
    lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
    # Additional property that EBS API doesn't have
    deleteontermination: PropertyRef = PropertyRef("DeleteOnTermination")

@dataclass(frozen=True)
class EBSVolumeInstanceSchema(CartographyNodeSchema):
    label: str = "EBSVolume"  # Same label as EBSVolumeSchema
    properties: EBSVolumeInstanceProperties = EBSVolumeInstanceProperties()
    sub_resource_relationship: EBSVolumeToAWSAccountRel = EBSVolumeToAWSAccountRel()
    # ... other relationships

The key distinction is whether the referring module provides additional properties about the target entity. If it does, use a composite node schema. If it only provides IDs, use a simple relationship schema.
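The difference between the two patterns can be sketched with a toy in-memory graph. Everything here (`graph`, `link_by_id`, `merge_node`, the `size` property, and the sample IDs) is hypothetical and only imitates MATCH-like and MERGE-like behavior; it is not how Cartography is implemented.

```python
# Toy in-memory graph: {node_id: properties}. Not Cartography code.
graph = {"vol-1": {"id": "vol-1", "size": 100}}
relationships = []


def link_by_id(source_id, target_id):
    """Simple relationship pattern: MATCH-like, only links to nodes that already exist."""
    if target_id in graph:
        relationships.append((source_id, target_id))
        return True
    return False


def merge_node(node_id, props):
    """Composite node pattern: MERGE-like, creates the node or adds properties to it."""
    graph.setdefault(node_id, {"id": node_id}).update(props)


# Simple relationship: the missing target is neither linked nor created.
linked_existing = link_by_id("i-1", "vol-1")
linked_missing = link_by_id("i-1", "vol-404")

# Composite node: properties from the second source accumulate on the same node.
merge_node("vol-1", {"deleteontermination": True})
merge_node("vol-2", {"deleteontermination": False})

print(relationships)
print(graph)
```

In this sketch the simple pattern never touches node properties, while the composite pattern both enriches an existing node (vol-1) and creates one that the first source has not loaded yet (vol-2).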


Common Patterns

Pattern 1: Simple Service with Users (LastPass Style)

# Data flow
API Response -> transform() -> [{"id": "123", "email": "user@example.com", ...}] -> load()

# Key characteristics:
- One main entity type (users)
- Simple tenant relationship
- Standard fields (id, email, created_at, etc.)

Pattern 2: Complex Infrastructure (AWS EC2 Style)

# Data flow
API Response -> transform() -> Multiple lists -> Multiple load() calls

# Key characteristics:
- Multiple entity types (instances, security groups, subnets)
- Complex relationships between entities
- Regional/account-scoped resources
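A minimal sketch of the "multiple lists" step, assuming a simplified response shaped like EC2 reservations. The function name, the output field names, and the sample data are hypothetical, and a real module would carry many more fields.

```python
def transform_instances(reservations):
    """Split one nested response into separate lists, one per entity type."""
    instances, security_groups = [], {}
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            groups = inst.get("SecurityGroups", [])
            instances.append({
                "id": inst["InstanceId"],
                "subnet_id": inst.get("SubnetId"),
                "security_group_ids": [g["GroupId"] for g in groups],
            })
            for g in groups:
                # De-duplicate groups shared by several instances.
                security_groups[g["GroupId"]] = {"id": g["GroupId"], "name": g.get("GroupName")}
    return instances, list(security_groups.values())


# Hypothetical sample response with one security group shared by two instances.
reservations = [{"Instances": [
    {"InstanceId": "i-1", "SubnetId": "subnet-abc",
     "SecurityGroups": [{"GroupId": "sg-1", "GroupName": "web"}]},
    {"InstanceId": "i-2", "SubnetId": "subnet-abc",
     "SecurityGroups": [{"GroupId": "sg-1", "GroupName": "web"}, {"GroupId": "sg-2", "GroupName": "db"}]},
]}]

instances, groups = transform_instances(reservations)
print(instances)
print(groups)
```

Each returned list would then be passed to its own load() call with the schema for that entity type, and the ID lists on each instance row are what the relationship matchers consume.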

Pattern 3: Hierarchical Resources (Route Tables Style)

# One-to-many transformation
{
    "RouteTableId": "rtb-123",
    "Associations": [{"SubnetId": "subnet-abc"}, {"SubnetId": "subnet-def"}]
}
->
{
    "id": "rtb-123",
    "subnet_ids": ["subnet-abc", "subnet-def"]  # Flattened for one_to_many
}