Tree diff formats

The functions in treediffer are used to find differences between two trees. The resulting diff contains both structural information about the nodes’ position in the tree as well as the nodes’ attributes.

Example of node deleted

A node deletion appears under the key nodes_deleted.{node_id} and contains the structural info where the deletion occurred, and the attributes of the node that was deleted:

nodes_deleted = [
    {
        "old_node_id": (str),
        "old_parent_id": (str),
        "old_sort_order": (float),
        "content_id": (content_id),
        "attributes": {
            "title": {"value": "The title of the deleted node"},
            "description": {"value": "The description of the node that was deleted"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float)},
            ... }
    },
    {},
    ...,
    {},
]

To apply this deletion patch, find content node c = ContenNode.filter(tree_id=...).get(node_id=deleted['node_id']), and its parent parent_node = ContenNode.filter(tree_id=...).get(node_id=deleted['old_parent_id']), then call parent_node.children.remove(c). The attributes dict is provided for information purposes only (e.g. to display info about the node being deleted in the UI).

Example of node added

A node added to the tree is appears under the key nodes_added.{node_id} and is represented by the following dict:

nodes_added = [
    {
        "parent_id": (node_id),
        "node_id": (node_id),
        "sort_order": (float),
        "content_id": (content_id),
        "attributes": {
            "title": {"value": "The title of the new node"},
            "description": {"value": "The description of the new node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float)},
            ... }
    },
    {},
    ...,
    {},
]

To apply this “patch,” create a new c = ContentNode(**map(..., attributes)), find the parent node where the new node is to be added, parent_node = ContenNode.filter(tree_id=...).get(node_id=added['parent_id']), then call parent_node.children.add(c).

Note: when applying a patch with multiple node additions, watch out for sort order in systems that don’t know how to handle sort_order (ricecooker).

Example of node moved

Nodes in the newtree that have the same content_id as a node in the oldtree but whose node_id has changed can be interpreted as moves. If multiple nodes satisfy this criterion, the only the first node is treated as a move (in tree-order) while others “node clones” of the same content_id are recorded as additions.

nodes_moved = [
    {
        "content_id": (content_id),
        "node_id": (node_id),
        "old_node_id": (node_id),
        "parent_id": (node_id),
        "old_parent_id": (node_id),
        "sort_order": (float),
        "old_sort_order": (float),
        "attributes": {
            "title": {"value": "The title of the new node"},
            "description": {"value": "The description of the new node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float), "old_value": (== old_sort_order)},
            ...},
    },
    {},
    ...,
    {},
]

Example of node modified

When a node appears in both the “before tree” and the “after tree” in the same node_id (position within the tree) but who have different attributes. These “attribute diffs” are stored under the keys old_value, value = new value, with special handling for set-like attributes (files and tags), and list-like attributes (assessment_items).

Example data structure:

nodes_modified = [
    {
        "node_id": (node_id),
        "parent_id": (node_id),
        "content_id": (content_id),
        # intentionally not tracking changes in sort_order since those are defined as moves
        "changed": ["title", "sort_order", "tags", "assessment_items"],
        "attributes": {
            "title":{
                "old_value": "Old title",
                "value": "The new title for the node"
            },
            "description": {"value": "The description of the new node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float), "old_value": (== old_sort_order)},
            "tags": {                                                           # set-like
                "old_value": ['oldtag1', 'oldtag2'],
                "value": ['oldtag1', 'newtag3', 'newtag4'],
                "tags_removed": ['oldtag2'],
                "tags_added": ['newtag3', 'newtag4'],
            },
            "assessment_items": {                                               # list-like
                "old_value": [
                    {"id": "aiid1", "assessment_id": 'q1', ... },
                    {"id": "aiid2", "assessment_id": 'q2', ... },
                    {"id": "aiid3", "assessment_id": 'q3', ... }
                ],
                "value": [
                    {"id": "aiid1", "assessment_id": 'q1', ... },
                    {"id": "aiid4", "assessment_id": 'q4', ... },
                    {"id": "aiid3", "assessment_id": 'q3', ... },
                ],
                "deleted": [
                    {"id": "aiid2", "assessment_id": 'q2', ... },
                ]
                "added": [
                    {"id": "aiid4", "assessment_id": 'q4', ... },
                ]
                "moved": [],
                "modified": [],
            },
        },
    },
    {},
    ...,
    {},
]

Node structural annotations

old_parent_id parent_id old_sort_order sort_order
nodes_deleted x x
nodes_added x x
nodes_moved x x x x
nodes_modified

Flat diffs

Suppose a topic node T1 is added which has two children N1 and N2. Depending on the use case for the diff, we want to represent this change in different ways. The raw and simplified diff formats correspond to flat lists, so this change will appear as three separate items in the nodes_added list:

nodes_added = [
    ...,
    {
        "parent_id": (node_id(parentT1)),
        "node_id": (node_id(T1)),
        "sort_order": (float),
        "content_id": (content_id(T1)),
        "attributes": {
            "title": {"value": "T1"},
            "description": {"value": "The description of the new topic node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float)},
            ... }
    },
    {
        "parent_id": (node_id(T1)),
        "node_id": (node_id(N1)),
        "sort_order": 1.0,
        "content_id": (content_id(N1)),
        "attributes": {
            "title": {"value": "N1"},
            "description": {"value": "The description of the first new node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": 1.0},
            ... }
    },
    {
        "parent_id": (node_id(T1)),
        "node_id": (node_id(N2)),
        "sort_order": 2.0,
        "content_id": (content_id(N2)),
        "attributes": {
            "title": {"value": "N2"},
            "description": {"value": "The description of the second new node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": 2.0},
            ... }
    },
    ...,
]

This is appropriate for counting number of nodes added/deleted/moved/modified.

Restructured diffs

In the restructured diff format we’ll combine these additions into a single logical addition of the topic, and indicate the presence of the child notes as children to the top-level topic addition:

nodes_added = [
    ...,
    {
        "parent_id": (node_id(parentT1)),
        "node_id": (node_id(T1)),
        "sort_order": (float),
        "content_id": (content_id(T1)),
        "attributes": {
            "title": {"value": "T1"},
            "description": {"value": "The description of the new topic node"},
            "content_id": {"value": (str)},
            "sort_order": {"value": (float)},
            # no children     # <<< Note: children treated outside of attributes
            ... },
        "children": [
              {
                  "parent_id": (node_id(T1)),
                  "node_id": (node_id(N1)),
                  "sort_order": 1.0,
                  "content_id": (content_id(N1)),
                  "attributes": {
                      "title": {"value": "N1"},
                      "description": {"value": "The description of the first new node"},
                      "content_id": {"value": (str)},
                      "sort_order": {"value": 1.0},
                      ... }
              },
              {
                  "parent_id": (node_id(T1)),
                  "node_id": (node_id(N2)),
                  "sort_order": 2.0,
                  "content_id": (content_id(N2)),
                  "attributes": {
                      "title": {"value": "N2"},
                      "description": {"value": "The description of the second new node"},
                      "content_id": {"value": (str)},
                      "sort_order": {"value": 2.0},
                      ... }
              },
        ]
    },
    ...,
]

This format would is better suited for displaying the changes in a UI in a compact logical manner to avoid overwhelming users with long lists.