Patricia Trie: Key to Efficient State Management in JAM Protocol

Visualization of a Patricia Trie structure showing compressed binary paths leading to encoded state components, with nodes representing state transitions and leaves containing encoded values such as α, β, and other state elements.

In the Gray Paper, Appendix D delves into the concept of the Patricia Trie and its application in computing a Merkle root for the entire system state. This article aims to simplify and explain these concepts, making them accessible to readers who may not be familiar with the technical details.

What is a Patricia Trie?

A Patricia Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric) is a space-optimized version of a trie (prefix tree) that is used to store a mapping from keys to values. In the context of the Gray Paper, it is used to organize state components efficiently.

Key Characteristics:

  • Binary Paths: Each key is represented as a sequence of bits (0s and 1s), forming a path from the root to a leaf node.
  • Compression: Common prefixes are stored only once, reducing redundancy.
  • Efficient Lookup: Enables quick retrieval of values based on their keys.

Encoding State Components

The state components are encoded into binary paths using the state-key constructor functions defined in Appendix D.1. These functions map state components to 32-byte sequences, which are then converted into bit sequences to form paths in the trie.

Example Paths:

Here are the paths representing different state components:

  • [1, 0, 0, 0] → Encoded value α
  • [0, 1, 0, 0] → Encoded value φ
  • [1, 1, 0, 0] → Encoded value β
  • [0, 0, 1, 0] → Encoded value γ
  • [1, 0, 1, 0] → Encoded value ψ
  • [0, 1, 1, 0] → Encoded value η
  • [1, 1, 1, 0] → Encoded value ι
  • [0, 0, 0, 1] → Encoded value κ
  • [1, 0, 0, 1] → Encoded value λ
  • [0, 1, 0, 1] → Encoded value ρ
  • [1, 1, 0, 1] → Encoded value τ
  • [0, 0, 1, 1] → Encoded value ξ
  • [1, 0, 1, 1] → Encoded value π
  • [1, 1, 1, 1, 1, 1, 1, 1, s₁¹, s₁², ...] → Encoded account state: c, b, g, m, l, i

Each path consists of bits where:

  • 0 and 1 represent the edges from a node.
  • The sequence of bits leads to a leaf node containing the encoded state component.

Path Compression in Patricia Trie

In a Patricia Trie, each state component is mapped to a 32-byte (256-bit) sequence, meaning each path would typically consist of 256 bits. However, the trie compresses these paths to eliminate unnecessary bits when only a single path exists.

For example, the full path that leads to α might be [1, 0, 0, 0, 0, 0, 0, 0, ...], but the trie structure is compressed. Since there are no other branches, the trie doesn't traverse all the way through the remaining 0s. Instead, it terminates at the first few significant bits, which in this case is [1, 0, 0, 0].

This compression mechanism makes the Patricia Trie more space-efficient and faster to traverse while retaining all the necessary information to map the path to its corresponding state component. The same applies to all other paths—only the necessary portion of the bit sequence is stored.

Visualizing the Patricia Trie

To better understand the structure, you can visualize the Patricia Trie using a graph. Each node represents a state in the traversal, and edges represent the bits (0 or 1). The leaves contain the encoded values.

Merklization Process

The Merklization process involves computing a root hash from the leaf values, similar to how a regular Merkle tree works. However, in the Patricia Trie, there are special considerations for encoding leaves and branches.

Leaf Encoding

When encoding leaves:

  • bits(k)...248: Represents the first 248 bits of bits(k), where bits(k) are the bits of the path encoding C defined in Appendix D.1. This ensures that the path information is included in the encoding.
  • bits(E₁(|v|)): The bit representation of the byte encoding the length of the value |v|.
  • bits(E₁(|v|))₀₋₆: The first 6 bits of bits(E₁(|v|)).
  • Padding: The bit array is padded with zeros to reach 512 bits.

Steps:

  • Extract the First 248 Bits: Take the first 248 bits of the path encoding to include in the leaf encoding.
  • Encode the Value Length: Use E₁(|v|) to get the byte representation of the value's length.
  • Take the First 6 Bits: From bits(E₁(|v|)), take the first 6 bits.
  • Concatenate: Combine the bits from steps 1 and 3.
  • Padding: Pad the combined bits with zeros to reach 512 bits.
  • Reverse Bits: Reverse the entire bit array (interpreted as bytes in little-endian order).
  • Hash: Compute the hash of the reversed bit array to get the leaf hash.

Branch Encoding

For branches:

  • First Bit Replacement: For the left side of a branch, the first bit is replaced with 0.
  • Padding and Reversal: Similar to leaves, the bit array is padded to 512 bits and reversed in little-endian order before hashing.

Steps:

  • Modify the First Bit: Replace the first bit of the left side with 0.
  • Padding: Pad the bit array to 512 bits with zeros.
  • Reverse Bits: Reverse the bit array as bytes in little-endian order.
  • Hash: Compute the hash to get the branch hash.

Handling Missing Neighbors

When computing the hash for a node that doesn't have a neighbor (i.e., no sibling node):

  • Zero Hash Value: The missing neighbor is assumed to have a hash value of zero.
  • Parent Hash Computation: The hash of the parent node is computed using the existing child hash and the zero hash.

Computing the Root Hash

By applying the encoding and hashing procedures to all leaves and branches, you can compute the root hash of the Patricia Trie. This root hash serves as a cryptographic commitment to the entire system state, allowing for efficient and secure verification of any state component.

Storing the Patricia Trie in a Database

Given that the Patricia Trie representing the entire system state is too large to fit in memory, most implementations utilize a database to store the nodes. This allows the structure to scale efficiently, especially as the state grows over time.

Database Storage for Nodes

The approach to storing nodes in the database is flexible, but most implementations follow a common pattern:

  • Node Hash as Key: Each node in the Patricia Trie is stored in the database, where the hash of the node serves as the key.
  • Child Hashes as Value: The value stored for each node typically includes the hash of the left and right child nodes. This allows for efficient reconstruction of the tree when needed without storing the full structure in memory.

By storing nodes this way, the Patricia Trie can be incrementally updated without requiring the entire structure to be kept in memory.

Efficient Root Hash Update

One of the most critical aspects of the Patricia Trie implementation is the efficient computation of the updated root hash whenever a leaf value is modified. Recomputing the entire Merkle root from all leaves would be highly inefficient, so instead, the trie is designed to update the root incrementally.

  • Updating from Modified Leaves: When a leaf node is modified (for example, when an account balance changes), only the hashes along the path from that leaf to the root need to be recomputed.
  • Partial Updates: The hash of each parent node along the path is updated based on the new hashes of its children, working upwards to the root. Since this process only affects the path from the updated leaf to the root, it minimizes the number of nodes that need to be recomputed, significantly reducing the computational complexity.

Providing Merkle Proofs for Light Clients

Storing node hashes as keys in the database (rather than storing the full paths as keys) provides another important benefit: it enables the node to generate Merkle proofs for light clients based on previous states.

  • Since the hash of each node (and its children) is stored, the node can generate cryptographic proofs for specific parts of the state.
  • This allows light clients to verify the inclusion (or exclusion) of specific data in previous states, without needing access to the full state tree.

This method of storage ensures that the Patricia Trie remains scalable, efficient, and capable of supporting light clients with cryptographic guarantees, even as the system state evolves over time.


More Posts

An in-depth explanation of Merkle tree structures from Appendix E of the Gray Paper, covering well-balanced and constant-depth Merkle trees. The post also introduces the node operator as an abstraction to highlight similarities between these trees and explains the trace operator, which is used to generate cryptographic proofs. Visual examples are provided to illustrate how input values are divided in well-balanced trees and padded for constant-depth trees, ensuring efficient verification and data integrity in cryptographic systems.

Merkle Tree Structure: Appendix E of the Gray Paper

In this post, we break down the key Merkle tree structures from Appendix E of the Gray Paper, including well-balanced and constant-depth Merkle trees. We'll also explain the trace operator used for cryptographic proofs and explore how these structures ensure data integrity and efficient verification in cryptographic systems.

AI generated abstract image representing the program blob

JAM Virtual Machine: Program Blob Structure

The JAM virtual machine uses a highly structured program blob to execute its programs, where each component of the blob plays a critical role in managing jumps, instructions, and their metadata. In this article, we will break down the JAM program blob structure step by step, demystifying its components and illustrating how they come together to enable dynamic execution within the virtual machine.

llustration of the Safrole algorithm’s process for block creation and fork prevention, showing how validators, tickets, and bandersnatch signatures work in decentralized blockchain systems.

Safrole Algorithm Demystified: JAM Block Creation

The Safrole algorithm plays a critical role in preventing forks in blockchain by managing block creation through a preselected set of validators. Learn how it functions, the key state components, ticket submission process, and how blocks are verified and imported in a decentralized system.