JAM Virtual Machine: Program Blob Structure
Ivan Subotic
Understanding the JAM Program Blob Structure
The JAM virtual machine uses a highly structured program blob to execute its programs, where each component of the blob plays a critical role in managing jumps, instructions, and their metadata. While the underlying formula defining this structure may appear complex at first glance, a closer look reveals a logical and efficient encoding system that allows the JAM virtual machine to interpret and execute bytecode programs. In this article, we will break down the JAM program blob structure step by step, demystifying its components and illustrating how they come together to enable dynamic execution within the virtual machine.
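In Appendix A.1 of the JAM Gray Paper, the program blob structure is defined by a formula that may seem complex at first. However, when broken down, the structure becomes more logical and approachable. The definition looks something like this:
$$\mathbf{p} = \mathcal{E}(|\mathbf{j}|) \frown \mathcal{E}_1(z) \frown \mathcal{E}(|\mathbf{c}|) \frown \mathcal{E}_z(\mathbf{j}) \frown \mathcal{E}(\mathbf{c}) \frown \mathcal{E}(\mathbf{k}), \quad |\mathbf{k}| = |\mathbf{c}|$$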
Let’s break this down step by step:
1. The Program Structure
The entire program blob is denoted as $\mathbf{p}$. This represents the collection of encoded data that constitutes the program in the JAM virtual machine.
2. The Encodings
The symbol $\mathcal{E}$ represents an encoding function that applies to different elements of the program. The encoding function varies depending on the element being encoded, as indicated by the subscripts or different arguments.
3. Component Encodings
The program consists of a sequence of encoded components, separated by the symbol $\frown$. This indicates a concatenation of encodings that make up the overall program blob.
- $\mathcal{E}(|\mathbf{j}|)$: This represents the variable integer encoding of the size of the dynamic jump table $\mathbf{j}$, as described in Appendix C.1.2. Here, $|\mathbf{j}|$ refers to the length, i.e., the number of elements inside the dynamic jump table $\mathbf{j}$.
- $\mathcal{E}_1(z)$: This is the one-byte little-endian encoding of $z$, as described in Appendix C.1.2. The value $z$ is later used in the term $\mathcal{E}_z(\mathbf{j})$, where it gives the number of bytes used to encode each jump table entry.
- $\mathcal{E}(|\mathbf{c}|)$: Similarly, this represents the variable integer encoding of the size of the instruction data $\mathbf{c}$, where $|\mathbf{c}|$ refers to the length of $\mathbf{c}$, i.e., how many bytes the encoded $\mathbf{c}$ consists of.
- $\mathcal{E}_z(\mathbf{j})$: This indicates that the encoding $\mathcal{E}_z$ is applied to each value inside the dynamic jump table $\mathbf{j}$.
- $\mathcal{E}(\mathbf{c})$: This is the actual encoding of the instruction data $\mathbf{c}$, which includes the opcodes and their argument values. This is the core part of the program that will be executed by the JAM virtual machine.
- $\mathcal{E}(\mathbf{k})$: Here, $\mathbf{k}$ represents the opcode bitmask. The bitmask is a crucial part of the program because each set bit in $\mathbf{k}$ indicates the position within $\mathbf{c}$ where an instruction starts, i.e., the location of each opcode. The condition $|\mathbf{k}| = |\mathbf{c}|$ ensures that the bitmask has one bit for every byte of instruction data, allowing precise identification of where instructions begin within $\mathbf{c}$.
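To make the ordering of the components concrete, here is a minimal Python sketch of a parser that reads them one after another. It is an illustration rather than reference code from the Gray Paper. The names `decode_varint`, `ProgramBlob`, and `parse_blob` are hypothetical. The integer decoder follows our reading of the Gray Paper's variable-length integer serialization, where the number of leading 1-bits in the first byte gives the number of extra bytes. The bitmask is assumed to be packed eight bits per byte.

```python
from dataclasses import dataclass


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one variable-length integer at `pos`; return (value, next_pos)."""
    first = data[pos]
    # Assumption: the count of leading 1-bits equals the number of extra bytes.
    extra = 0
    while extra < 8 and first & (0x80 >> extra):
        extra += 1
    rest = data[pos + 1 : pos + 1 + extra]
    if len(rest) != extra:
        raise ValueError("truncated variable-length integer")
    low = int.from_bytes(rest, "little")
    if extra == 8:
        return low, pos + 9
    # The remaining low bits of the first byte hold the most significant part.
    high = first & (0xFF >> (extra + 1))
    return (high << (8 * extra)) | low, pos + 1 + extra


@dataclass
class ProgramBlob:
    entry_size: int        # z
    jump_table: list[int]  # j
    code: bytes            # c
    bitmask: bytes         # k, packed 8 bits per byte (assumption)


def parse_blob(blob: bytes) -> ProgramBlob:
    jump_count, pos = decode_varint(blob, 0)    # E(|j|)
    entry_size = blob[pos]                      # E_1(z)
    pos += 1
    code_len, pos = decode_varint(blob, pos)    # E(|c|)

    jump_table = []                             # E_z(j)
    for _ in range(jump_count):
        entry = blob[pos : pos + entry_size]
        if len(entry) != entry_size:
            raise ValueError("truncated jump table")
        jump_table.append(int.from_bytes(entry, "little"))
        pos += entry_size

    code = blob[pos : pos + code_len]           # E(c)
    pos += code_len
    mask_len = (code_len + 7) // 8              # one bit per code byte
    bitmask = blob[pos : pos + mask_len]        # E(k)
    if len(code) != code_len or len(bitmask) != mask_len:
        raise ValueError("truncated instruction data or bitmask")
    return ProgramBlob(entry_size, jump_table, code, bitmask)
```

Calling `parse_blob` on a blob whose first three bytes are `01 01 0f` would read one jump table entry of one byte, then 15 bytes of instruction data, then 2 bytes of bitmask.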
Example Breakdown of the JAM Program Blob
Let’s walk through an example to better understand how the components of the JAM program blob fit together. The structure of the blob, as outlined in the formula, includes the dynamic jump table, the instruction data, and the bitmask, among other elements. The example blob consists of the following sections:
1. Jump Table Size ($|\mathbf{j}|$): 1 byte
2. Jump Table Entry Size ($z$): 1 byte
3. Instruction Data Length ($|\mathbf{c}|$): 1 byte
4. Jump Table Entries ($\mathbf{j}$): 1 entry
5. Instruction Data ($\mathbf{c}$): 15 bytes
6. Opcode Bitmask ($\mathbf{k}$): 2 bytes
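The section sizes above determine where each component sits inside the blob. The short Python sketch below adds them up; the variable names are ours, and the single jump table entry is counted as 1 byte because the entry size in this example is 1.

```python
# Section names and sizes in bytes, in the order they appear in the blob.
sections = [
    ("jump table size", 1),
    ("jump table entry size", 1),
    ("instruction data length", 1),
    ("jump table entries", 1),   # 1 entry x 1 byte per entry
    ("instruction data", 15),
    ("opcode bitmask", 2),
]

layout = []   # (name, start offset, size)
offset = 0
for name, size in sections:
    layout.append((name, offset, size))
    offset += size
total_size = offset

for name, start, size in layout:
    print(f"{start:2d}..{start + size - 1:2d}  {name}")
print("total bytes:", total_size)
```

Under these sizes the instruction data occupies offsets 4 to 18, the bitmask occupies offsets 19 and 20, and the whole blob is 21 bytes long.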
From this example, let's break down the components step by step:
1. Jump Table Size ($|\mathbf{j}|$) and Encoding ($\mathcal{E}(|\mathbf{j}|)$)
- In the first part of the blob, we encode the size of the dynamic jump table, denoted as $|\mathbf{j}|$. This size is encoded using variable integer encoding, as described in Appendix C.1.2.
- In this example, the jump table has a size of 1. We encode this size, and it takes 1 byte of space.
2. Jump Table Entry Size with One-Byte Encoding of $z$
- Next, we encounter the one-byte encoding of $z$. This value represents the number of bytes used for encoding each value in the dynamic jump table. In our example it has the value of 1.
- Here, $\mathcal{E}_1(z)$ is also 1 byte long, as shown by the $\mathcal{E}_1$ notation.
3. Instruction Data Length ($|\mathbf{c}|$) and Encoding ($\mathcal{E}(|\mathbf{c}|)$)
- Moving forward, we encode the size of the instruction data ($|\mathbf{c}|$) using the same variable integer encoding method. This provides a compact representation of the instruction data's size.
- In this case, the instruction data has a length of 15 bytes, and the encoded length uses 1 byte.
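Both lengths in this example (1 and 15) fit into a single byte. The Python sketch below shows why, under the assumption that the variable integer encoding works as we read it in the Gray Paper: values below 128 are stored directly, and larger values use a prefix of leading 1-bits followed by little-endian bytes. The function name `encode_varint` is hypothetical.

```python
def encode_varint(x: int) -> bytes:
    """Sketch of the variable-length encoding for integers below 2**64."""
    if x < 0 or x >= 1 << 64:
        raise ValueError("value out of range")
    for extra in range(8):
        # `extra` additional bytes are enough for values below 2**(7 * (extra + 1)).
        if x < 1 << (7 * (extra + 1)):
            # Prefix: `extra` leading 1-bits, then the most significant bits of x.
            prefix = 256 - (1 << (8 - extra)) + (x >> (8 * extra))
            low = x & ((1 << (8 * extra)) - 1)
            return bytes([prefix]) + low.to_bytes(extra, "little")
    # Largest values: marker byte 0xff followed by 8 little-endian bytes.
    return b"\xff" + x.to_bytes(8, "little")


print(encode_varint(1).hex())    # jump table size from the example
print(encode_varint(15).hex())   # instruction data length from the example
print(encode_varint(300).hex())  # a value that needs a second byte
```

With this scheme, a program only pays for a longer length prefix once its jump table or instruction data grows past 127 entries or bytes.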
4. Jump Table Entries ($\mathbf{j}$)
- The next component is the dynamic jump table itself, with each entry encoded using the encoding $\mathcal{E}_z$.
- In the example, the jump table has only one entry: 49. Values are encoded according to the specified encoding method, which in this case is $\mathcal{E}_1$, as the value for $z$ is 1.
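As a small illustration of the fixed-width entry encoding, the Python sketch below encodes and decodes a jump table as little-endian integers of a given byte width. The helper names `encode_jump_table` and `decode_jump_table` are hypothetical, and the little-endian byte order is an assumption carried over from the one-byte little-endian encoding described earlier.

```python
def encode_jump_table(entries: list[int], entry_size: int) -> bytes:
    """Encode every entry as a little-endian integer of `entry_size` bytes."""
    return b"".join(e.to_bytes(entry_size, "little") for e in entries)


def decode_jump_table(data: bytes, entry_size: int) -> list[int]:
    """Split `data` into `entry_size`-byte chunks and decode each one."""
    if entry_size <= 0 or len(data) % entry_size != 0:
        raise ValueError("data length is not a multiple of the entry size")
    return [
        int.from_bytes(data[i : i + entry_size], "little")
        for i in range(0, len(data), entry_size)
    ]


encoded = encode_jump_table([49], entry_size=1)
print(encoded.hex())                     # the single entry 49 as one byte
print(decode_jump_table(encoded, 1))     # back to the list of entries
```

A larger entry size would only change the number of bytes per entry; the entry count stored at the start of the blob stays the same.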
5. Instruction Data ($\mathbf{c}$)
- The instruction data, c, contains the actual opcodes and argument values that make up the program's executable code. This is where the core logic of the program resides.
- In the example, the instruction data is simply represented, with each position in c being indexed for reference later by the bitmask.
6. Opcode Bitmask ($\mathbf{k}$) and Encoding ($\mathcal{E}(\mathbf{k})$)
- The final component is the bitmask, denoted as k. This bitmask serves as a map to indicate where instructions start within the instruction data c.
- In the bitmask, each set bit (i.e., a bit with value 1) corresponds to a position in c where an opcode starts. In this example, the bitmask is represented as the series of bits 1000 0011 0000 1001.
Here is also a representation of the opcode bitmask and its meaning:
Each bit corresponds to whether an instruction starts at that position in the instruction data (c):
- 1: Indicates the start of an instruction.
- 0: No instruction starts at this position, i.e., argument data.
Since we know that the instruction data is 15 bytes long, we only look at the first 15 bitmask bits (counting from the right). The leftmost, 16th bit does not correspond to any byte of c.
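The reading of the bitmask can be sketched in a few lines of Python. The function name `instruction_starts` is hypothetical. The sketch assumes that the 16-bit string is printed with its most significant bit on the left, that the first bitmask byte covers positions 0 to 7 of c, and that the lowest bit of each byte is the lowest position.

```python
def instruction_starts(bitmask: bytes, code_len: int) -> list[int]:
    """Return the positions in c whose bitmask bit is set."""
    return [i for i in range(code_len) if (bitmask[i // 8] >> (i % 8)) & 1]


bits = "1000 0011 0000 1001"
value = int(bits.replace(" ", ""), 2)
# Assumption: the rightmost 8 bits form the first bitmask byte.
bitmask = value.to_bytes(2, "little")

starts = instruction_starts(bitmask, code_len=15)
print(starts)  # → [0, 3, 8, 9]

# Each instruction runs until the next start (or the end of c).
lengths = [b - a for a, b in zip(starts, starts[1:] + [15])]
print(lengths)  # → [3, 5, 1, 6]
```

Under these assumptions the example contains four instructions, starting at positions 0, 3, 8, and 9 of c. The set bit at position 15 is ignored because c is only 15 bytes long.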
Putting It All Together
The jump table helps manage dynamic jumps, allowing for flexibility in the program's execution flow. The instruction data contains the program's instructions, while the opcode bitmask ensures that we know exactly where each instruction starts.
The size of $\mathbf{k}$ is always equal to the size of $\mathbf{c}$, ensuring that each bit can map to a specific byte in the instruction data.
Summary
This example illustrates how the JAM program blob structure is constructed and how each component (the jump table, the instruction data, and the opcode bitmask) fits into the overall structure. By using compact encodings and a clear organization, the JAM virtual machine can efficiently interpret and execute the program.