This article explains the basic ideas behind BitVM, Bitcoin scripts, and SegWit. It was written by Nickqiao, Faust, and Shew Wang of GeekWeb3, with the Bitlayer research team as advisors.
Summary
Text
MATT and Commitment: The Basic Idea of BitVM
What is Bitcoin Script
How Bitcoin Script Triggers
SegWit and Witness
Recently, Delphi Digital released a technical research report on Bitcoin Layer 2 titled “The Dawn of Bitcoin Programmability: Paving the Way for Rollups”. It systematically outlines the core concepts behind Bitcoin Rollups, such as the BitVM family, OP_CAT and covenants, DA layers in the Bitcoin ecosystem, bridges, and four major Bitcoin Layer 2 projects that adopt BitVM: Bitlayer, Citrea, Yona, and BOB.
While the report gives a general overview of Bitcoin Layer 2 technology, it lacks detailed descriptions, which makes it hard to follow. GeekWeb3 has explored the topic in more depth on the basis of the Delphi report, to help more people understand technologies like BitVM.
We will collaborate with the Bitlayer research team and the BitVM Chinese community on a series of educational columns called “Approaching BTC”, covering key topics such as BitVM, OP_CAT, and Bitcoin cross-chain bridges. The aim is to demystify Bitcoin Layer 2 technologies and pave the way for more enthusiasts.
A few months ago, ZeroSync’s Robin Linus released an article titled “BitVM: Compute Anything on Bitcoin”, introducing the concept of BitVM and driving the advancement of Bitcoin Layer 2 technology. This is one of the most revolutionary innovations in the Bitcoin ecosystem, attracting projects like Bitlayer, Citrea, BOB, and others, injecting vitality into the entire market.
Subsequently, more researchers improved on BitVM and released different iterations such as BitVM1, BitVM2, BitVMX, and BitSNARK. The overall landscape is shown in the figure and summarized below:
Robin Linus’s early BitVM white paper describes an implementation based on virtual logic gate circuits, referred to as BitVM0;
Robin Linus later informally introduced a BitVM scheme based on a virtual CPU (referred to as BitVM1), similar to Cannon, Optimism’s fraud-proof system, simulating the effect of a general-purpose CPU off-chain using Bitcoin Script;
Robin Linus also proposed BitVM2, a permissionless, single-step, non-interactive fraud-proof protocol;
Members of Rootstock Labs and Fairgate Labs released the BitVMX white paper, which, like BitVM1, aims to simulate the effect of a general-purpose CPU off-chain using Bitcoin Script.
The BitVM development ecosystem is becoming clearer, and the surrounding tooling is visibly improving with each iteration. Compared to last year, the BitVM ecosystem has moved from a “castle in the air” to something whose outline can be made out, attracting more developers and VCs into the Bitcoin ecosystem.
However, for most people, understanding BitVM and related Bitcoin Layer 2 technical terms is not easy, as it requires a systematic understanding of the surrounding basic knowledge, especially Bitcoin scripts and Taproot background knowledge. Existing reference materials online are either too long-winded or not thorough enough in explanation, making it difficult to grasp. We strive to address these issues, using clear language to help more people understand the peripheral knowledge of Bitcoin Layer 2 and establish a systematic understanding of the BitVM system.
First and foremost, the basic concept behind BitVM is MATT, which stands for Merkleize All The Things. It mainly uses a Merkle Tree as the data structure to represent the execution process of a complex program, with the goal of verifying fraud proofs natively on Bitcoin.
Although MATT can express a complex program and its data processing traces, it does not directly publish this data on the BTC chain because the overall scale of this data is very large. The MATT solution only stores data in a Merkle tree off-chain and releases only the topmost digest (Merkle Root) of the Merkle tree on-chain. This Merkle tree mainly includes three core contents:
Smart contract script code
Data required by the contract
Traces left in contract execution (changes recorded in memory, CPU registers when smart contracts run in virtual machines like EVM)
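To make the idea concrete, here is a minimal sketch of how a set of off-chain data items could be reduced to a single Merkle Root. The leaf contents are hypothetical placeholders for the three kinds of content listed above, and the rule of duplicating the last node on odd-sized levels is just one common convention, not necessarily what any BitVM implementation uses.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every leaf, then hash pairs upward until one digest remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical leaves: script code, contract data, and two trace entries.
leaves = [b"contract script code", b"contract input data", b"trace step 0", b"trace step 1"]
root = merkle_root(leaves)
print(root.hex())  # only this 32-byte digest would need to go on-chain
```

However many leaves there are, the root stays 32 bytes, and changing any single leaf changes the root.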
Under the MATT solution, only the extremely small Merkle Root is stored on-chain, while the complete data set contained in the Merkle Tree is stored off-chain, utilizing a concept called “commitment”. Let’s explain what a “commitment” is.
A commitment is similar to a simplified declaration, which can be understood as a “fingerprint” obtained by compressing a large batch of data. Generally, those who publish a “commitment” on-chain claim that certain data stored off-chain is accurate, and this off-chain data corresponds to a simplified declaration, which is the “commitment”.
At times, the hash of data can serve as the “commitment” to the data itself, while other commitment schemes include KZG commitments or Merkle Trees. In the fraud-proof protocols commonly used in Layer 2, data publishers will publish the complete data set off-chain and release a commitment to the data set on-chain. If someone finds invalid data in the off-chain data set, they can challenge the commitment of the data on-chain.
Through commitments, Layer 2 can compress a large amount of data processing, only publishing their “commitment” on the Bitcoin chain. Of course, it is also necessary to ensure that the complete off-chain data set can be observed externally.
Currently, several major BitVM solutions such as BitVM0, BitVM1, BitVM2, and BitVMX all use similar abstract structures:
Program decomposition and commitment: First decompose a complex program into a large number of basic opcodes (compile), then record the traces generated when executing these opcodes (simply put, it is the changes in state when a program runs in CPU and memory, called Trace). Afterwards, organize all the data, including the Trace and opcodes, into a dataset, and then generate a commitment to that dataset.
Specific commitment schemes can take various forms, such as Merkle Trees, PIOPs (various ZK algorithms), hash functions.
Asset pledging and presigning: Data publishers and verifiers need to lock a certain amount of assets on-chain through presigned transactions that carry specific conditions. These conditions cover particular events that may occur in the future: if a data publisher acts maliciously, verifiers can submit evidence and claim the publisher’s assets.
Data and commitment publication: Data publishers release commitments on-chain, while publishing the complete data set off-chain. Verifiers retrieve the data set and check for any errors. Each part of the off-chain data set is related to the commitment on-chain.
Challenge and punishment: Once a verifier discovers that a data publisher has provided incorrect data, they take that portion of the data on-chain for direct verification (the data must be cut into very fine pieces), which is the logic of a fraud proof. If the verification shows that the data publisher did provide invalid data off-chain, the publisher’s locked assets are forfeited to the verifier who raised the challenge.
In summary, data publisher Alice discloses all the traces of Layer 2 transaction execution off-chain and publishes the corresponding commitment on-chain. To prove that a certain piece of data is incorrect, you first need to prove to Bitcoin nodes that this data is tied to the on-chain commitment, that is, that it was indeed disclosed by Alice, and then let Bitcoin nodes confirm that the data is incorrect.
We now have a general understanding of the overall idea of BitVM, and all BitVM variants basically follow the paradigm above. Next, let’s look at some important technologies used in this process, starting with the basics of Bitcoin scripts, Taproot, and presigning.
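The four steps above can be sketched with a toy example. The instruction set, the integer “state”, and every function name below are hypothetical simplifications of ours; real BitVM traces record CPU and memory state and are committed with the schemes described above. The point is only that a verifier can locate one faulty step off-chain, and that the check submitted on-chain then concerns that single step rather than the whole program.

```python
from typing import Callable, Optional

# Hypothetical toy instruction set: each opcode maps an integer state to a new state.
OPS: dict[str, Callable[[int, int], int]] = {
    "ADD": lambda state, arg: state + arg,
    "MUL": lambda state, arg: state * arg,
}

Instr = tuple[str, int]


def step(state: int, instr: Instr) -> int:
    op, arg = instr
    return OPS[op](state, arg)


def run_with_trace(program: list[Instr], start: int) -> list[int]:
    """Publisher side: the initial state followed by the state after each step."""
    trace = [start]
    for instr in program:
        trace.append(step(trace[-1], instr))
    return trace


def find_bad_step(program: list[Instr], claimed: list[int]) -> Optional[int]:
    """Verifier side (off-chain): index of the first step whose claimed result is wrong."""
    for i, instr in enumerate(program):
        if step(claimed[i], instr) != claimed[i + 1]:
            return i
    return None


def check_single_step(instr: Instr, before: int, after: int) -> bool:
    """The small check that would be executed on-chain for the disputed step only."""
    return step(before, instr) == after


program = [("ADD", 3), ("MUL", 4), ("ADD", 1)]
honest = run_with_trace(program, 2)   # [2, 5, 20, 21]
claimed = [2, 5, 19, 20]              # the publisher misreports the result of step 1
bad = find_bad_step(program, claimed)
print(bad, check_single_step(program[bad], claimed[bad], claimed[bad + 1]))  # 1 False
```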
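Proving that a piece of data “is related to the commitment” is typically done with a Merkle inclusion proof when the commitment is a Merkle Root. The following is a minimal sketch under that assumption; the helper names and leaf contents are hypothetical, and the odd-level padding rule is again only one possible convention.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from hashed leaves up to the root."""
    levels = [[sha256(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])       # assumption: pad odd levels by duplication
        levels[-1] = cur              # keep the padded level so siblings can be looked up
        levels.append([sha256(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and its sibling hashes."""
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


# Hypothetical data set published off-chain by Alice.
data = [b"step 0", b"step 1", b"step 2"]
levels = build_levels(data)
root = levels[-1][0]                  # the on-chain commitment
proof = make_proof(levels, 1)
print(verify_proof(b"step 1", proof, root))   # True
print(verify_proof(b"forged", proof, root))   # False
```

The verifier needs only the disputed leaf plus one sibling hash per tree level, not the whole data set.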
Bitcoin-related knowledge is harder to grasp than Ethereum’s, because even the most basic transfer involves a series of concepts, including UTXO (Unspent Transaction Output), locking scripts (ScriptPubKey), and unlocking scripts (ScriptSig). Let’s explain these main concepts first.
The “storage location” of UTXO data
It is important to note that Bitcoin and Ethereum are fundamentally different. Ethereum provides two types of accounts to store data: contract accounts and EOAs (externally owned accounts). Asset balances are recorded as numbers under a contract account or an EOA, and all accounts are kept in a single database called the “world state.” When a transfer happens, the balances of the relevant accounts are modified directly in the “world state,” so the storage location of the data is easy to pinpoint.
Bitcoin does not have a “world state” design. Asset data is scattered across past blocks (specifically in UTXOs, each of which is stored in the output of some transaction).
If you want to unlock a specific UTXO, you need to specify which transaction’s output the UTXO information is stored in, provide the ID of that transaction (i.e., its hash), and let the Bitcoin node search through the historical records. To check the Bitcoin balance of a specific address, you need to traverse all blocks from the beginning and identify the unspent UTXOs associated with that address.
When using a Bitcoin wallet, you can quickly check the Bitcoin balance of a specific address. This is often due to the wallet service itself scanning blocks and indexing all addresses, making it easier for us to query.
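The scanning that wallets and indexers perform can be sketched as follows. This is a simplified model of ours: the `Tx`, `TxIn`, and `TxOut` types are hypothetical, transaction IDs are short strings, and each output carries an address directly, whereas real outputs carry a locking script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOut:
    address: str   # simplification: real outputs hold a locking script, not an address
    amount: int    # in satoshis


@dataclass(frozen=True)
class TxIn:
    prev_txid: str   # ID (hash) of the transaction that created the UTXO being spent
    prev_index: int  # position of that UTXO among the transaction's outputs


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple[TxIn, ...]
    outputs: tuple[TxOut, ...]


def scan_utxos(chain: list[Tx]) -> dict[tuple[str, int], TxOut]:
    """Replay all transactions in order and keep only outputs that were never spent."""
    utxos: dict[tuple[str, int], TxOut] = {}
    for tx in chain:
        for txin in tx.inputs:
            utxos.pop((txin.prev_txid, txin.prev_index), None)
        for index, out in enumerate(tx.outputs):
            utxos[(tx.txid, index)] = out
    return utxos


def balance(chain: list[Tx], address: str) -> int:
    return sum(out.amount for out in scan_utxos(chain).values() if out.address == address)


chain = [
    Tx("a", (), (TxOut("alice", 50),)),                                    # coinbase-like
    Tx("b", (TxIn("a", 0),), (TxOut("bob", 30), TxOut("alice", 20))),      # alice pays bob
]
print(balance(chain, "alice"), balance(chain, "bob"))  # 20 30
```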
(When generating a transaction to transfer your UTXO to someone else, you need to mark the position of that UTXO in the Bitcoin historical records based on the transaction hash/ID to which it belongs).
Interestingly, the outcomes of Bitcoin transactions are calculated off-chain. When a user generates a transaction on a local device, they must directly establish all inputs and outputs, essentially completing the calculation of the transaction’s output results. The transaction is broadcast to the Bitcoin network and is only added to the chain after being verified by nodes. This “off-chain calculation – on-chain verification” model is entirely different from Ethereum, where you only need to provide transaction input arguments, and the transaction results are calculated and output by Ethereum nodes.
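A rough sketch of this division of labour is shown below. The function names and the dictionary-based transaction format are our own assumptions, and signature and script checks are left out entirely; the sketch only shows that the sender fixes every input and output locally, while the node merely checks the result.

```python
Outpoint = tuple[str, int]               # (txid, output index)
UtxoSet = dict[Outpoint, tuple[str, int]]  # outpoint -> (owner, amount in satoshis)


def build_transfer(utxos: UtxoSet, spend: list[Outpoint], recipient: str,
                   amount: int, change_address: str, fee: int) -> dict:
    """User side (off-chain): decide all inputs and outputs, including change."""
    total = sum(utxos[o][1] for o in spend)
    change = total - amount - fee
    if change < 0:
        raise ValueError("selected inputs do not cover amount plus fee")
    outputs = [(recipient, amount)]
    if change > 0:
        outputs.append((change_address, change))
    return {"inputs": list(spend), "outputs": outputs}


def node_verify(utxos: UtxoSet, tx: dict) -> bool:
    """Node side: verify the finished transaction instead of computing its result.
    Signature and script validation are omitted in this sketch."""
    inputs = tx["inputs"]
    if len(set(inputs)) != len(inputs) or any(o not in utxos for o in inputs):
        return False
    in_total = sum(utxos[o][1] for o in inputs)
    out_total = sum(amount for _, amount in tx["outputs"])
    return all(amount > 0 for _, amount in tx["outputs"]) and out_total <= in_total


utxos = {("a", 0): ("alice", 50)}
tx = build_transfer(utxos, [("a", 0)], "bob", 30, "alice", fee=1)
print(tx["outputs"], node_verify(utxos, tx))  # [('bob', 30), ('alice', 19)] True
```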
In addition, the locking script of a UTXO can be customized. You can set the UTXO as “unlockable by the owner of a specific Bitcoin address,” in which case the owner of that address needs to provide a digital signature and a public key (P2PKH). In the Pay-to-Script-Hash (P2SH) transaction type, you can place a script hash in the locking script of the UTXO. Anyone who submits the script corresponding to that hash and satisfies the conditions defined in the script can unlock the UTXO. The Taproot scripts that BitVM relies on use features similar to P2SH.
Here, we will first use P2PKH as an example to introduce the triggering mechanism of Bitcoin scripts. Only by understanding its triggering mechanism can we comprehend the more complex Taproot and BitVM. P2PKH, also known as “Pay to Public Key Hash,” sets a public key hash in the locking script of UTXO, requiring the submission of the corresponding public key for unlocking, aligning with the basic concept of Bitcoin transactions.
In summary, in the P2PKH scheme, the transaction initiator submits an unlocking script containing the public key and a digital signature. The public key must match the public key hash specified in the locking script, and the digital signature over the transaction must be valid. Only when both conditions are met can the UTXO be unlocked.
(Animated illustration of a Bitcoin unlocking script under the P2PKH scheme.
Source: https://learnmeabitcoin.com/technical/script)
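The P2PKH flow can be sketched as a tiny stack machine. Several things here are assumptions of ours and are marked as such in the code: the hash function is a stand-in (real Bitcoin uses RIPEMD160(SHA256(x)) for OP_HASH160), the “signature” is a toy placeholder rather than ECDSA, and the unlocking and locking scripts are simply concatenated, whereas a real node evaluates them one after the other on a shared stack.

```python
import hashlib


def hash160_standin(data: bytes) -> bytes:
    # ASSUMPTION: stand-in for RIPEMD160(SHA256(data)), which is not available
    # in every hashlib build. Truncated double SHA-256 is used for illustration only.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]


def toy_sign(pubkey: bytes, tx_digest: bytes) -> bytes:
    # ASSUMPTION: NOT a real signature scheme; anyone could compute this.
    # It only stands in for "a signature that OP_CHECKSIG can verify".
    return hashlib.sha256(b"toy-sig" + pubkey + tx_digest).digest()


def run_script(script: list, tx_digest: bytes) -> bool:
    """Execute data pushes (bytes) and a handful of opcodes (str) on a stack."""
    stack: list[bytes] = []
    try:
        for item in script:
            if isinstance(item, bytes):
                stack.append(item)
            elif item == "OP_DUP":
                stack.append(stack[-1])
            elif item == "OP_HASH160":
                stack.append(hash160_standin(stack.pop()))
            elif item == "OP_EQUALVERIFY":
                if stack.pop() != stack.pop():
                    return False
            elif item == "OP_CHECKSIG":
                pubkey, sig = stack.pop(), stack.pop()
                stack.append(b"\x01" if sig == toy_sign(pubkey, tx_digest) else b"")
            else:
                raise ValueError(f"unsupported opcode: {item}")
    except IndexError:  # stack underflow means the script fails
        return False
    return bool(stack) and stack[-1] != b""


pubkey = b"alice-public-key"                 # hypothetical key material
tx_digest = b"digest of the spending tx"     # hypothetical signed message
sig = toy_sign(pubkey, tx_digest)

locking = ["OP_DUP", "OP_HASH160", hash160_standin(pubkey), "OP_EQUALVERIFY", "OP_CHECKSIG"]
unlocking = [sig, pubkey]
print(run_script(unlocking + locking, tx_digest))  # True
```

Supplying a different public key fails at OP_EQUALVERIFY, and supplying a wrong signature fails at OP_CHECKSIG, which mirrors the two conditions described above.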
Of course, the Bitcoin network supports various transaction types, not limited to Pay to public key/public key hash. There is also P2SH (Pay to Script hash) and others, depending on how the locking script of the UTXO was customized when established.
It is important to note that in the P2SH scheme, the locking script can predefine a Script Hash, and the unlocking script must fully submit the content corresponding to the Script Hash. Bitcoin nodes can execute this script, enabling the implementation of multi-signature wallets on the Bitcoin chain.
In the P2SH scheme, UTXO creators need to ensure that the future unlockers of the UTXO are aware of the content corresponding to the Script Hash. As long as both parties know the content of this script, more complex business logic beyond multi-signature can be achieved.
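A minimal sketch of the P2SH idea follows: the locking side stores only a hash, the spender reveals the full script, and the node first checks the hash and then executes the revealed script. The text-based script encoding, the two arithmetic opcodes, and the hash stand-in are all assumptions made for illustration; real redeem scripts are binary and usually contain signature checks.

```python
import hashlib


def script_hash(script: bytes) -> bytes:
    # ASSUMPTION: stand-in for Bitcoin's RIPEMD160(SHA256(script)).
    return hashlib.sha256(hashlib.sha256(script).digest()).digest()[:20]


def spend_p2sh(locked_hash: bytes, unlock_args: list[int], revealed_script: bytes) -> bool:
    """Step 1: the revealed script must hash to the value stored in the locking script.
    Step 2: the revealed script is executed against the supplied arguments."""
    if script_hash(revealed_script) != locked_hash:
        return False
    stack = list(unlock_args)
    try:
        for token in revealed_script.decode().split():
            if token == "OP_ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif token == "OP_EQUAL":
                b, a = stack.pop(), stack.pop()
                stack.append(int(a == b))
            else:
                stack.append(int(token))  # toy encoding: any other token is a number
    except (IndexError, ValueError):
        return False
    return bool(stack) and stack[-1] != 0


# Hypothetical script agreed on by both parties: "the two arguments must sum to 5".
redeem_script = b"OP_ADD 5 OP_EQUAL"
locked_hash = script_hash(redeem_script)   # only this hash appears in the UTXO

print(spend_p2sh(locked_hash, [2, 3], redeem_script))         # True
print(spend_p2sh(locked_hash, [2, 2], redeem_script))         # False
print(spend_p2sh(locked_hash, [1], b"1"))                     # False: wrong script
```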
It should be noted that the Bitcoin chain (blocks) does not directly record the association between UTXOs and addresses. It only records which public key hash / script hash can unlock the UTXO. However, based on the public key hash / script hash, the corresponding address (the seemingly random string displayed in the wallet interface) can be quickly computed.
The reason we can see the Bitcoin balance under a specific address in block explorers and wallet interfaces is that these platforms analyze this data, scanning all blocks and calculating the corresponding “address” based on the specified public key hash / script hash in the locking script, then displaying the Bitcoin amount under that address.
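For legacy addresses, the step from a public key hash or script hash to an address is Base58Check encoding: a version byte (0x00 for mainnet P2PKH, 0x05 for mainnet P2SH) is prepended, a 4-byte double-SHA-256 checksum is appended, and the result is written in base 58. The sketch below starts from an already computed 20-byte hash; the all-zero and all-0x11 hashes are placeholders, not real keys or scripts.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check(version: bytes, payload: bytes) -> str:
    """Encode version + payload + checksum in Bitcoin's Base58 alphabet."""
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is represented by the character "1".
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


placeholder_pubkey_hash = bytes(20)          # placeholder 20-byte public key hash
placeholder_script_hash = bytes([0x11]) * 20  # placeholder 20-byte script hash

print(base58check(b"\x00", placeholder_pubkey_hash))  # a P2PKH-style address, starts with "1"
print(base58check(b"\x05", placeholder_script_hash))  # a P2SH-style address, starts with "3"
```

SegWit and Taproot addresses use a different encoding (Bech32 and Bech32m), which is not covered by this sketch.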
With an understanding of P2SH, we are now closer to Taproot, which is relied upon by BitVM. But before that, we need to understand an important concept: Witness and Segregated Witness.
Looking back at unlocking scripts, locking scripts, and the process of unlocking a UTXO, we find a problem: the transaction’s digital signature sits inside the unlocking script, so the signature cannot cover the unlocking script itself when it is generated. The signature can only cover the parts outside the unlocking script, meaning it is bound only to the core part of the transaction data and cannot cover the transaction data in full.
As a result, even if the unlocking script of a transaction is tampered with by an intermediary, it will not affect the verification result. For example, Bitcoin nodes or mining pools can insert additional data into the unlocking script of a transaction, causing subtle changes in the transaction data without affecting the verification process and the transaction outcome. This is known as the transaction malleability issue.
The downside of this is that if you plan to initiate multiple transactions consecutively, with a sequential dependency (e.g., transaction 3 referencing the output of transaction 2, transaction 2 referencing the output of transaction 1), the later transactions must reference the ID (hash) of the previous transactions. Mining pools or Bitcoin nodes, as intermediaries, can adjust the content of the unlocking script, causing the on-chain hash of the transaction to differ from what you expect, rendering your pre-established series of sequentially linked transactions ineffective.
In practice, in scenarios like DLC bridges and BitVM2, batches of transactions with sequential dependencies are constructed, making the aforementioned situation fairly common.
Simply put, the transaction malleability issue arises because the data of the unlocking script is included when a transaction’s ID/hash is calculated, and intermediaries such as Bitcoin nodes can adjust the content of the unlocking script, so the resulting transaction ID may differ from what the user expects. This is essentially a historical burden left over from Bitcoin’s early design.
The Segregated Witness (SegWit) upgrade introduced later essentially decouples the transaction ID from the unlocking script completely: the unlocking script data no longer needs to be included when calculating the transaction hash. UTXO locking scripts that follow the SegWit rules begin with an “OP_0” opcode as a marker, and the corresponding unlocking script is renamed from ScriptSig to Witness.
Under the rules of Segregated Witness, the transaction malleability issue is effectively resolved, and you no longer need to worry about the transaction you send to Bitcoin nodes being tampered with in this way. In terms of functionality, P2WSH is not fundamentally different from the P2SH discussed earlier: you predefine a script hash in the UTXO locking script, and whoever submits the unlocking script provides the corresponding script content on-chain for execution.
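The effect can be reproduced with a toy model. The byte strings below are placeholders rather than real serialized transactions, and `legacy_txid` is a hypothetical helper; the only property taken from Bitcoin is that the legacy ID is a double SHA-256 over data that includes the unlocking script.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def legacy_txid(core: bytes, script_sig: bytes) -> str:
    # Toy serialization (assumption): the real format is a structured binary encoding,
    # but it likewise includes the unlocking script in the hashed data.
    return double_sha256(core + script_sig).hex()


core = b"spend outpoint X; pay 1 BTC to Bob"          # the part the signature covers
original = legacy_txid(core, b"<sig> <pubkey>")

# A pre-built child transaction refers to its parent by the expected ID.
child_spends = original

# An intermediary re-encodes the unlocking script in an equally valid form.
mutated = legacy_txid(core, b"<sig re-encoded> <pubkey>")

print(original == mutated)        # False: same payment, different ID
print(child_spends == mutated)    # False: the child now points at a non-existent parent
```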
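Continuing with a toy model (placeholder byte strings, hypothetical helper names), the decoupling can be sketched as two separate hashes: an ID computed without the witness, and a second hash, known as the wtxid, that also commits to the witness data.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def segwit_txid(core: bytes, witness: bytes) -> str:
    # The witness is deliberately ignored: it is not part of the hashed data.
    return double_sha256(core).hex()


def segwit_wtxid(core: bytes, witness: bytes) -> str:
    # Toy serialization (assumption): core data followed by witness data.
    return double_sha256(core + witness).hex()


core = b"spend outpoint X; pay 1 BTC to Bob"
witness_a = b"<sig> <pubkey>"
witness_b = b"<sig re-encoded> <pubkey>"

print(segwit_txid(core, witness_a) == segwit_txid(core, witness_b))    # True
print(segwit_wtxid(core, witness_a) == segwit_wtxid(core, witness_b))  # False
```

Because child transactions reference the parent by its txid, altering the witness no longer invalidates a pre-built chain of dependent transactions.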
However, if the script content you want to implement is particularly large, containing a significant amount of code, it may not be possible to submit the complete script to the Bitcoin chain through conventional means (each block has size limitations). In such cases, Taproot comes into play, streamlining the on-chain script content and serving as the basis for the complex solutions built by BitVM.