Merkle Mountain Range (MMR): the case of Herodotus

In this blog, we’ll discuss some of the challenges encountered in building a Merkle Mountain Range (MMR), as observed in our recent review of a project. These insights may also apply to other Merkle-related algorithms.

Introduction to Merkle Mountain Ranges

A MMR (Merkle Mountain Range) is a unique data structure in the Merkle family. It resembles a list of perfectly balanced binary trees, with the tallest trees on the left and decreasing in height towards the right. The top node of each tree is referred to as a peak.

MMR is strictly append-only. Each node, except for leaf nodes, has two children. New parent nodes are created as soon as a pair of children exists. The MMR’s root is determined by its size and the “bagging” of all peaks.

 // root = H(size, H(P0, H(P1, H(P2, ... H(Pn-1, Pn)))))
fn compute_root(mmr_size: felt252, peaks: Peaks) -> felt252 {
    let bag = peaks.bag();
        // bag = PoseidonHasher::hash_double(peaks[0], 
        //           PoseidonHasher::hash_double(peaks[1], ... 
        //               PoseidonHasher::hash_double(peaks[n-1], peaks[n])
    //           )
        //       )
    PoseidonHasher::hash_double(mmr_size, bag)
}

Herodotus: Storage Proofs Using MMR

We recently reviewed Herodotus, a storage-proof system that leverages MMR’s efficiency to store blockhashes and block timestamps. Herodotus implements a stateless MMR structure, storing only the final root and its size. The storage proof operates as follows:

An MMR proof verifies the inclusion of a historical blockhash in the MMR.
A Merkle Patricia Trie proof confirms the existence of an account or a slot in the state trie for a given blockhash.

This method allows for permissionless proof of any historical state. For more details on Ethereum state and storage proofs, please see the further reading section.

Inserting a blockhash in Herodotus’s MMR also requires submitting all peaks. It is essential that the hash of the MMR size, along with the bagging results of the submitted peaks, matches the stored MMR root.

 fn append(ref self: MMR, hash: felt252, peaks: Peaks) -> Result<(felt252, Peaks), felt252> {
        if !peaks.valid(self.last_pos, self.root) {
            return Result::Err('Invalid peaks');
        }
    ...
}

fn valid(self: Peaks, last_pos: usize, root: felt252) -> bool {
        let computed_root = compute_root(last_pos.into(), self);
        computed_root == root
}

Insufficient Peaks Validation Breaks the MMR Updates

Hashing the submitted peaks with the stored size and comparing them with the stored root seems to ensure the integrity of user-submitted peaks. However, this check is inadequate. Due to the unique hash construction method, the intermediate result of bagging the right-side peaks could also serve as a valid peak, circumventing the check.

An honest user submits the actual peaks (N7, N8) when inserting N9.
A malicious user could submit Hash(N7, N8) as a peak, which also passes the check.

This vulnerability allows abnormal growth of the MMR and incorrect updating of the root. Consequently, legitimate Merkle proofs may fail against the manipulated MMR.

Additionally, this flaw could enable the verification of an intermediate node. Given that N10 is expected to be the right tree’s peak, one could falsely verify the existence of X as a leaf. This is particularly dangerous as it treats an intermediate node as a leaf, potentially misusing its large hash value as a legitimate data point. For more on this attack, please refer to our audit report.

How Can Misuse and Manipulation of MMR be mitigated?

Herodotus has addressed this vulnerability by explicitly verifying the peak hash against the MMR root and the peak count against the MMR size. Additionally, the tree depth is now used to verify and restrict proof length. These measures effectively limit potential MMR manipulation and misuse.

It is crucial to meticulously verify all inputs of a Merkle proof. Any unverified attribute could pose a threat to the project. Consider the following as a starting point for further examination:

Is there a clear distinction between leaf and intermediate nodes? Can an intermediate node be verified, and if so, could its value be exploited?
What data is stored in the leaf nodes? Is it hashed? If not, could malicious data (e.g., a subtree) be inserted into a leaf?
Are the proof inputs, such as proof length and node index, thoroughly checked? Is each proof unique, and can data be illegitimately injected or altered in the proof?

Further reading:

#Merkle Mountain Range

#Herodotus storage Proof

Herodotus security audit reports

Herodotus on Starknet

Herodotus Cairo Libs