PR #29284 | Notion

https://github.com/bitcoin/bitcoin/pull/29284

Background

Blocks are stored by our node in a tree-like structure with the genesis block at its source (or root) and potentially many children per height. At any given point in time only a single block per height is valid, and the valid chain of blocks (the so called blockchain) is the chain of connected blocks from the best known tip to the root. This chain is also known in the codebase as active chain. If at a given point in time two or more competing blocks are received at the tip of the chain 1️⃣, the rules to decide which one to connect to the active chain is to pick the one that was received first (as in the full block data, not just the header).

1️⃣ Notice that by competing blocks at the tip we mean blocks that build on a chain of only valid blocks and which share the same amount of cumulative work. This most usually refers to forks on depth one, but can potentially be of arbitrary length.

Motivation

The current tiebreaker for multiple valid blocks received at the tip of our tree is designed to prevent malicious miners from withholding block data, that is, a best tip candidate will be picked only if the full block data has been received by our node. Otherwise, miners would be incentivized to withhold block data and broadcast it only when a competing miner finds a block at the same height, knowing that they will win the tiebreaker (given they were first to spread their header). Honest miners would only be able to mine empty blocks on top of blocks they only know the header of, given some transactions may potentially conflict with the ones in the withheld block, making their block invalid.

However, turns out our current solution does not completely fix the problem, it only offsets it by one block. A dishonest miner can secretly mine two blocks on top of the current tip (A <- B <- C), broadcast the header of the first (B), and the full data for the second (C). Then, wait for a fork to be found (A <- B' <- C') and broadcast the missing data from B. Given our node would have received C before C', on receiving B, the A <- B <- C chain will be activated.

This is interesting because up until B is received, C is not even part of the active chain, and even though the A <- B' <- C' chain may have been received first (as in the whole chain) and potentially activated, upon receiving B this would become an orphaned (or inactive) chain.

Approach

We start from the following invariants:

A chain is only activatable if all its blocks are valid and we have all their block data
Two compared chains have the exact amount of work (nChainWork), otherwise wee would already have been able to pick which one is best. This means one of two things:
- Both chains have the same height, or
- The common ancestor for the chains is before the last epoch (difficulty readjustment)

Our approach to decide which chain is best works by finding which chain was activatable first. We can do that by finding what of the chains has the smallest maximum nSequenceId 2️⃣, since that means that its tip was received first. Notice that, given our nChainWork invariant, it cannot be the case that a longer chain has a smaller maximum nSequenceId than a shorter one if the former’s tip was not received later than the latter’s, otherwise, the chain work would be different and these checks wouldn’t even be performed.

In oder to decide which of the chains has the smallest maximum sequence number, we need to check all the blocks on each chain down to their common ancestor. To do so, we first walk them back to the same height if needed (notice this only applies to forks that spawn across epochs, given the difficulty adjustment may differ). For each block we traverse back, we keep track of what the maximum nSequenceId is for both chains. Once we have reached a common height, we keep traversing back both chains at the same time until we reach the common ancestor 3️⃣, at which point the two maximums are compared, and the minimum of them wins.

2️⃣ nSequenceId is a monotonic value that is given to downloaded blocks to decide what block was received first. The value is only assigned when the full data of the block is received.