PR #26326 | Notion

https://github.com/bitcoin/bitcoin/pull/26326

Motivation

Currently, we are locking cs_main when processing network messages that require I/O to retrieve block data from disk. This is not really necessary provided the data we are trying to pull is not pruned before we do.

Thoughts/Questions

The approach followed by this PR is fairly straight-forward: Do lock cs_main to access critical sections until we have found the pointer to the data we want to pull from disk, and release after that (but before doing I/O). There are two ways in which this could fail:

If external software deleted the data we are trying to access, but that is out of scope
If we prune the data after referencing it but before reading from disk, given we are releasing cs_main. This cannot really happen, given the depth of historical block we are willing to serve to any peer (MAX_BLOCKTXN_DEPTH=10) is significantly smaller than the minimum amount of blocks to keep if we are running in pruned mode (MIN_BLOCKS_TO_KEEP=288)

The approach then is to load all the references that we need with cs_main locked, and then work with them after releasing it. I was a bit worried that by doing so, we may be trying to dereference a nullptr. However, that seems not to be possible either, given we are creating CBlockIndex* and those are kept even if the block is pruned, so accessing attributes or method of them is safe even if cs_main is not locked.

This also changes the behavior of the node with respect of what to do if data is not available when we try to fetch it from disk. Before this PR, if a block is requested, we find its reference but then fail to fetch it, the node would crash (given we are assuming this is not possible, and asserting so). This is sort of an invariant, given cs_main is locked before the changes on this PR: the block cannot appear to be available and not found on fetching (modulo other software messing around with the data). After this PR, this could theoretically happen, but the odds are low, therefore, instead of asserting, the software now just logs and error and returns. This mean that the software will continue running and the peer will eventually disconnect us for not responding to the request (as opposed to the connection being dropped because we crashed).

This poses a more general question regarding what to do in this situation: if a block is not found, or we are unwilling to provide it, the current behavior is radio-silence. A better approach would simply be responding with a NotFound, as we do for transaction data, and avoid leaving our peer handing until the request times-out and they disconnect. This seems to be not as straight-forward as just changing the functionality, given other software may be expecting us to not respond (this is, ultimately, a change in the P2P protocol), so in order to do this a BIP may be needed, and we should need a new protocol version for it (@sipa pointed this out)

Review

net: don't lock cs_main while reading blocks in net processing by andrewtoth · Pull Request #26326 · bitcoin/bitcoin