CBlockIndex::nStatus race when pruning #17161

issue maflcko opened this issue on October 16, 2019
  1. maflcko commented at 5:21 PM on October 16, 2019: member

    Disclaimer: I couldn't reproduce this issue in practice on ext4 and ntfs partitions on a x86 CPU, but I think that this issue should be solved in the long-term.

    Basically, CBlockIndex::nStatus is used to hold a bunch of flags (e.g. whether the block data is currently on the hard drive, or whether the block data was verified at some point in the past). Most of this access is from validation, which happens to hold the cs_main lock. However, some of the access (e.g. background index threads, or RPC calls) does not take the cs_main lock, so the access to nStatus is not guarded for them. In theory, this could turn out to be racy in combination with pruning.

    I am not sure how to solve this. Some ideas:

    • Put nStatus under the cs_main lock. --> meh, this will affect validation performance when polling on RPC
    • Make a "read-write-lock" for nStatus, similar to ::mempool.cs. I.e. when writing nStatus, the cs_main lock and the read lock is taken and when reading nStatus, only the read lock is taken. Maybe use a std::shared_mutex?
    • Any other ideas ... !?
  2. maflcko added the label Brainstorming on Oct 16, 2019
  3. maflcko added the label Bug on Oct 16, 2019
  4. maflcko added the label Validation on Oct 16, 2019
  5. maflcko added the label Block storage on Oct 16, 2019
  6. maflcko cross-referenced this on Oct 16, 2019 from issue chain: Remove CBlockIndex::SetNull helper by maflcko
  7. promag commented at 10:04 AM on October 17, 2019: member

    this will affect validation performance when polling on RPC

    Where? From a quick look only getrawtransaction reads nStatus without cs_main (I may be missing other cases). For this case we could do somthing like:

        uint32_t block_status;
        {
            LOCK(cs_main);
            blockindex = LookupBlockIndexAndStatus(blockhash, block_status);
    
  8. maflcko commented at 12:30 PM on October 17, 2019: member

    As pointed out by @ryanofsky in #13903 (comment), reading the status under a lock is not sufficient, as the block might be deleted as soon as the lock is releases.

  9. fanquake referenced this in commit 4daadce36c on Oct 17, 2019
  10. maflcko cross-referenced this on Mar 17, 2021 from issue rpc: reduce LOCK(cs_min) scope in rest_block: ~5 times as many requests per second by martinus
  11. maflcko cross-referenced this on Jun 10, 2021 from issue Potential data race on fHavePruned flag by BradSwain
  12. maflcko cross-referenced this on Jun 15, 2021 from issue Guard `fHavePruned` to avoid potential data race by amadeuszpawlik
  13. PastaPastaPasta referenced this in commit 1d2f2b0cc8 on Jun 27, 2021
  14. PastaPastaPasta referenced this in commit 26822f5e68 on Jun 28, 2021
  15. PastaPastaPasta referenced this in commit 328a7fd7be on Jun 29, 2021
  16. PastaPastaPasta referenced this in commit a9d7580bc7 on Jul 1, 2021
  17. PastaPastaPasta referenced this in commit fae0bfdfa8 on Jul 1, 2021
  18. PastaPastaPasta referenced this in commit d577147f4f on Jul 12, 2021
  19. PastaPastaPasta referenced this in commit bf8935119c on Jul 13, 2021
  20. PastaPastaPasta referenced this in commit b485210d68 on Jul 13, 2021
  21. maflcko cross-referenced this on Sep 17, 2021 from issue Add CBlockIndex lock annotations, guard nStatus/nFile/nDataPos/nUndoPos by cs_main by jonatack
  22. vasild commented at 1:50 PM on November 26, 2021: contributor

    What exactly is the problem with nStatus?

    1. Torn reads? (e.g. somebody reads it while somebody else is writing it, resulting in read returning garbage - a value that was not in memory before the write, nor after the write)

    2. Torn writes? (e.g. two threads write to it concurrently, resulting in garbage being stored in it - a value that was not written by either thread)

    3. The access to nStatus is not synced with access to another variable?

  23. maflcko commented at 1:53 PM on November 26, 2021: member

    All of the above. nStatus is both written to and read from from several threads.

  24. vasild commented at 2:17 PM on November 26, 2021: contributor

    Which are the other variables (for 3.)? Because 1. and 2. alone could be solved by using std::atomic_uint32_t.

  25. maflcko commented at 2:27 PM on November 26, 2021: member

    atomic can't solve this, because HAVE_DATA needs to be valid until right after you read the block, not the status of it. See my previous comment, which links to #13903 (comment)

  26. vasild commented at 3:02 PM on November 26, 2021: contributor

    Are you saying that even #22932 does not solve the problem because even with that PR the code is:

    https://github.com/bitcoin/bitcoin/blob/b1c859abb790a1bbc9c31e3a535e36d1c59085b9/src/node/blockstorage.cpp#L400-L404

    and the block file may be removed from disk after GetBlockPos() completes (under cs_main) but before the if? In this case ReadBlockFromDisk() would return false.

    cc @jonatack

  27. jonatack commented at 12:46 PM on December 6, 2021: contributor

    Thanks, having a look/updating.

  28. laanwj closed this on Jan 27, 2022

  29. fanquake commented at 10:11 AM on January 27, 2022: member

    Not sure this should have been closed given #22932 doesn't say it actually fixed this, and at least one of the review comments said the same.

  30. fanquake reopened this on Jan 27, 2022

  31. andrewtoth cross-referenced this on Oct 14, 2022 from issue rpc/rest/zmq: reduce LOCK(cs_main) scope: ~6 times as many requests per second by andrewtoth
  32. andrewtoth cross-referenced this on Oct 15, 2022 from issue use shared mutex to guard against block files being removed before read by andrewtoth
  33. andrewtoth commented at 12:38 AM on October 15, 2022: contributor

    If all that is left is the ReadBlockFromDisk race, then that could be fixed with just a shared mutex guarding the reading of blocks and the removal of blocks no? This way cs_main is not blocked and blocks can still be read in parallel via shared locks.

  34. maflcko commented at 3:49 PM on December 7, 2022: member

    Closing for now, as this is a theoretical issue that isn't (hopefully) exposed to users.

  35. maflcko closed this on Dec 7, 2022

  36. Fabcien referenced this in commit dcac626d18 on Jan 24, 2023
  37. bitcoin locked this on Dec 7, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-19 06:54 UTC