IBD Crash with v0.10rc3: checkqueue.h:183: Assertion `pqueue->nTotal == pqueue->nIdle' failed. #5703

wtogami opened this issue on January 24, 2015
  1. wtogami commented at 3:47 AM on January 24, 2015: contributor

    Fedora 21 x86_64 with the Bitcoin Core v0.10rc3 linux64 gitian build. Seventeen minutes into a testnet IBD, it crashed with this assertion failure.

    [warren@odin bin]$ bitcoin-qt -testnet
    bitcoin-qt: checkqueue.h:183: CCheckQueueControl<T>::CCheckQueueControl(CCheckQueue<T>*) [with T = CScriptCheck]: Assertion `pqueue->nTotal == pqueue->nIdle' failed.
    Aborted (core dumped)
    

    debug.log ends with:

    2015-01-23 22:52:57 UpdateTip: new best=000000000019eeffa3a51b555d4cefb6bcc373665bd3498b98b4d6587dde57f1  height=178433  log2_work=58.029322  tx=1090333  date=2014-02-01 19:42:07 progress=0.910668  cache=109722
    

    Two subsequent IBDs succeeded without crashing.

  2. sdaftuar commented at 6:01 PM on January 28, 2015: member

    In the CCheckQueueControl constructor, we're not acquiring the lock on the pqueue before checking its state:

    CCheckQueueControl(CCheckQueue<T>* pqueueIn) : pqueue(pqueueIn), fDone(false)
    {
        // passed queue is supposed to be unused, or NULL
        if (pqueue != NULL) {
            assert(pqueue->nTotal == pqueue->nIdle);
            assert(pqueue->nTodo == 0);
            assert(pqueue->fAllOk == true);
        }
    }
    

    Consequently, I think there could be a race condition in which these values look inconsistent, because nIdle is updated repeatedly in each worker thread's Loop().

    I was able to reliably reproduce this behavior by inserting a usleep(500000); in CCheckQueue::Loop, just before the nIdle++; statement, and then starting up with a reindex.
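    To make that window concrete, here is a toy C++ model. All names are hypothetical and it is not Bitcoin Core code. A worker updates nTotal and then nIdle inside one critical section, which is an assumed simplification of Loop(), not a quote of it. An observer reads the two counters once without the lock, as the constructor does, and once with it. A handshake replaces the usleep(500000) so the outcome is deterministic. The counters are std::atomic only so that the deliberately unlocked read is not itself undefined behavior.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical toy model (not Bitcoin Core code): only the two counters the
// assertion compares, plus the mutex that is supposed to guard them.
// They are std::atomic solely so the deliberately unlocked read below is
// well-defined C++; in the real queue they are plain fields behind the mutex.
struct ToyQueue {
    std::mutex mutex;
    std::atomic<int> nTotal{0};
    std::atomic<int> nIdle{0};
};

struct Snapshots {
    bool unlockedConsistent;  // nTotal == nIdle when read without the lock
    bool lockedConsistent;    // nTotal == nIdle when read under the lock
};

Snapshots ObserveWorkerStartup() {
    ToyQueue q;
    std::atomic<bool> inWindow{false};  // worker did nTotal++ but not yet nIdle++
    std::atomic<bool> observed{false};  // observer took its unlocked snapshot

    std::thread worker([&] {
        // Assumed simplification: both updates happen in one critical section.
        std::lock_guard<std::mutex> lock(q.mutex);
        q.nTotal++;
        inWindow = true;
        // Deterministic stand-in for the injected usleep(500000): keep the
        // window open until the observer has looked.
        while (!observed) std::this_thread::yield();
        q.nIdle++;
    });

    while (!inWindow) std::this_thread::yield();

    // Like the CCheckQueueControl constructor: no lock taken.
    bool unlocked = (q.nTotal.load() == q.nIdle.load());
    observed = true;

    bool locked;
    {
        // Blocks until the worker leaves its critical section.
        std::lock_guard<std::mutex> lock(q.mutex);
        locked = (q.nTotal.load() == q.nIdle.load());
    }
    worker.join();
    assert(q.nTotal.load() == 1 && q.nIdle.load() == 1);  // worker finished both updates
    return {unlocked, locked};
}
```

    In this model, ObserveWorkerStartup() returns an unlocked snapshot in which nTotal != nIdle and a locked snapshot in which they match. An observer that takes the lock can only see the state between critical sections.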

    To fix this I think we just need to acquire the pqueue's lock before checking these variables; perhaps add an IsIdle() member function to CCheckQueue that acquires the lock and then does these checks, and then assert that function returns true in the CCheckQueueControl constructor?
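    Below is a minimal sketch of that suggestion. It uses hypothetical stand-ins (ToyCheckQueue, ToyCheckQueueControl, RegisterIdleWorker) in place of the real classes, and std::mutex where the 2015 code used boost::mutex. It is not necessarily the change that was eventually merged. The idea is that the invariant check moves into the class that owns the mutex, so the control object never reads nTotal, nIdle, nTodo, or fAllOk directly.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, heavily simplified stand-in for CCheckQueue<T> (not the real
// class): std::mutex instead of boost::mutex; no worker threads, no job
// vector, no Loop().
template <typename T>
class ToyCheckQueue {
public:
    // Sketch of the suggested IsIdle(): take the queue's lock first, then check
    // the same invariants the constructor currently asserts without it.
    bool IsIdle() {
        std::lock_guard<std::mutex> lock(mutex);
        return nTotal == nIdle && nTodo == 0 && fAllOk;
    }

    // Hypothetical hook: a worker registering itself and going idle, with both
    // counter updates in one critical section.
    void RegisterIdleWorker() {
        std::lock_guard<std::mutex> lock(mutex);
        nTotal++;
        nIdle++;
    }

private:
    std::mutex mutex;
    int nIdle = 0;
    int nTotal = 0;
    unsigned int nTodo = 0;
    bool fAllOk = true;
};

template <typename T>
class ToyCheckQueueControl {
public:
    explicit ToyCheckQueueControl(ToyCheckQueue<T>* pqueueIn) : pqueue(pqueueIn) {
        // Passed queue is supposed to be unused, or null; check it under its lock.
        if (pqueue != nullptr) {
            assert(pqueue->IsIdle());
        }
    }

private:
    ToyCheckQueue<T>* pqueue;
};
```

    Because IsIdle() holds the mutex for the whole comparison, it cannot observe a worker halfway between incrementing nTotal and incrementing nIdle.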

  3. sipa commented at 7:20 PM on January 28, 2015: member

    @sdaftuar Nice find, and doing that shouldn't hurt. But I'm not entirely sure how this is possible in the first place: there should never be two threads holding a CCheckQueueControl object simultaneously (it is only created inside ConnectBlock, while cs_main is held the whole time).

  4. sdaftuar cross-referenced this on Jan 28, 2015 from issue Acquire CCheckQueue's lock to avoid race condition by sdaftuar
  5. laanwj closed this on Feb 6, 2015

  6. dr-mr-space-monkey cross-referenced this on Mar 27, 2021 from issue Race condition in CCheckQueueControl by dr-mr-space-monkey
  7. bitcoin locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:55 UTC