scheduler_tests (currently disabled) occasionally deadlocks #6540

issue posita opened this issue on August 10, 2015
  1. posita commented at 5:20 PM on August 10, 2015: contributor

    (Largely reproduced from #6171 (comment) et seq.)

    I'm seeing occasional deadlocks in test_bitcoin as recently as v0.11.0, but only with debugging symbols removed (e.g., via eu-strip .../bin/test_bitcoin). They may also happen with a binary whose debugging symbols are intact, but I haven't observed that (yet).

    Here's what I found:

    % eu-strip -f .../bin/test_bitcoin.debug .../bin/test_bitcoin
    % # Run test_bitcoin in another shell and wait for it to hang
    % gdb -s .../bin/test_bitcoin.debug .../bin/test_bitcoin <PID>
    ...
    Reading symbols from .../bin/test_bitcoin...Reading symbols from .../bin/test_bitcoin.debug...done.
    done.
    Attaching to program: .../bin/test_bitcoin, process <PID>
    ...
    (gdb) bt
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007fa0334e308f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007fa0359f8a9b in boost::condition_variable::wait (this=0x7fa036fde358, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007fa034e455ac in boost::thread::join_noexcept (this=this@entry=0x7fa036fd9180) at libs/thread/src/pthread/thread.cpp:312
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007fa0359f93e6 in join (this=0x7fa036fd9180) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  boost::thread_group::join_all (this=this@entry=0x7ffe608fbf90) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007fa0359f6c7e in scheduler_tests::manythreads::test_method (this=this@entry=0x7ffe608fcc87) at test/scheduler_tests.cpp:109
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007fa0359f70ce in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:43
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007fa035970bf7 in invoke<void (*)()> (this=<optimized out>, f=<optimized out>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=<optimized out>)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007fa0349ed8f1 in operator() (this=<optimized out>) at ./boost/test/utils/callback.hpp:118
    [#10](/github-metadata-backup-bitcoin-bitcoin/10/) operator() (this=<optimized out>) at ./boost/test/impl/unit_test_monitor.ipp:41
    [#11](/github-metadata-backup-bitcoin-bitcoin/11/) invoke<boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<> > > (this=<optimized out>, f=...)
        at ./boost/test/utils/callback.hpp:42
    [#12](/github-metadata-backup-bitcoin-bitcoin/12/) boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke (this=<optimized out>) at ./boost/test/utils/callback.hpp:89
    [#13](/github-metadata-backup-bitcoin-bitcoin/13/) 0x00007fa0349c46de in operator() (this=0x7ffe608fd7d0) at ./boost/test/utils/callback.hpp:118
    [#14](/github-metadata-backup-bitcoin-bitcoin/14/) do_invoke<boost::scoped_ptr<boost::detail::translate_exception_base>, boost::unit_test::callback0<int> > (F=..., tr=...)
        at ./boost/test/impl/execution_monitor.ipp:281
    [#15](/github-metadata-backup-bitcoin-bitcoin/15/) boost::execution_monitor::catch_signals (
        this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:885
    [#16](/github-metadata-backup-bitcoin-bitcoin/16/) 0x00007fa0349c4f33 in boost::execution_monitor::execute (
        this=this@entry=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:1211
    [#17](/github-metadata-backup-bitcoin-bitcoin/17/) 0x00007fa0349eda05 in boost::unit_test::unit_test_monitor_t::execute_and_translate (
        this=0x7fa034c2d9a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, tc=...)
        at ./boost/test/impl/unit_test_monitor.ipp:69
    [#18](/github-metadata-backup-bitcoin-bitcoin/18/) 0x00007fa0349d44af in boost::unit_test::framework_impl::visit (this=0x7fa034c2d8c0 <boost::unit_test::(anonymous namespace)::s_frk_impl()::the_inst>,
        tc=...) at ./boost/test/impl/framework.ipp:156
    [#19](/github-metadata-backup-bitcoin-bitcoin/19/) 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#20](/github-metadata-backup-bitcoin-bitcoin/20/) 0x00007fa034a08a33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#21](/github-metadata-backup-bitcoin-bitcoin/21/) 0x00007fa0349cf9fa in boost::unit_test::framework::run (id=1, id@entry=4294967295, continue_test=continue_test@entry=true)
        at ./boost/test/impl/framework.ipp:442
    [#22](/github-metadata-backup-bitcoin-bitcoin/22/) 0x00007fa0349eb697 in boost::unit_test::unit_test_main (init_func=<optimized out>, argc=<optimized out>, argv=<optimized out>)
        at ./boost/test/impl/unit_test_main.ipp:185
    [#23](/github-metadata-backup-bitcoin-bitcoin/23/) 0x00007fa03314fb45 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    [#24](/github-metadata-backup-bitcoin-bitcoin/24/) 0x00007fa035957b7e in _start ()
    (gdb) generate-core-file
    warning: Memory read failed for corefile section, 8192 bytes at 0x7ffe6097a000.
    Saved corefile core.<PID>
    ...
    

    It looks like it's hanging on a join, but I don't know why. See test/scheduler_tests.cpp#L109. A core file (dumped from gdb) is here, if it's worth anything.

  2. posita cross-referenced this on Aug 10, 2015 from issue Fix intermittent issue in scheduler_tests by laanwj
  3. gavinandresen commented at 5:25 PM on August 10, 2015: contributor

    What version of Boost and operating system?

  4. TheBlueMatt commented at 11:18 AM on August 13, 2015: contributor

    Can you "info threads" and "bt" from more than just the one thread?

  5. TheBlueMatt commented at 11:19 AM on August 13, 2015: contributor

    Oh, and can you reproduce with -DDEBUG_LOCKORDER?

  6. dexX7 commented at 2:17 AM on August 14, 2015: contributor

    @posita: I think I observed this/a related issue some weeks ago, too: #6278 (comment)

    Back then I didn't strip the files.

  7. posita commented at 4:02 AM on August 14, 2015: contributor

    What version of Boost and operating system?

    Debian stable ("jessie") w/ Boost 1.55.0.2:

    root@f7c305516d72:/# uname -a
    Linux f7c305516d72 4.0.7-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Wed Jul 15 00:01:41 UTC 2015 x86_64 GNU/Linux
    root@f7c305516d72:/# cat /etc/debian_version
    8.1
    root@f7c305516d72:/# dpkg -l libboost-all-dev
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                             Version               Architecture          Description
    +++-================================-=====================-=====================-=====================================================================
    ii  libboost-all-dev                 1.55.0.2              amd64                 Boost C++ Libraries development files (ALL) (default version)
    
  8. posita commented at 7:20 PM on August 14, 2015: contributor

    Can you "info threads" and "bt" from more than just the one thread? Oh, and can you reproduce with -DDEBUG_LOCKORDER -g?

    Yes, but test_bitcoin did not produce any extra output (if it was supposed to). Below are the stack traces, and how I got them. The core file is here. It looks like scheduler.cpp#L40 is where the other threads are blocked, perhaps on each other?

    Prep

    root@e282d4a2b356:/# cd .../bitcoin
    root@e282d4a2b356:.../bitcoin# git clean -Xdf
    ...
    root@e282d4a2b356:.../bitcoin# git clean -df
    ...
    root@e282d4a2b356:.../bitcoin# git checkout v0.11.0
    ...
    root@e282d4a2b356:.../bitcoin# git branch
    * (detached from v0.11.0)
      master
    root@e282d4a2b356:.../bitcoin# git status --ignored
    HEAD detached at v0.11.0
    nothing to commit, working directory clean
    root@e282d4a2b356:.../bitcoin# ./autogen.sh
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' ./configure --disable-wallet
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make check
    ...
    root@e282d4a2b356:.../bitcoin# CFLAGS='-DDEBUG_LOCKORDER -g' CXXFLAGS='-DDEBUG_LOCKORDER -g' CPPFLAGS='-DDEBUG_LOCKORDER -g' make install
    ...
    root@e282d4a2b356:.../bitcoin# cd /usr/local/bin
    root@e282d4a2b356:/usr/local/bin# ls -al test_bitcoin*
    -rwxr-xr-x 1 root staff 61087584 Aug 14 18:57 test_bitcoin
    root@e282d4a2b356:/usr/local/bin# eu-strip -f test_bitcoin.debug test_bitcoin
    root@e282d4a2b356:/usr/local/bin# ls -al test_bitcoin*
    -rwxr-xr-x 1 root staff  9776504 Aug 14 18:57 test_bitcoin
    -rwxr-xr-x 1 root staff 51314088 Aug 14 18:57 test_bitcoin.debug
    root@e282d4a2b356:/usr/local/bin# which test_bitcoin
    /usr/local/bin/test_bitcoin
    root@e282d4a2b356:/usr/local/bin# pgrep test_bitcoin
    root@e282d4a2b356:/usr/local/bin# ( while true ; do test_bitcoin ; done ) &
    ... [around 25 iterations]
    Running 151 test cases...
    [hung, CPU back to idle]
    root@e282d4a2b356:/usr/local/bin# pgrep test_bitcoin
    30886
    

    Debug Info

    root@e282d4a2b356:/usr/local/bin# gdb -s ./test_bitcoin.debug ./test_bitcoin "$( pgrep test_bitcoin )"
    ...
    Reading symbols from ./test_bitcoin...Reading symbols from /usr/local/bin/test_bitcoin.debug...done.
    done.
    Attaching to program: /usr/local/bin/test_bitcoin, process 30886
    ...
    (gdb) info threads
    (gdb) info threads
      Id   Target Id         Frame
      3    Thread 0x7f5b767fc700 (LWP 30939) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
      2    Thread 0x7f5b63fff700 (LWP 30943) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
    * 1    Thread 0x7f5b80729740 (LWP 30886) "test_bitcoin" 0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 ()
       from /lib/x86_64-linux-gnu/libpthread.so.0
    (gdb) thread apply all bt
    
    Thread 3 (Thread 0x7f5b767fc700 (LWP 30939)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7ffc92ba4ce0, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f5b8010b163 in CScheduler::serviceQueue (this=0x7ffc92ba4cb0) at scheduler.cpp:40
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f5b7fda6728 in boost::_mfi::mf0<void, CScheduler>::operator() (this=0x7f5b8132c008, p=0x7ffc92ba4cb0)
        at /usr/include/boost/bind/mem_fn_template.hpp:49
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f5b7fda661e in boost::_bi::list1<boost::_bi::value<CScheduler*> >::operator()<boost::_mfi::mf0<void, CScheduler>, boost::_bi::list0> (
        this=0x7f5b8132c018, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f5b7fda64ac in boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > >::operator()
        (this=0x7f5b8132c008) at /usr/include/boost/bind/bind_template.hpp:20
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007f5b7fda6331 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > > >::run (this=0x7f5b8132be50) at /usr/include/boost/thread/detail/thread.hpp:117
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007f5b7f13aaea in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:164
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  0x00007f5b7d7d50a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007f5b7d50a04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
    
    Thread 2 (Thread 0x7f5b63fff700 (LWP 30943)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7ffc92ba4ce0, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f5b8010b163 in CScheduler::serviceQueue (this=0x7ffc92ba4cb0) at scheduler.cpp:40
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f5b7fda6728 in boost::_mfi::mf0<void, CScheduler>::operator() (this=0x7f5b81346948, p=0x7ffc92ba4cb0)
        at /usr/include/boost/bind/mem_fn_template.hpp:49
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f5b7fda661e in boost::_bi::list1<boost::_bi::value<CScheduler*> >::operator()<boost::_mfi::mf0<void, CScheduler>, boost::_bi::list0> (
        this=0x7f5b81346958, f=..., a=...) at /usr/include/boost/bind/bind.hpp:253
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f5b7fda64ac in boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > >::operator()
        (this=0x7f5b81346948) at /usr/include/boost/bind/bind_template.hpp:20
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007f5b7fda6331 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, CScheduler>, boost::_bi::list1<boost::_bi::value<CScheduler*> > > >::run (this=0x7f5b81346790) at /usr/include/boost/thread/detail/thread.hpp:117
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007f5b7f13aaea in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at libs/thread/src/pthread/thread.cpp:164
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  0x00007f5b7d7d50a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007f5b7d50a04d in clone () from /lib/x86_64-linux-gnu/libc.so.6
    
    Thread 1 (Thread 0x7f5b80729740 (LWP 30886)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f5b7d7d908f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f5b7fd9dc00 in boost::condition_variable::wait (this=0x7f5b8132bea8, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f5b7f13b5ac in boost::thread::join_noexcept (this=0x7f5b812e40e0) at libs/thread/src/pthread/thread.cpp:312
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f5b7fd9db50 in boost::thread::join (this=0x7f5b812e40e0) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f5b7fd9e80d in boost::thread_group::join_all (this=0x7ffc92ba4d70) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f5b7fd9c642 in scheduler_tests::manythreads::test_method (this=0x7ffc92ba5a67) at test/scheduler_tests.cpp:109
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007f5b7fd9b980 in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:43
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007f5b7fc5e563 in boost::unit_test::ut_detail::invoker<boost::unit_test::ut_detail::unused>::invoke<void (*)()> (this=0x7ffc92ba5ad7,
        f=@0x7f5b81274348: 0x7f5b7fd9b95d <scheduler_tests::manythreads_invoker()>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  0x00007f5b7fc5dfd7 in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=0x7f5b81274340)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007f5b7ece38f1 in operator() (this=<optimized out>) at ./boost/test/utils/callback.hpp:118
    [#10](/github-metadata-backup-bitcoin-bitcoin/10/) operator() (this=<optimized out>) at ./boost/test/impl/unit_test_monitor.ipp:41
    [#11](/github-metadata-backup-bitcoin-bitcoin/11/) invoke<boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<> > > (this=<optimized out>, f=...)
        at ./boost/test/utils/callback.hpp:42
    [#12](/github-metadata-backup-bitcoin-bitcoin/12/) boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke (this=<optimized out>) at ./boost/test/utils/callback.hpp:89
    [#13](/github-metadata-backup-bitcoin-bitcoin/13/) 0x00007f5b7ecba6de in operator() (this=0x7ffc92ba6610) at ./boost/test/utils/callback.hpp:118
    [#14](/github-metadata-backup-bitcoin-bitcoin/14/) do_invoke<boost::scoped_ptr<boost::detail::translate_exception_base>, boost::unit_test::callback0<int> > (F=..., tr=...)
        at ./boost/test/impl/execution_monitor.ipp:281
    [#15](/github-metadata-backup-bitcoin-bitcoin/15/) boost::execution_monitor::catch_signals (
        this=this@entry=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:885
    [#16](/github-metadata-backup-bitcoin-bitcoin/16/) 0x00007f5b7ecbaf33 in boost::execution_monitor::execute (
        this=this@entry=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, F=...)
        at ./boost/test/impl/execution_monitor.ipp:1211
    [#17](/github-metadata-backup-bitcoin-bitcoin/17/) 0x00007f5b7ece3a05 in boost::unit_test::unit_test_monitor_t::execute_and_translate (
        this=0x7f5b7ef239a0 <boost::unit_test::singleton<boost::unit_test::unit_test_monitor_t>::instance()::the_inst>, tc=...)
        at ./boost/test/impl/unit_test_monitor.ipp:69
    [#18](/github-metadata-backup-bitcoin-bitcoin/18/) 0x00007f5b7ecca4af in boost::unit_test::framework_impl::visit (
        this=0x7f5b7ef238c0 <boost::unit_test::(anonymous namespace)::s_frk_impl()::the_inst>, tc=...) at ./boost/test/impl/framework.ipp:156
    [#19](/github-metadata-backup-bitcoin-bitcoin/19/) 0x00007f5b7ecfea33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#20](/github-metadata-backup-bitcoin-bitcoin/20/) 0x00007f5b7ecfea33 in boost::unit_test::traverse_test_tree (suite=..., V=...) at ./boost/test/impl/unit_test_suite.ipp:207
    [#21](/github-metadata-backup-bitcoin-bitcoin/21/) 0x00007f5b7ecc59fa in boost::unit_test::framework::run (id=1, id@entry=4294967295, continue_test=continue_test@entry=true)
        at ./boost/test/impl/framework.ipp:442
    [#22](/github-metadata-backup-bitcoin-bitcoin/22/) 0x00007f5b7ece1697 in boost::unit_test::unit_test_main (init_func=<optimized out>, argc=<optimized out>, argv=<optimized out>)
        at ./boost/test/impl/unit_test_main.ipp:185
    [#23](/github-metadata-backup-bitcoin-bitcoin/23/) 0x00007f5b7fdf3788 in main (argc=1, argv=0x7ffc92ba6948) at /usr/include/boost/test/unit_test.hpp:59
    (gdb) generate-core-file
    warning: Memory read failed for corefile section, 8192 bytes at 0x7ffc92bf9000.
    Saved corefile core.30886
    ...
    
  9. laanwj added the label Tests on Sep 4, 2015
  10. laanwj cross-referenced this on Sep 22, 2015 from issue test_bitcoin hangs in test case "manythreads" by MarcoFalke
  11. MarcoFalke commented at 9:55 AM on September 23, 2015: member

    If someone wants to track down this issue further, you don't have to run all test.

    src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests
    

    is enough.

  12. morcos commented at 2:56 AM on October 28, 2015: member

    This happens to me about 1 out of every 4 times I run test_bitcoin. I'm not doing anything special like stripping debugging symbols.

  13. MarcoFalke cross-referenced this on Dec 1, 2015 from issue Remove UTXO cache entries when the tx they were added for is removed/does not enter mempool by TheBlueMatt
  14. laanwj commented at 12:31 PM on December 1, 2015: member

    If this is a common issue (I've never noticed it locally and I run test_bitcoin 10's of times per day) Maybe better to disable the scheduler tests until they're fixed?

  15. MarcoFalke commented at 1:30 PM on December 1, 2015: member

    @laanwj Happens 4% of the time:

    for i in {1..10000}; do timeout 1 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &> /dev/null ;echo $? >> /tmp/result100 ;done;cat /tmp/result100|sort|uniq -c
       9638 0
        462 124
    
  16. laanwj referenced this in commit 8f0d79e3c8 on Dec 1, 2015
  17. laanwj cross-referenced this on Dec 1, 2015 from issue test: Disable scheduler test manythreads by laanwj
  18. laanwj commented at 1:57 PM on December 1, 2015: member

    See #7144

  19. gavinandresen commented at 8:39 PM on December 1, 2015: contributor

    For what it's worth: I have never seen this bug, OSX and boost 1.59.

    Looking through Boost release notes, I see a fix in boost 1.56 that might be the culprit:

    1.56 : Simplified, refactored and unified (timed_)lock code based on try_lock(). There were several bugs when handling timeout expirations. @MarcoFalke @morcos : are you using boost version < 1.56 ?

  20. MarcoFalke commented at 9:43 PM on December 1, 2015: member

    Running 1.58 (the same? as travis is running, which also locks)

  21. morcos commented at 10:03 PM on December 1, 2015: member

    hmm if i was paying closer attention i could have helped debug this better, because it was happening fairly consistently for me, and now it never happens anymore.

  22. posita commented at 11:34 PM on December 1, 2015: contributor

    FWIW, I don't recall being able to reproduce this in OS X (using Boost 1.58 or 1.59). I was able to reliably reproduce it using Debian Jessie<sup>1</sup> (but the system packages for Boost are at 1.55). @MarcoFalke, what platform (uname -a) are you experiencing locks with 1.58? Can you do a linking sanity check (e.g., ldd -v .../path/to/test_bitcoin on Linux, otool -Lv ... on Darwin)?

    <sup>1</sup> Technically, it was inside a Docker container based on debian:stable which was running via boot2docker inside a Parallels VM running on OS X, but I'm pretty sure that doesn't count as OS X. :stuck_out_tongue_winking_eye:

  23. posita commented at 5:59 AM on December 2, 2015: contributor

    I am unable to reproduce on OS X with Boost 1.59:

    % uname -a
    Darwin ... 13.4.0 Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64 x86_64
    % src/bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.2
    % otool -Lv src/bitcoin-cli
    src/bitcoin-cli:
            .../lib/libboost_system-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_filesystem-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_program_options-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_thread-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libboost_chrono-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            .../lib/libcrypto.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
            time stamp 2 Wed Dec 31 16:00:02 1969
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1197.1.1)
            time stamp 2 Wed Dec 31 16:00:02 1969
    % grep BOOST_LIB_VERSION .../include/boost/version.hpp
    //  BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
    #define BOOST_LIB_VERSION "1_59"
    % for i in {1..10000} ; do gtimeout 2 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo $? >>/tmp/result100 ; done ; sort /tmp/result100 | uniq -c
    10000 0
    
  24. MarcoFalke commented at 7:08 AM on December 2, 2015: member

    With BOOST_LIB_VERSION=1_58

    $ uname -a
    Linux localhost.localdomain 4.2.6-301.fc23.x86_64 [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 22:22:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    $ ldd -v ./src/test/test_bitcoin|grep boost
        libboost_system.so.1.58.0 => /lib64/libboost_system.so.1.58.0 (0x00007f5e84548000)
        libboost_filesystem.so.1.58.0 => /lib64/libboost_filesystem.so.1.58.0 (0x00007f5e84332000)
        libboost_program_options.so.1.58.0 => /lib64/libboost_program_options.so.1.58.0 (0x00007f5e840bc000)
        libboost_thread.so.1.58.0 => /lib64/libboost_thread.so.1.58.0 (0x00007f5e83e96000)
        libboost_chrono.so.1.58.0 => /lib64/libboost_chrono.so.1.58.0 (0x00007f5e83c8e000)
        libboost_unit_test_framework.so.1.58.0 => /lib64/libboost_unit_test_framework.so.1.58.0 (0x00007f5e839eb000)
        /lib64/libboost_system.so.1.58.0:
        /lib64/libboost_filesystem.so.1.58.0:
        /lib64/libboost_program_options.so.1.58.0:
        /lib64/libboost_thread.so.1.58.0:
        /lib64/libboost_chrono.so.1.58.0:
        /lib64/libboost_unit_test_framework.so.1.58.0:
    
    
  25. posita commented at 7:20 AM on December 2, 2015: contributor

    I am unable to reproduce on OS X with Boost 1.59 ...

    However, I can reproduce it on Debian Sid (running in Docker) with Boost 1.58:

    root@c74c16a6aee2:~/bitcoin# uname -a
    Linux c74c16a6aee2 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    root@c74c16a6aee2:~/bitcoin# src/bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.99.0-4077ad2
    root@c74c16a6aee2:~/bitcoin# ldd -v src/bitcoin-cli | grep boost
            libboost_system.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0 (0x00007fc1a24f2000)
            libboost_filesystem.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.58.0 (0x00007fc1a22d9000)
            libboost_program_options.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0 (0x00007fc1a2057000)
            libboost_thread.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0 (0x00007fc1a1e30000)
            libboost_chrono.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.58.0 (0x00007fc1a1c28000)
            /usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0:
            /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.58.0:
    root@c74c16a6aee2:~/bitcoin# grep BOOST_LIB_VERSION /usr/include/boost/version.hpp
    //  BOOST_LIB_VERSION must be defined to be the same as BOOST_VERSION
    #define BOOST_LIB_VERSION "1_58"
    root@c74c16a6aee2:~/bitcoin# for i in {1..2000} ; do timeout 2 src/test/test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo $? >>/tmp/result100 ; done ; sort /tmp/result100 | uniq -c
       1695 0
        305 124
    

    Caveats:

  26. posita commented at 4:31 PM on December 3, 2015: contributor

    FWIW, I used these with the stack mentioned in my previous comment to reproduce this on Debian (wheezy, jessie, sid) and Ubuntu (trusty, vivid, wily):

    % for i in Dockerfile-*-*-automake ; do read -k 1 "?Any key to continue with ${i} ... " ; echo ; docker build -f "${i}" . ; done
    Any key to continue with Dockerfile-debian-jessie-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.9.1
    STL     : GNU libstdc++ version 20140912
    Boost   : 1.55.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 11aafb97cfeb 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        871 0
        129 124
    ...
    Any key to continue with Dockerfile-debian-sid-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 5.2.1 20151129
    STL     : GNU libstdc++ version 20151129
    Boost   : 1.58.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 5cf97854c2bc 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        881 0
        119 124
    ...
    Any key to continue with Dockerfile-debian-wheezy-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.7.2
    STL     : GNU libstdc++ version 20120920
    Boost   : 1.49.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux c98f3c4629d3 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        909 0
         91 124
     ...
    Any key to continue with Dockerfile-ubuntu-trusty-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.8.2
    STL     : GNU libstdc++ version 20140404
    Boost   : 1.54.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux a52c17016130 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        887 0
        113 124
    ...
    Any key to continue with Dockerfile-ubuntu-vivid-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.9.2
    STL     : GNU libstdc++ version 20150305
    Boost   : 1.55.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux 19d74fa25b3a 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        897 0
        103 124
    ...
    Any key to continue with Dockerfile-ubuntu-wily-automake ... .
    ...
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    *** No errors detected
    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 5.2.1 20151010
    STL     : GNU libstdc++ version 20151010
    Boost   : 1.58.0
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Bitcoin Core RPC client version v0.11.2
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Linux a6af3de67df6 4.1.13-boot2docker [#1](/github-metadata-backup-bitcoin-bitcoin/1/) SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
        741 0
        259 124
    

    So it appears to be happening pretty consistently under each of those environments, often more than 10% of the time with my setup (124 exit code counts are the hung ones).

    While it's possible that virtualization amplifies opportunities to observe a race condition that more efficient environments hide, concurrent with that last test (for Ubuntu Wily), I ran the following on the OS X host and still got zero hangs:

    % bitcoin-cli --version
    Bitcoin Core RPC client version v0.11.2
    % test_bitcoin --build_info --run_test=scheduler_tests
    Running 1 test case...
    Platform: Mac OS
    Compiler: Clang version 6.0 (clang-600.0.57)
    STL     : libc++ version 1101
    Boost   : 1.59.0
    
    *** No errors detected
    % python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > python -c 'import decimal ; print((decimal.Decimal("10" * 16384 * 8192) + decimal.Decimal("4321" * 16384 * 8192)) * decimal.Decimal("9" * 32768 * 4096))' & \
    > time ( for i in {1..10000} ; do gtimeout 2 test_bitcoin --log_level=ALL --run_test=scheduler_tests &>/dev/null ; echo "${?}" ; done ) | sort | uniq -c ; \
    > kill %1 %2 %3 %4
    [1] 20843
    [2] 20844
    [3] 20845
    [4] 20846
    10000 0
    ( for i in {1..10000}; do; gtimeout 2 test_bitcoin --log_level=ALL  &> ; echo)  565.09s user 252.40s system 23% cpu 56:46.88 total
    [4]  + terminated  python -c
    [3]  + terminated  python -c
    [2]  - terminated  python -c
    [1]  - terminated  python -c
    
  27. laanwj cross-referenced this on Dec 4, 2015 from issue [Trivial] Disable compiler warnings about unused functions by paveljanik
  28. luke-jr referenced this in commit f03d898094 on Dec 8, 2015
  29. laanwj renamed this:
    test_bitcoin occasionally deadlocks
    scheduler_tests (currently disabled) occasionally deadlocks
    on Apr 28, 2016
  30. laanwj cross-referenced this on May 5, 2016 from issue Get rid of a compiler warning due to #if 0'd test by avar
  31. paveljanik commented at 7:07 AM on May 5, 2016: contributor

    FWIW: OS X physical test machine with 10.10, current master and boosts from 1.54.0 to 1.58.0: no hangs at all. I can't reproduce the test problem here locally :-(

  32. paveljanik commented at 9:52 AM on May 5, 2016: contributor

    openSUSE 13.2 64bit, no hang in 10000 runs with boost 1.54.0.

    Running 1 test case...
    Platform: linux
    Compiler: GNU C++ version 4.8.3 20140627 [gcc-4_8-branch revision 212064]
    STL     : GNU libstdc++ version 20140627
    Boost   : 1.54.0
    
    *** No errors detected
    
  33. paveljanik commented at 10:22 AM on May 5, 2016: contributor

    How to reproduce this so I can look at it?

  34. paveljanik commented at 11:00 AM on May 5, 2016: contributor

    OK, I can sort of reproduce it here on openSUSE 13.2 64bit.

    Running

    for i in {1..100}; do timeout 2000 ./test_bitcoin -d y --log-level=ALL  --run_test=scheduler_tests >>log.$CASE.$i 2>&1 && echo OK || echo ERR $CASE $i; done | sort | uniq -c & CASE=$((CASE+1))
    

    5 times in parallel will keep approx. 2 instances of test_bitcoin running and waiting in

    pthread_cond_wait@@GLIBC_2.3.2
    

    Full bt:

    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f477358605f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f4775d27638 in boost::condition_variable::wait (this=0x7f4777638218, m=...) at /usr/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f47750a13f4 in boost::thread::join_noexcept() () from /usr/lib64/libboost_thread.so.1.54.0
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f4775d27588 in boost::thread::join (this=0x7f47776381a0) at /usr/include/boost/thread/detail/thread.hpp:756
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f4775d28295 in boost::thread_group::join_all (this=0x7ffdcadd52d0) at /usr/include/boost/thread/detail/thread_group.hpp:118
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f4775d25b8a in scheduler_tests::manythreads::test_method (this=0x7ffdcadd5fc7) at test/scheduler_tests.cpp:110
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007f4775d24e60 in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:44
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007f4775bbfff7 in boost::unit_test::ut_detail::invoker<boost::unit_test::ut_detail::unused>::invoke<void (*)()> (this=0x7ffdcadd6037, 
        f=@0x7f47776013c8: 0x7f4775d24e3d <scheduler_tests::manythreads_invoker()>) at /usr/include/boost/test/utils/callback.hpp:56
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  0x00007f4775bbfa63 in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=0x7f47776013c0)
        at /usr/include/boost/test/utils/callback.hpp:89
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007f4774c50661 in ?? () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#10](/github-metadata-backup-bitcoin-bitcoin/10/) 0x00007f4774c2bbf6 in boost::execution_monitor::catch_signals(boost::unit_test::callback0<int> const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#11](/github-metadata-backup-bitcoin-bitcoin/11/) 0x00007f4774c2c423 in boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#12](/github-metadata-backup-bitcoin-bitcoin/12/) 0x00007f4774c50762 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::unit_test::test_case const&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#13](/github-metadata-backup-bitcoin-bitcoin/13/) 0x00007f4774c3a424 in boost::unit_test::framework_impl::visit(boost::unit_test::test_case const&) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#14](/github-metadata-backup-bitcoin-bitcoin/14/) 0x00007f4774c68f13 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#15](/github-metadata-backup-bitcoin-bitcoin/15/) 0x00007f4774c68f13 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) ()
       from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#16](/github-metadata-backup-bitcoin-bitcoin/16/) 0x00007f4774c3594a in boost::unit_test::framework::run(unsigned long, bool) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#17](/github-metadata-backup-bitcoin-bitcoin/17/) 0x00007f4774c4e344 in boost::unit_test::unit_test_main(bool (*)(), int, char**) () from /usr/lib64/libboost_unit_test_framework.so.1.54.0
    [#18](/github-metadata-backup-bitcoin-bitcoin/18/) 0x00007f4775d88160 in main (argc=5, argv=0x7ffdcadd6e98) at /usr/include/boost/test/unit_test.hpp:59
    

    This is with boost 1.54.0. I'll try newer versions of boost on this machine later today.

  35. laanwj commented at 1:58 PM on May 5, 2016: member

    I could reproduce it at the time, but haven't tried for a while. It was a pretty rare issue, which made it non-trivial to fix. Congrats on getting a backtrace at least. You should probably have dumped one for all threads to get a fuller picture. (waiting in pthread_cond_wait@@GLIBC_2.3.2 is perfectly normal, it's bound to be a combination of threads which puts it into a deadlock)

  36. paveljanik commented at 2:22 PM on May 5, 2016: contributor
    (gdb) thread apply all bt
    
    Thread 4 (Thread 0x7f3c0c330700 (LWP 26007)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 3 (Thread 0x7f3c0b32e700 (LWP 26009)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 2 (Thread 0x7f3c0a32c700 (LWP 26011)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f3c0e927fef in thread_proxy () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  0x00007f3c0ce030a4 in start_thread () from /lib64/libpthread.so.0
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f3c0cb3908d in clone () from /lib64/libc.so.6
    
    Thread 1 (Thread 0x7f3c0f3d4740 (LWP 25800)):
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7f3c11c15d18, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f3c0e92883c in boost::thread::join_noexcept() () from /tmp/BOOST/1.58.0/lib/libboost_thread.so.1.58.0
    [#3](/github-metadata-backup-bitcoin-bitcoin/3/)  0x00007f3c0f504996 in join (this=0x7f3c11c11f20) at /tmp/BOOST/1.58.0/include/boost/thread/detail/thread.hpp:767
    [#4](/github-metadata-backup-bitcoin-bitcoin/4/)  boost::thread_group::join_all (this=this@entry=0x7ffc1eccc2e0) at /tmp/BOOST/1.58.0/include/boost/thread/detail/thread_group.hpp:118
    [#5](/github-metadata-backup-bitcoin-bitcoin/5/)  0x00007f3c0f502b2a in scheduler_tests::manythreads::test_method (this=this@entry=0x7ffc1ecccfd7) at test/scheduler_tests.cpp:110
    [#6](/github-metadata-backup-bitcoin-bitcoin/6/)  0x00007f3c0f502fbe in scheduler_tests::manythreads_invoker () at test/scheduler_tests.cpp:44
    [#7](/github-metadata-backup-bitcoin-bitcoin/7/)  0x00007f3c0f454f27 in invoke<void (*)()> (this=<optimized out>, f=<optimized out>) at /tmp/BOOST/1.58.0/include/boost/test/utils/callback.hpp:56
    [#8](/github-metadata-backup-bitcoin-bitcoin/8/)  boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke (this=<optimized out>) at /tmp/BOOST/1.58.0/include/boost/test/utils/callback.hpp:89
    [#9](/github-metadata-backup-bitcoin-bitcoin/9/)  0x00007f3c0e4d22b1 in boost::unit_test::ut_detail::callback0_impl_t<int, boost::unit_test::(anonymous namespace)::zero_return_wrapper_t<boost::unit_test::callback0<boost::unit_test::ut_detail::unused> > >::invoke() ()
       from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#10](/github-metadata-backup-bitcoin-bitcoin/10/) 0x00007f3c0e4b4946 in boost::execution_monitor::catch_signals(boost::unit_test::callback0<int> const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#11](/github-metadata-backup-bitcoin-bitcoin/11/) 0x00007f3c0e4b5173 in boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#12](/github-metadata-backup-bitcoin-bitcoin/12/) 0x00007f3c0e4d23b2 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::unit_test::test_case const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#13](/github-metadata-backup-bitcoin-bitcoin/13/) 0x00007f3c0e4bbf14 in boost::unit_test::framework_impl::visit(boost::unit_test::test_case const&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#14](/github-metadata-backup-bitcoin-bitcoin/14/) 0x00007f3c0e4eb053 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#15](/github-metadata-backup-bitcoin-bitcoin/15/) 0x00007f3c0e4eb053 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#16](/github-metadata-backup-bitcoin-bitcoin/16/) 0x00007f3c0e4b743a in boost::unit_test::framework::run(unsigned long, bool) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#17](/github-metadata-backup-bitcoin-bitcoin/17/) 0x00007f3c0e4cff94 in boost::unit_test::unit_test_main(bool (*)(), int, char**) () from /tmp/BOOST/1.58.0/lib/libboost_unit_test_framework.so.1.58.0
    [#18](/github-metadata-backup-bitcoin-bitcoin/18/) 0x00007f3c0ca75b05 in __libc_start_main () from /lib64/libc.so.6
    [#19](/github-metadata-backup-bitcoin-bitcoin/19/) 0x00007f3c0f43cdcc in _start () at ../sysdeps/x86_64/start.S:112
    (gdb) 
    
  37. paveljanik commented at 2:27 PM on May 5, 2016: contributor
    (gdb) thread 4
    [Switching to thread 4 (Thread 0x7f3c0c330700 (LWP 26007))]
    [#0](/github-metadata-backup-bitcoin-bitcoin/0/)  0x00007f3c0ce0705f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    (gdb) up
    [#1](/github-metadata-backup-bitcoin-bitcoin/1/)  0x00007f3c0f5046e3 in boost::condition_variable::wait (this=0x7ffc1eccc250, m=...) at /tmp/BOOST/1.58.0/include/boost/thread/pthread/condition_variable.hpp:73
    73                res = pthread_cond_wait(&cond,&internal_mutex);
    (gdb) up
    [#2](/github-metadata-backup-bitcoin-bitcoin/2/)  0x00007f3c0f6db830 in CScheduler::serviceQueue (this=0x7ffc1eccc220) at scheduler.cpp:42
    42                  newTaskScheduled.wait(lock);
    (gdb) list
    37      // is called.
    38      while (!shouldStop()) {
    39          try {
    40              while (!shouldStop() && taskQueue.empty()) {
    41                  // Wait until there is something to do.
    42                  newTaskScheduled.wait(lock);
    43              }
    44  
    
  38. paveljanik commented at 9:30 PM on May 5, 2016: contributor

    Self notice: Do I really need to repeat how I love debugging multiple threads servicing one queue of tasks repeatedly adding other tasks to the same queue?

    The solution to the deadlock is to call notify_one() at the end of ::serviceQueue(), because this could wake up other threads running ::serviceQueues() in such a deadlock case. Without this, there could remain some servicing threads waiting for the task which doesn't come in (because we are finished)!

    Or directly call notify_all() but I prefer _one().

    What do you think?

    diff --git a/src/scheduler.cpp b/src/scheduler.cpp
    index 184ddc2..0a431b2 100644
    --- a/src/scheduler.cpp
    +++ b/src/scheduler.cpp
    @@ -79,6 +81,7 @@ void CScheduler::serviceQueue()
             }
         }
         --nThreadsServicingQueue;
    +    newTaskScheduled.notify_one();
     }
    
     void CScheduler::stop(bool drain)
    
    
  39. paveljanik cross-referenced this on May 6, 2016 from issue Fix multithread CScheduler and reenable test by paveljanik
  40. laanwj closed this on May 10, 2016

  41. bitcoin locked this on Sep 8, 2021
Labels

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:55 UTC