Preserve the `LockData` initial state if "potential deadlock detected" exception thrown #19340

pull hebasto wants to merge 3 commits into bitcoin:master from hebasto:200620-stack changing 3 files +33 −9
  1. hebasto commented at 8:26 PM on June 20, 2020: member

    On master (e3fa3c7d671e34038a262bb2db16b30bee82153d) if the push_lock() throws the "potential deadlock detected" exception (via the potential_deadlock_detected() call), the LockData instance internal state differs from one when the push_lock() was called. This non-well behaviour makes (at least) testing brittle.

    This PR preserves the LockData instance initial state if push_lock() throws an exception, and improves the sync_tests unit test.

  2. hebasto cross-referenced this on Jun 20, 2020 from issue sync: detect double lock from the same thread by vasild
  3. DrahtBot added the label Tests on Jun 20, 2020
  4. DrahtBot commented at 10:03 PM on June 20, 2020: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19337 (sync: detect double lock from the same thread by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. in src/sync.cpp:272 in c810cd4d17 outdated
     263 | @@ -255,6 +264,14 @@ void DeleteLock(void* cs)
     264 |      }
     265 |  }
     266 |  
     267 | +bool LockStackEmpty()
     268 | +{
     269 | +    LockData& lockdata = GetLockData();
     270 | +    std::lock_guard<std::mutex> lock(lockdata.dd_mutex);
     271 | +    LockStack& lock_stack = lockdata.m_lock_stacks[std::this_thread::get_id()];
     272 | +    return lock_stack.empty();
    


    vasild commented at 7:50 AM on June 22, 2020:

    operator[] has a side effect that it would insert a new element in m_lock_stacks if the thread that called it does not own any locks.

        const auto it = lockdata.m_lock_stacks.find(std::this_thread::get_id());
        if (it == lockdata.m_lock_stacks.end()) {
            return true;
        }
        return it->second.empty().
    

    hebasto commented at 11:46 AM on June 22, 2020:
  6. in src/sync.cpp:131 in c810cd4d17 outdated
     130 | @@ -131,7 +131,7 @@ static void potential_deadlock_detected(const LockPair& mismatch, const LockStac
     131 |      throw std::logic_error("potential deadlock detected");
    


    vasild commented at 9:27 AM on June 22, 2020:

    I would suggest to unconditionally remove the entry from the stack here, after printing the stack and before abort()/throw. This cleanup is warranted from all code paths that call EnterCritical().

        // Undo the insertion from push_lock(). We will not lock the mutex.
        s2.pop_back();
        throw std::logic_error("potential deadlock detected");
    

    (this should be before the if, 4 lines above, but github does not allow me to put the suggestion there)

  7. vasild commented at 9:42 AM on June 22, 2020: contributor

    It would be good if the offending entry is still in the lock stack by the time potential_deadlock_detected() is called because it prints the stack and we want to see the offender in the debug log.

    Also, I think that it would be best to undo all mods before abort()/throw. We also inserted into lockdata.lockorders and in lockdata.invlockorders. Maybe something like:

    diff --git i/src/sync.cpp w/src/sync.cpp
    index 3098e402c..b21154d6e 100644
    --- i/src/sync.cpp
    +++ w/src/sync.cpp
    @@ -124,17 +124,12 @@ static void potential_deadlock_detected(const LockPair& mismatch, const LockStac
             }
             if (i.first == mismatch.second) {
                 LogPrintf(" (2)"); /* Continued */
             }
             LogPrintf(" %s\n", i.second.ToString());
         }
    -    if (g_debug_lockorder_abort) {
    -        tfm::format(std::cerr, "Assertion failed: detected inconsistent lock order at %s:%i, details in debug log.\n", __FILE__, __LINE__);
    -        abort();
    -    }
    -    throw std::logic_error("potential deadlock detected");
     }
     
     static void double_lock_detected(const void* mutex, LockStack& lock_stack)
     {
         LogPrintf("DOUBLE LOCK DETECTED\n");
         LogPrintf("Lock order:\n");
    @@ -186,14 +181,28 @@ static void push_lock(MutexType* c, const CLockLocation& locklocation)
             if (lockdata.lockorders.count(p1))
                 continue;
             lockdata.lockorders.emplace(p1, lock_stack);
     
             const LockPair p2 = std::make_pair(c, i.first);
             lockdata.invlockorders.insert(p2);
    -        if (lockdata.lockorders.count(p2))
    +        if (lockdata.lockorders.count(p2)) {
                 potential_deadlock_detected(p1, lockdata.lockorders[p2], lockdata.lockorders[p1]);
    +
    +            lockdata.invlockorders.erase(p2);
    +            lockdata.lockorders.erase(p1);
    +            lock_stack.pop_back();
    +
    +            if (g_debug_lockorder_abort) {
    +                tfm::format(std::cerr,
    +                    "Assertion failed: detected inconsistent lock order at %s:%i, "
    +                    "details in debug log.\n",
    +                    __FILE__, __LINE__);
    +                abort();
    +            }
    +            throw std::logic_error("potential deadlock detected");
    +        }
         }
     }
     
     static void pop_lock()
     {
         LockData& lockdata = GetLockData();
    
  8. hebasto commented at 9:56 AM on June 22, 2020: member

    @vasild

    It would be good if the offending entry is still in the lock stack by the time potential_deadlock_detected() is called because it prints the stack and we want to see the offender in the debug log.

    It prints from LockOrders instances, not from the lock stack of the current thread, no?

  9. vasild commented at 10:59 AM on June 22, 2020: contributor

    You are right! I did not realize that the LockStack objects are copied.

    Anyway - I think, for EnterCritical() to be a good citizen, if it throws, it should leave its internal structures in the state they were before it was called.

  10. hebasto force-pushed on Jun 22, 2020
  11. hebasto commented at 11:46 AM on June 22, 2020: member

    Updated c810cd4d17f1f3854045c7e6f9093c856ba28406 -> 236e76047fd0cf260d4aed0c0926e9942e15d828 (pr19340.01 -> pr19340.02, diff):

    operator[] has a side effect that it would insert a new element in m_lock_stacks if the thread that called it does not own any locks. @vasild Anyway - I think, for EnterCritical() to be a good citizen, if it throws, it should leave its internal structures in the state they were before it was called.

    I'd leave it as is since it seems unrelated to this PR goal.

  12. hebasto force-pushed on Jun 22, 2020
  13. hebasto commented at 6:53 PM on June 22, 2020: member

    Reworked after the discussion with @vasild on IRC.

    ~Going to update the OP soon.~ The OP has been updated.

  14. DrahtBot cross-referenced this on Jun 22, 2020 from issue Fix mistakenly swapped "previous" and "current" lock orders by hebasto
  15. hebasto renamed this:
    Fix lock stack after "potential deadlock detected" exception
    Preserve the `LockData` initial state if "potential deadlock detected" exception thrown
    on Jun 22, 2020
  16. vasild approved
  17. vasild commented at 8:00 AM on June 23, 2020: contributor

    ACK f88222a9

  18. JeremyRubin commented at 9:05 PM on July 9, 2020: contributor

    I'm not sure I fully get what's going on here, but the change looks fine to me. Can you explain how making the copy of the lockstack actually helps? It looks like a race condition to even copy it?

  19. vasild commented at 7:58 AM on July 10, 2020: contributor

    @JeremyRubin thanks for asking! The key here is the added lock_stack.pop_back(), not the copying. The copying is needed because without it the prints from potential_deadlock_detected() would not be complete (because we already removed one element). There is no race because that code is operating under lockdata.dd_mutex.

    Let me elaborate with an example:

    • Thread t1 acquires lockA
    • Thread t2 acquires lockB
    • Thread t1 tries to acquire lockB
      • EnterCritical() adds lockB to the lock stack of t1 and succeeds
      • The lock stacks are: t1: lockA, lockB, t2: lockB
      • Now t1 hangs inside the OS mutex lock() function, waiting for lockB to be released.
    • Thread t2 tries to acquire lockA
      • EnterCritical() adds lockA to the lock stack of t2 and it is really upset because some threads first (try to) acquire lockA and then lockB and some other threads try in reverse order. It is going to either abort() or throw, lets assume it is going to throw.
      • The lock stacks are: t1: lockA, lockB, t2: lockB, lockA
      • EnterCritical() prints both lock stacks and throws.
      • The caller (in t2) catches the exception and continues executing (this is only happening from tests).

    Notice that at the end t2's lock stack is lockB, lockA but it actually only owns lockB -- this is the problem being fixed by this PR. This only matters in tests which catch the exception and continue executing. The other code ends up with abort() and the whole program terminates immediately.

  20. DrahtBot cross-referenced this on Jul 28, 2020 from issue Remove mempool global by MarcoFalke
  21. DrahtBot cross-referenced this on Jul 30, 2020 from issue Disable fee estimation in blocksonly mode (by removing the fee estimates global) by darosior
  22. in src/sync.cpp:154 in f88222a9a1 outdated
     142 | @@ -143,14 +143,17 @@ static void push_lock(void* c, const CLockLocation& locklocation)
     143 |              break;
     144 |  
     145 |          const LockPair p1 = std::make_pair(i.first, c);
     146 | +        const LockPair p2 = std::make_pair(c, i.first);
     147 | +        if (lockdata.lockorders.count(p2)) {
    


    MarcoFalke commented at 6:12 AM on July 31, 2020:

    Any reason this is now being done before if (lockdata.lockorders.count(p1))continue?


    hebasto commented at 1:48 PM on August 2, 2020:
  23. in src/sync.h:74 in f88222a9a1 outdated
      71 | @@ -71,6 +72,7 @@ template <typename MutexType>
      72 |  void static inline AssertLockHeldInternal(const char* pszName, const char* pszFile, int nLine, MutexType* cs) ASSERT_EXCLUSIVE_LOCK(cs) {}
      73 |  void static inline AssertLockNotHeldInternal(const char* pszName, const char* pszFile, int nLine, void* cs) {}
      74 |  void static inline DeleteLock(void* cs) {}
    


    MarcoFalke commented at 6:12 AM on July 31, 2020:
    void inline DeleteLock(void* cs) {}
    

    Could remove the confusing static's here?


    hebasto commented at 1:48 PM on August 2, 2020:
  24. MarcoFalke approved
  25. MarcoFalke commented at 6:13 AM on July 31, 2020: member

    ACK f88222a9a1b218b85c442654ee50ddce290484b6

    just some questions. Feel free to ignore.

  26. hebasto force-pushed on Aug 2, 2020
  27. Preserve initial state if push_lock() throws exception 1f96be25b0
  28. test: Repeat deadlock tests 42b2a95373
  29. test: Add LockStackEmpty() 63e9e40b73
  30. hebasto force-pushed on Aug 2, 2020
  31. hebasto commented at 1:47 PM on August 2, 2020: member

    Updated f88222a9a1b218b85c442654ee50ddce290484b6 -> cab80f4578e5c9998206b3a8d3cc1019d0841bba (pr19340.03 -> pr19340.04, diff):

    Any reason this is now being done before if (lockdata.lockorders.count(p1))continue?

    Could remove the confusing static's here?


    Rebased cab80f4578e5c9998206b3a8d3cc1019d0841bba -> 63e9e40b73507b0c9361fc8728d4e97fd72c9ec9 (pr19340.04 -> pr19340.05) to fix Travis "lint" job errors.

  32. vasild approved
  33. vasild commented at 11:24 AM on August 4, 2020: contributor

    ACK 63e9e40

  34. MarcoFalke commented at 2:59 PM on August 4, 2020: member

    re-ACK 63e9e40b73

  35. MarcoFalke merged this on Aug 4, 2020
  36. MarcoFalke closed this on Aug 4, 2020

  37. hebasto deleted the branch on Aug 4, 2020
  38. sidhujag referenced this in commit 6f9d723560 on Aug 4, 2020
  39. deadalnix referenced this in commit 100e3ff738 on Sep 7, 2021
  40. kwvg referenced this in commit da19a4c732 on Oct 8, 2021
  41. kwvg referenced this in commit 4f35330c45 on Oct 12, 2021
  42. PastaPastaPasta referenced this in commit 790a4f9d60 on Oct 12, 2021
  43. PastaPastaPasta referenced this in commit 3264e26813 on Oct 20, 2021
  44. pravblockc referenced this in commit 3f992e344f on Nov 18, 2021
  45. bitcoin locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-19 06:53 UTC