Fix zmq test flakiness #20934

MarcoFalke opened this issue on January 14, 2021
  1. MarcoFalke commented at 3:26 PM on January 14, 2021: member

    There are many reports of the test being flaky: #20672 (comment)

    Thus, it should be made more robust, as described in #20538 (comment)

    Useful skills:

    • Background in our functional test suite (python3)
    • Background in zmq

    Want to work on this issue?

    For guidance on contributing, please read CONTRIBUTING.md before opening your pull request.

  2. MarcoFalke added the label Tests on Jan 14, 2021
  3. MarcoFalke added the label good first issue on Jan 14, 2021
  4. MarcoFalke cross-referenced this on Jan 14, 2021 from issue Tracking CI false positive rates by sdaftuar
  5. adamjonas commented at 8:15 PM on January 14, 2021: member

    Of the last 571 failures, 22 are from the interface_zmq.py functional test (3.8%). According to the numbers, it's the flakiest functional test we have. @domob1812 @theStack @mruddy @n-thumann are any of you willing to give this a shot?

  6. adamjonas cross-referenced this on Jan 14, 2021 from issue qa: Intermittent failure in interface_zmq.py "Resource temporarily unavailable" by hebasto
  7. theStack cross-referenced this on Jan 17, 2021 from issue test: dedup zmq test setup code (node restart, topics subscription) by theStack
  8. theStack commented at 5:49 PM on January 17, 2021: contributor

    I took some time to look at the problem; it seems quite tricky to solve in a solid way. I tried the suggested method of "syncing up" by repeatedly generating a block and waiting for the expected message (until it no longer times out), but generating a block seems to interfere with some of the sub-tests. It also already generates notification messages for our subs that are received later (even if we are not connected yet). Maybe something like this would work:

    • restart node with additional pubhashtx test publisher (on a port not used by any of the test subs)
    • repeatedly generate block and wait for expected messages from test publisher, until it doesn't time out anymore
    • invalidate generated blocks
    • clear mempool (needed?)
    • read from our subscriber sockets until there is no data (a "reverse flush" so to say)
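The steps above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not code from the test suite: `sync_until_received`, `drain`, and `LateSubscriber` are hypothetical names, `trigger` stands in for generating a block, and `try_receive` stands in for a subscriber read that returns `None` on timeout. Invalidating blocks and clearing the mempool are left out.

```python
import queue


def sync_until_received(trigger, try_receive, max_attempts=10):
    """Call trigger() and poll try_receive() until a message arrives.

    Returns the number of attempts that were needed. Both callables are
    hypothetical: trigger() stands in for generating a block, and
    try_receive() for a subscriber read that returns None on timeout.
    """
    for attempt in range(1, max_attempts + 1):
        trigger()
        if try_receive() is not None:
            return attempt
    raise TimeoutError(f"no notification after {max_attempts} attempts")


def drain(q):
    """Read from a queue until it is empty (the "reverse flush")."""
    drained = []
    while True:
        try:
            drained.append(q.get_nowait())
        except queue.Empty:
            return drained


class LateSubscriber:
    """Toy publisher/subscriber pair that drops the first few messages,
    mimicking a subscriber whose connection is not established yet."""

    def __init__(self, dropped):
        self.dropped = dropped
        self.published = 0
        self.inbox = queue.Queue()

    def publish(self):
        # Messages published before the "connection" is up are lost.
        self.published += 1
        if self.published > self.dropped:
            self.inbox.put(self.published)

    def try_receive(self):
        try:
            return self.inbox.get_nowait()
        except queue.Empty:
            return None
```

With `sub = LateSubscriber(dropped=3)`, calling `sync_until_received(sub.publish, sub.try_receive)` keeps publishing until the first message actually arrives, and `drain(sub.inbox)` then discards anything left over before the real sub-tests start.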

    Maybe I'm overcomplicating this, though. Whatever the solution turns out to be, having a common test setup method should at least serve as a better basis for solving this issue: #20953

  9. instagibbs commented at 2:58 AM on January 18, 2021: member

    but generating a block seems to interfere with some of the sub-tests

    Yes it would require making all the subtests more robust I think.

    alternative setup

    Seems pretty complicated, and with intentional block rollbacks things can get weird.

  10. fanquake referenced this in commit 3734adba39 on Jan 21, 2021
  11. sidhujag referenced this in commit 4dceb42b8b on Jan 21, 2021
  12. MarcoFalke commented at 8:06 AM on January 22, 2021: member

    Could a mempool tx be used to sync up instead of a block?

  13. theStack cross-referenced this on Jan 26, 2021 from issue test: fix zmq test flakiness, improve speed by theStack
  14. practicalswift commented at 11:29 AM on January 26, 2021: contributor

    What about temporarily disabling interface_zmq.py in CI until this is fixed?

    It seems to me that interface_zmq.py as it is currently working is a net negative from a CI testing perspective due to its extreme flakiness :)

  15. instagibbs commented at 11:53 AM on January 26, 2021: member

    How often is it failing?


  16. MarcoFalke commented at 11:57 AM on January 26, 2021: member

    Of the last 571 failures, 22 are from the interface_zmq.py functional test (3.8%). According to the numbers, it's the flakiest functional test we have.

    (quote from @adamjonas)

  17. MarcoFalke closed this on Feb 16, 2021

  18. sidhujag referenced this in commit 31ef542332 on Feb 16, 2021
  19. adamjonas reopened this on Mar 1, 2021

  20. adamjonas commented at 4:03 PM on March 1, 2021: member

    interface_zmq.py flakiness is back and I think #21008 is hurting more than helping.

    Before merge of #21008 on 2/16 (Feb 12-15): Failed 1 time on 1 PR (1,274 builds)

    Same Friday to Monday time period after merge (Feb 19-22): Failed 11 times across 9 different PRs (1,470 total builds)
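To put the two windows on the same footing, the counts above can be turned into per-build failure rates. This is a small sketch using only the figures quoted in this comment; the function and variable names are illustrative, and it assumes each reported failure corresponds to one build.

```python
def failure_rate(failures, builds):
    """Fraction of builds in which interface_zmq.py failed."""
    return failures / builds


# Figures quoted above: (failures, total builds) per Friday-to-Monday window.
before = failure_rate(1, 1274)   # Feb 12-15, before #21008 was merged
after = failure_rate(11, 1470)   # Feb 19-22, after the merge

print(f"before: {before:.2%}")
print(f"after:  {after:.2%}")
print(f"ratio:  {after / before:.1f}x")
```

On these numbers the failure rate goes from under 0.1% of builds to about 0.75%. With a single failure in the earlier window, the exact ratio is noisy.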

  21. MarcoFalke closed this on Mar 2, 2021

  22. MarcoFalke commented at 10:31 AM on March 2, 2021: member

    Fixed in #21216 ?

  23. adamjonas commented at 10:59 PM on March 2, 2021: member

    ref #21310

  24. Fabcien referenced this in commit 30b874af38 on Nov 30, 2021
  25. bitcoin locked this on Aug 18, 2022
