test: Build fuzz targets into seperate executables #15043

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:Mf1812-buildFuzzTargets changing 7 files +531 −243
  1. MarcoFalke commented at 4:59 PM on December 27, 2018: member

    Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to "learn" how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might "leak" into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target...

    Solve that by building each fuzz target into their own executable.

  2. MarcoFalke added the label Tests on Dec 27, 2018
  3. MarcoFalke added the label Build system on Dec 27, 2018
  4. MarcoFalke added the label Needs gitian build on Dec 27, 2018
  5. MarcoFalke force-pushed on Dec 27, 2018
  6. MarcoFalke force-pushed on Dec 27, 2018
  7. practicalswift commented at 5:39 PM on December 27, 2018: contributor

    Concept ACK

    Thanks for doing this!

  8. MarcoFalke force-pushed on Dec 27, 2018
  9. MarcoFalke force-pushed on Dec 27, 2018
  10. in src/test/test_bitcoin_fuzzy-blockheader_deserialize.cpp:16 in 0293c5d91e outdated
      11 | +    CDataStream ds(buffer, SER_NETWORK, INIT_PROTO_VERSION);
      12 | +    try {
      13 | +        int nVersion;
      14 | +        ds >> nVersion;
      15 | +        ds.SetVersion(nVersion);
      16 | +    } catch (const std::ios_base::failure& e) {
    


    practicalswift commented at 10:33 AM on December 28, 2018:

    Drop the unused e variable. Applies throughout this PR.

  11. in src/test/test_bitcoin_fuzzy-transaction_deserialize.cpp:9 in 0293c5d91e outdated
       0 | @@ -0,0 +1,25 @@
       1 | +// Copyright (c) 2009-2018 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#include <primitives/transaction.h>
       6 | +
       7 | +#include <test/test_bitcoin_fuzzy.hpp>
       8 | +
       9 | +static void test_one_input(std::vector<uint8_t> buffer)
    


    practicalswift commented at 10:35 AM on December 28, 2018:

    Make buffer const reference. Applies throughout this PR :-)

  12. in src/test/test_bitcoin_fuzzy.hpp:51 in 0293c5d91e outdated
      46 | +    test_one_input(std::vector<uint8_t>(data, data + size));
      47 | +    return 0;
      48 | +}
      49 | +
      50 | +// This function is used by libFuzzer
      51 | +extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv)
    


    practicalswift commented at 10:36 AM on December 28, 2018:

    Nit: Could drop unused variable names argc and argv.


    fanquake commented at 10:46 AM on December 28, 2018:
  13. MarcoFalke commented at 11:25 AM on December 28, 2018: member

    Would be nice if someone could look at the build system changes, since the code is mostly just moved around.

  14. DrahtBot commented at 11:56 AM on December 28, 2018: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
    • #13989 (add avx512 instrinsic by fingera)
    • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  15. practicalswift cross-referenced this on Dec 29, 2018 from issue Utils and libraries: Drops the boost/algorithm/string/split.hpp dependency by l2a5b1
  16. practicalswift cross-referenced this on Dec 30, 2018 from issue Offering my Bitcoin fuzzers by guidovranken
  17. MarcoFalke force-pushed on Dec 31, 2018
  18. MarcoFalke force-pushed on Dec 31, 2018
  19. MarcoFalke force-pushed on Dec 31, 2018
  20. MarcoFalke force-pushed on Dec 31, 2018
  21. MarcoFalke force-pushed on Dec 31, 2018
  22. MarcoFalke force-pushed on Dec 31, 2018
  23. MarcoFalke force-pushed on Dec 31, 2018
  24. MarcoFalke force-pushed on Dec 31, 2018
  25. laanwj commented at 11:53 AM on January 2, 2019: member

    Concept ACK

    I think this should be behind a configure flag, building so many executables is going to be slow when statically linking, or with slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.

  26. cjdelisle commented at 12:35 PM on January 2, 2019: none

    As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you're proposing to change.

    I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer's ability to efficiently find paths.

    My reasoning is as follows:

    1. Personally, I find it better to allow someone with a large machine to be able to run a simple command and begin fuzzing with minimal effort, if each case is fuzzed separately then this is not really practical.
    2. Secondly, I personally am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So clearly I would not want to require people to launch each test separately.
    3. It seems like if one would like to fuzz a particular test case, their need can be easily supported by allowing them to pass an argument to the running which causes it to return 0 if the test is specified to anything else, thus the fuzzer will quickly prune attempts which touch other test cases.
    4. If there is a large corpus of generated test data, it seems that one could use a simple python script to rename the test files to contain the name of the test case which they're touching, solving the concern about organization.

    Thanks, Caleb

  27. Crypt-iQ commented at 9:25 PM on January 2, 2019: contributor

    Hey @cjdelisle,

    I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another and for AFL, this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target.

    If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize and they are spliced together to create input C (which is a nonsense input), input C could be selected to be in the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix) and cause even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.

    Also I think seeding the fuzzer with "bad" corpus inputs in general isn't a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

    Anyways I'm pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf's blog, and the discussion on https://github.com/bitcoin/bitcoin/issues/11045

  28. practicalswift commented at 11:29 PM on January 2, 2019: contributor

    @cjdelisle See @kcc:s comment in #11045 (comment) :-)

  29. cjdelisle commented at 8:25 AM on January 3, 2019: none

    Thanks @Crypt-iQ and @practicalswift , for those who weren't following carefully, my understanding is that it is the recommendation of Google's OSS-Fuzz project that fuzz targets should be broken up ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:

    • Some of the fuzzers have O(executable_size) performance
    • Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory

    So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it's always better to do what works.

  30. Crypt-iQ commented at 3:34 PM on January 3, 2019: contributor

    I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included? Maybe this can be done in another PR.

    See @kcc comment: #11045 (comment)

    OSS Fuzz recommendations: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

  31. in doc/fuzzing.md:29 in f0d75314cb outdated
      25 | @@ -26,7 +26,7 @@ To build Bitcoin Core using AFL instrumentation (this assumes that the
      26 |  ./configure --disable-ccache --disable-shared --enable-tests CC=${AFLPATH}/afl-gcc CXX=${AFLPATH}/afl-g++
      27 |  export AFL_HARDEN=1
      28 |  cd src/
      29 | -make test/test_bitcoin_fuzzy
      30 | +make -C src/test
    


    Crypt-iQ commented at 4:11 PM on January 4, 2019:

    User is already in the src directory so this will fail. I ran make with no arguments in the src directory and it built the fuzz targets.


    MarcoFalke commented at 7:19 PM on January 14, 2019:

    Done (changed it to make)

  32. laanwj commented at 1:45 PM on January 8, 2019: member

    I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included?

    The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.

  33. Crypt-iQ commented at 3:28 PM on January 8, 2019: contributor

    @laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.

  34. laanwj cross-referenced this on Jan 9, 2019 from issue docs: Remove outdated link in doc/fuzzing.md by RandyMcMillan
  35. laanwj commented at 6:59 PM on January 9, 2019: member

    Probably want to remove the links to test inputs from doc/fuzzing.md here, as they'll no longer work as-is and one of the links is broken (#15028).

  36. DrahtBot added the label Needs rebase on Jan 10, 2019
  37. DrahtBot removed the label Needs gitian build on Jan 11, 2019
  38. MarcoFalke force-pushed on Jan 14, 2019
  39. MarcoFalke force-pushed on Jan 14, 2019
  40. MarcoFalke force-pushed on Jan 14, 2019
  41. MarcoFalke added the label Needs gitian build on Jan 14, 2019
  42. bitcoin deleted a comment on Jan 14, 2019
  43. DrahtBot removed the label Needs rebase on Jan 14, 2019
  44. DrahtBot removed the label Needs gitian build on Jan 15, 2019
  45. bitcoin deleted a comment on Jan 15, 2019
  46. MarcoFalke force-pushed on Jan 15, 2019
  47. MarcoFalke force-pushed on Jan 15, 2019
  48. MarcoFalke force-pushed on Jan 16, 2019
  49. MarcoFalke force-pushed on Jan 16, 2019
  50. MarcoFalke force-pushed on Jan 16, 2019
  51. MarcoFalke commented at 4:20 AM on January 16, 2019: member

    @laanwj The previous corpus can still be used by stripping the first few (4?) bytes off of the seeds. I will do that (and move them to a separate repo) after this pull has been merged.

  52. MarcoFalke commented at 7:19 PM on January 24, 2019: member

    This hasn't received any review, but I am going to merge it tomorrow unless someone objects to that.

  53. sipa commented at 12:43 AM on January 25, 2019: member

    Wouldn't it be easier to accomplish this by having a since source file for all fuzz target, but with a macro define (-D... argument) to select which fuzzer is invoked by main?

  54. MarcoFalke commented at 1:58 AM on January 25, 2019: member

    @sipa Wouldn't that require to pass in and compile for each fuzz target? That seems to move the verbosity to the user/ci script/fuzz runner.

    Also having a single source file makes it harder to see what each single fuzz test does (and depends on).

  55. sipa commented at 4:32 AM on January 25, 2019: member

    @MarcoFalke The build system would automatically build all of them, supplying the necessary -D arguments for each binary.

    This would avoid all the boilerplate in all those source files.

  56. MarcoFalke force-pushed on Jan 25, 2019
  57. MarcoFalke force-pushed on Jan 25, 2019
  58. MarcoFalke commented at 11:07 PM on January 25, 2019: member

    Ah, I see. Done and removed 400 lines of boilerplate and headers.

  59. MarcoFalke force-pushed on Jan 25, 2019
  60. [test] fuzz: make test_one_input return void
    The return value is always 0 and not used, so might as well return void
    fab4bed68a
  61. MarcoFalke force-pushed on Jan 26, 2019
  62. MarcoFalke force-pushed on Jan 26, 2019
  63. MarcoFalke commented at 12:10 AM on January 26, 2019: member

    Now split into two commits, where the top commit is some move-only:

    git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test
    
  64. MarcoFalke added the label Needs gitian build on Jan 26, 2019
  65. DrahtBot removed the label Needs gitian build on Jan 28, 2019
  66. test: Build fuzz targets into seperate executables 2ca632e5b4
  67. in configure.ac:150 in 02606835e6 outdated
     145 | @@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
     146 |      [use_extended_functional_tests=$enableval],
     147 |      [use_extended_functional_tests=no])
     148 |  
     149 | +AC_ARG_ENABLE([fuzz],
     150 | +    AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),
    


    practicalswift commented at 7:08 PM on January 28, 2019:

    Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.

  68. MarcoFalke force-pushed on Jan 30, 2019
  69. bitcoin deleted a comment on Jan 30, 2019
  70. in src/Makefile.test.include:177 in 2ca632e5b4
     194 | +if ENABLE_FUZZ
     195 | +test_fuzz_block_deserialize_SOURCES = $(FUZZ_SUITE) test/test_bitcoin_fuzzy.cpp
     196 | +test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
     197 | +test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
     198 | +test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
     199 | +test_fuzz_block_deserialize_LDADD = \
    


    ryanofsky commented at 6:38 PM on January 30, 2019:

    You could deduplicate some boilerplate by defining common variables like:

    fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
    fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)
    

    and then individual targets could be shortened to:

    test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
    test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
    ...
    
    test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
    test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
    ...
    

    At least this is what I did in 2060a30063866ed1599134fe28846a69deda773c for #10102. One catch is that LDFLAGS, LDADD, etc suffix can't be capitalized or the variables will be treated specially by automake.

  71. in src/test/test_bitcoin_fuzzy.cpp:40 in 2ca632e5b4
      84 |      }
      85 |  
      86 | -    switch(test_id) {
      87 | -        case CBLOCK_DESERIALIZE:
      88 | -        {
      89 | +#if BLOCK_DESERIALIZE
    


    ryanofsky commented at 6:43 PM on January 30, 2019:

    I think it would make things less complicated just to use separate source files, rather than using these preprocessor defines. These defines don't seem to really decrease duplication, but I guess they they do have the advantage of making it easy to see the different test cases all in one file.


    MarcoFalke commented at 8:00 PM on January 30, 2019:

    This was my initial solution, but I changed it back to minimize the diff

  72. ryanofsky approved
  73. ryanofsky commented at 6:47 PM on January 30, 2019: contributor

    utACK 2ca632e5b44a8385989c8539cc4e30e60fdee16c

  74. laanwj merged this on Jan 30, 2019
  75. laanwj closed this on Jan 30, 2019

  76. laanwj referenced this in commit 9553102c38 on Jan 30, 2019
  77. MarcoFalke deleted the branch on Jan 30, 2019
  78. ryanofsky cross-referenced this on Jan 31, 2019 from issue Add fee_est tool for debugging fee estimation code by ryanofsky
  79. Sjors cross-referenced this on Feb 1, 2019 from issue OSX compile issue - configure: error: No working boost sleep implementation found. by satindergrewal
  80. practicalswift cross-referenced this on Feb 8, 2019 from issue Feature request: Make Bitcoin libFuzzer-friendly and consider integration into the OSS-Fuzz project by practicalswift
  81. furszy cross-referenced this on Mar 18, 2021 from issue Fuzzing framework support by furszy
  82. random-zebra referenced this in commit 44b5327e61 on May 28, 2021
  83. kwvg referenced this in commit 8614068123 on Aug 2, 2021
  84. kwvg referenced this in commit cdd3ee5626 on Aug 5, 2021
  85. kwvg referenced this in commit cacbbffa90 on Aug 5, 2021
  86. PastaPastaPasta referenced this in commit 9c16a4122d on Aug 6, 2021
  87. kwvg referenced this in commit de2b7eee83 on Aug 8, 2021
  88. kwvg referenced this in commit 0e7fe9e6ab on Aug 11, 2021
  89. PastaPastaPasta referenced this in commit 90e7119a8b on Aug 11, 2021
  90. 5tefan referenced this in commit 8f4c933f3e on Aug 12, 2021
  91. bitcoin locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-19 06:54 UTC