test: Build fuzz targets into seperate executables #15043

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:Mf1812-buildFuzzTargets changing 7 files +531 −243

MarcoFalke commented at 4:59 PM on December 27, 2018: member

Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to "learn" how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might "leak" into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target...

Solve that by building each fuzz target into their own executable.
MarcoFalke added the label Tests on Dec 27, 2018
MarcoFalke added the label Build system on Dec 27, 2018
MarcoFalke added the label Needs gitian build on Dec 27, 2018
MarcoFalke force-pushed on Dec 27, 2018
MarcoFalke force-pushed on Dec 27, 2018
practicalswift commented at 5:39 PM on December 27, 2018: contributor

Concept ACK

Thanks for doing this!
MarcoFalke force-pushed on Dec 27, 2018
MarcoFalke force-pushed on Dec 27, 2018

in src/test/test_bitcoin_fuzzy-blockheader_deserialize.cpp:16 in 0293c5d91e outdated

  11 | +    CDataStream ds(buffer, SER_NETWORK, INIT_PROTO_VERSION);
  12 | +    try {
  13 | +        int nVersion;
  14 | +        ds >> nVersion;
  15 | +        ds.SetVersion(nVersion);
  16 | +    } catch (const std::ios_base::failure& e) {

practicalswift commented at 10:33 AM on December 28, 2018:

Drop the unused e variable. Applies throughout this PR.

in src/test/test_bitcoin_fuzzy-transaction_deserialize.cpp:9 in 0293c5d91e outdated

   0 | @@ -0,0 +1,25 @@
   1 | +// Copyright (c) 2009-2018 The Bitcoin Core developers
   2 | +// Distributed under the MIT software license, see the accompanying
   3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
   4 | +
   5 | +#include <primitives/transaction.h>
   6 | +
   7 | +#include <test/test_bitcoin_fuzzy.hpp>
   8 | +
   9 | +static void test_one_input(std::vector<uint8_t> buffer)

practicalswift commented at 10:35 AM on December 28, 2018:

Make buffer const reference. Applies throughout this PR :-)

in src/test/test_bitcoin_fuzzy.hpp:51 in 0293c5d91e outdated
```
  46 | +    test_one_input(std::vector<uint8_t>(data, data + size));
  47 | +    return 0;
  48 | +}
  49 | +
  50 | +// This function is used by libFuzzer
  51 | +extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv)
```
practicalswift commented at 10:36 AM on December 28, 2018:

Nit: Could drop unused variable names argc and argv.

fanquake commented at 10:46 AM on December 28, 2018:

Related: https://llvm.org/docs/LibFuzzer.html#startup-initialization
MarcoFalke commented at 11:25 AM on December 28, 2018: member

Would be nice if someone could look at the build system changes, since the code is mostly just moved around.
DrahtBot commented at 11:56 AM on December 28, 2018: contributor


The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.



Conflicts

Reviewers, this pull request conflicts with the following ones:
- #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
- #13989 (add avx512 instrinsic by fingera)
- #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
practicalswift cross-referenced this on Dec 29, 2018 from issue Utils and libraries: Drops the boost/algorithm/string/split.hpp dependency by l2a5b1
practicalswift cross-referenced this on Dec 30, 2018 from issue Offering my Bitcoin fuzzers by guidovranken
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
MarcoFalke force-pushed on Dec 31, 2018
laanwj commented at 11:53 AM on January 2, 2019: member

Concept ACK

I think this should be behind a configure flag, building so many executables is going to be slow when statically linking, or with slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.
cjdelisle commented at 12:35 PM on January 2, 2019: none
As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you're proposing to change.

I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer's ability to efficiently find paths.

My reasoning is as follows:
1. Personally, I find it better to allow someone with a large machine to be able to run a simple command and begin fuzzing with minimal effort, if each case is fuzzed separately then this is not really practical.
2. Secondly, I personally am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So clearly I would not want to require people to launch each test separately.
3. It seems like if one would like to fuzz a particular test case, their need can be easily supported by allowing them to pass an argument to the running which causes it to return 0 if the test is specified to anything else, thus the fuzzer will quickly prune attempts which touch other test cases.
4. If there is a large corpus of generated test data, it seems that one could use a simple python script to rename the test files to contain the name of the test case which they're touching, solving the concern about organization.
Thanks, Caleb
Crypt-iQ commented at 9:25 PM on January 2, 2019: contributor

Hey @cjdelisle,

I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another and for AFL, this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target.

If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize and they are spliced together to create input C (which is a nonsense input), input C could be selected to be in the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix) and cause even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.

Also I think seeding the fuzzer with "bad" corpus inputs in general isn't a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

Anyways I'm pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf's blog, and the discussion on https://github.com/bitcoin/bitcoin/issues/11045
practicalswift commented at 11:29 PM on January 2, 2019: contributor

@cjdelisle See @kcc:s comment in #11045 (comment) :-)
cjdelisle commented at 8:25 AM on January 3, 2019: none
Thanks @Crypt-iQ and @practicalswift , for those who weren't following carefully, my understanding is that it is the recommendation of Google's OSS-Fuzz project that fuzz targets should be broken up ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:
- Some of the fuzzers have O(executable_size) performance
- Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory
So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it's always better to do what works.
Crypt-iQ commented at 3:34 PM on January 3, 2019: contributor

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included? Maybe this can be done in another PR.

See @kcc comment: #11045 (comment)

OSS Fuzz recommendations: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

in doc/fuzzing.md:29 in f0d75314cb outdated

  25 | @@ -26,7 +26,7 @@ To build Bitcoin Core using AFL instrumentation (this assumes that the
  26 |  ./configure --disable-ccache --disable-shared --enable-tests CC=${AFLPATH}/afl-gcc CXX=${AFLPATH}/afl-g++
  27 |  export AFL_HARDEN=1
  28 |  cd src/
  29 | -make test/test_bitcoin_fuzzy
  30 | +make -C src/test

Crypt-iQ commented at 4:11 PM on January 4, 2019:

User is already in the src directory so this will fail. I ran make with no arguments in the src directory and it built the fuzz targets.

MarcoFalke commented at 7:19 PM on January 14, 2019:

Done (changed it to make)

laanwj commented at 1:45 PM on January 8, 2019: member

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it's not currently included?

The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.
Crypt-iQ commented at 3:28 PM on January 8, 2019: contributor

@laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.
laanwj cross-referenced this on Jan 9, 2019 from issue docs: Remove outdated link in doc/fuzzing.md by RandyMcMillan
laanwj commented at 6:59 PM on January 9, 2019: member

Probably want to remove the links to test inputs from doc/fuzzing.md here, as they'll no longer work as-is and one of the links is broken (#15028).
DrahtBot added the label Needs rebase on Jan 10, 2019
DrahtBot removed the label Needs gitian build on Jan 11, 2019
MarcoFalke force-pushed on Jan 14, 2019
MarcoFalke force-pushed on Jan 14, 2019
MarcoFalke force-pushed on Jan 14, 2019
MarcoFalke added the label Needs gitian build on Jan 14, 2019
bitcoin deleted a comment on Jan 14, 2019
DrahtBot removed the label Needs rebase on Jan 14, 2019
DrahtBot removed the label Needs gitian build on Jan 15, 2019
bitcoin deleted a comment on Jan 15, 2019
MarcoFalke force-pushed on Jan 15, 2019
MarcoFalke force-pushed on Jan 15, 2019
MarcoFalke force-pushed on Jan 16, 2019
MarcoFalke force-pushed on Jan 16, 2019
MarcoFalke force-pushed on Jan 16, 2019
MarcoFalke commented at 4:20 AM on January 16, 2019: member

@laanwj The previous corpus can still be used by stripping the first few (4?) bytes off of the seeds. I will do that (and move them to a separate repo) after this pull has been merged.
MarcoFalke commented at 7:19 PM on January 24, 2019: member

This hasn't received any review, but I am going to merge it tomorrow unless someone objects to that.
sipa commented at 12:43 AM on January 25, 2019: member

Wouldn't it be easier to accomplish this by having a since source file for all fuzz target, but with a macro define (-D... argument) to select which fuzzer is invoked by main?
MarcoFalke commented at 1:58 AM on January 25, 2019: member

@sipa Wouldn't that require to pass in and compile for each fuzz target? That seems to move the verbosity to the user/ci script/fuzz runner.

Also having a single source file makes it harder to see what each single fuzz test does (and depends on).
sipa commented at 4:32 AM on January 25, 2019: member

@MarcoFalke The build system would automatically build all of them, supplying the necessary -D arguments for each binary.

This would avoid all the boilerplate in all those source files.
MarcoFalke force-pushed on Jan 25, 2019
MarcoFalke force-pushed on Jan 25, 2019
MarcoFalke commented at 11:07 PM on January 25, 2019: member

Ah, I see. Done and removed 400 lines of boilerplate and headers.
MarcoFalke force-pushed on Jan 25, 2019

[test] fuzz: make test_one_input return void

The return value is always 0 and not used, so might as well return void

fab4bed68a

MarcoFalke force-pushed on Jan 26, 2019
MarcoFalke force-pushed on Jan 26, 2019

MarcoFalke commented at 12:10 AM on January 26, 2019: member

Now split into two commits, where the top commit is some move-only:

git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test

MarcoFalke added the label Needs gitian build on Jan 26, 2019
DrahtBot removed the label Needs gitian build on Jan 28, 2019
test: Build fuzz targets into seperate executables 2ca632e5b4

in configure.ac:150 in 02606835e6 outdated

 145 | @@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
 146 |      [use_extended_functional_tests=$enableval],
 147 |      [use_extended_functional_tests=no])
 148 |  
 149 | +AC_ARG_ENABLE([fuzz],
 150 | +    AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),

practicalswift commented at 7:08 PM on January 28, 2019:

Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.

MarcoFalke force-pushed on Jan 30, 2019
bitcoin deleted a comment on Jan 30, 2019

in src/Makefile.test.include:177 in 2ca632e5b4

 194 | +if ENABLE_FUZZ
 195 | +test_fuzz_block_deserialize_SOURCES = $(FUZZ_SUITE) test/test_bitcoin_fuzzy.cpp
 196 | +test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
 197 | +test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
 198 | +test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
 199 | +test_fuzz_block_deserialize_LDADD = \

ryanofsky commented at 6:38 PM on January 30, 2019:

You could deduplicate some boilerplate by defining common variables like:

fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)

and then individual targets could be shortened to:

test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
...

test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
...

At least this is what I did in 2060a30063866ed1599134fe28846a69deda773c for #10102. One catch is that LDFLAGS, LDADD, etc suffix can't be capitalized or the variables will be treated specially by automake.

in src/test/test_bitcoin_fuzzy.cpp:40 in 2ca632e5b4
```
  84 |      }
  85 |  
  86 | -    switch(test_id) {
  87 | -        case CBLOCK_DESERIALIZE:
  88 | -        {
  89 | +#if BLOCK_DESERIALIZE
```
ryanofsky commented at 6:43 PM on January 30, 2019:

I think it would make things less complicated just to use separate source files, rather than using these preprocessor defines. These defines don't seem to really decrease duplication, but I guess they they do have the advantage of making it easy to see the different test cases all in one file.

MarcoFalke commented at 8:00 PM on January 30, 2019:

This was my initial solution, but I changed it back to minimize the diff
ryanofsky approved
ryanofsky commented at 6:47 PM on January 30, 2019: contributor

utACK 2ca632e5b44a8385989c8539cc4e30e60fdee16c
laanwj merged this on Jan 30, 2019
laanwj closed this on Jan 30, 2019
laanwj referenced this in commit 9553102c38 on Jan 30, 2019
MarcoFalke deleted the branch on Jan 30, 2019
ryanofsky cross-referenced this on Jan 31, 2019 from issue Add fee_est tool for debugging fee estimation code by ryanofsky
Sjors cross-referenced this on Feb 1, 2019 from issue OSX compile issue - configure: error: No working boost sleep implementation found. by satindergrewal
practicalswift cross-referenced this on Feb 8, 2019 from issue Feature request: Make Bitcoin libFuzzer-friendly and consider integration into the OSS-Fuzz project by practicalswift
furszy cross-referenced this on Mar 18, 2021 from issue Fuzzing framework support by furszy
random-zebra referenced this in commit 44b5327e61 on May 28, 2021
kwvg referenced this in commit 8614068123 on Aug 2, 2021
kwvg referenced this in commit cdd3ee5626 on Aug 5, 2021
kwvg referenced this in commit cacbbffa90 on Aug 5, 2021
PastaPastaPasta referenced this in commit 9c16a4122d on Aug 6, 2021
kwvg referenced this in commit de2b7eee83 on Aug 8, 2021
kwvg referenced this in commit 0e7fe9e6ab on Aug 11, 2021
PastaPastaPasta referenced this in commit 90e7119a8b on Aug 11, 2021
5tefan referenced this in commit 8f4c933f3e on Aug 12, 2021
bitcoin locked this on Dec 16, 2021

Contributors

Labels

Linked (view graph)

#10364 Feature request: Make Bitcoin libFuzzer-friendly and consider integration into the OSS-Fuzz project #10443 Add fee_est tool for debugging fee estimation code #11036 OSX compile issue - configure: error: No working boost sleep implementation found.#11045 Offering my Bitcoin fuzzers #13751 Utils and libraries: Drops the boost/algorithm/string/split.hpp dependency #15028 docs: Remove outdated link in doc/fuzzing.md #33944 Request: fix typos in "release-notes-0.18.0.md" and "release-notes-0.15.0.1.md"

test: Build fuzz targets into seperate executables #15043

Conflicts