Use std::unordered_set instead of set in blockfilter interface #14074

pull jimpo wants to merge 2 commits into bitcoin:master from jimpo:blockfilter-unordered-set changing 18 files +283 −208
  1. jimpo commented at 7:04 PM on August 26, 2018: contributor

    Use std::unordered_set (hash set) instead of std::set (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven't measured.

    This moves CSipHasher to it's own file crypto/siphash.h, so that it can be used in the libbitcoin_util library without including hash.{h,cpp}. I'm open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

  2. DrahtBot commented at 8:31 PM on August 26, 2018: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #14387 (Faster Input Deduplication Algorithm by JeremyRubin)
    • #14224 (Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift)
    • #14121 (Index for BIP 157 block filters by jimpo)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot cross-referenced this on Aug 26, 2018 from issue utils: Convert Windows args to utf-8 string by ken2812221
  4. DrahtBot cross-referenced this on Aug 26, 2018 from issue Docs: Create default bitcoin.conf file on startup by leishman
  5. DrahtBot cross-referenced this on Aug 26, 2018 from issue -masterdatadir for datadir bootstrapping by kallewoof
  6. DrahtBot cross-referenced this on Aug 26, 2018 from issue bitcoin-cli: -stdinwalletpassphrase and non-echo stdin passwords by kallewoof
  7. fanquake added the label Refactoring on Aug 26, 2018
  8. in src/util.h:362 in aa370700df outdated
     357 | +    uint64_t m_k0, m_k1;
     358 | +
     359 | +public:
     360 | +    ByteVectorHash();
     361 | +    size_t operator()(const std::vector<unsigned char>& input) const;
     362 | +};
    


    laanwj commented at 6:41 PM on August 27, 2018:

    I don't think util.h is the best place for a generic utility data structure such as this, it's more for assorted operating system functions


    jimpo commented at 9:47 PM on August 27, 2018:

    @laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?


    laanwj commented at 10:32 AM on August 28, 2018:

    or maybe support/bytevectorhash.h ? unless there's a strong need to group this with other things

  9. in src/crypto/siphash.h:1 in aa370700df outdated
       0 | @@ -0,0 +1,47 @@
       1 | +// Copyright (c) 2016-2018 The Bitcoin Core developers
    


    laanwj commented at 6:41 PM on August 27, 2018:

    ACK on moving siphash to a separate unit

  10. jimpo force-pushed on Aug 28, 2018
  11. laanwj commented at 12:51 PM on August 31, 2018: member

    utACK f4608359e4e643c6e197df822e03226146e37a49 verified move-onlyness of extraction commit

  12. DrahtBot cross-referenced this on Aug 31, 2018 from issue Index for BIP 157 block filters by jimpo
  13. in src/support/bytevectorhash.cpp:7 in f4608359e4 outdated
       0 | @@ -0,0 +1,18 @@
       1 | +// Copyright (c) 2018 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#include <crypto/siphash.h>
       6 | +#include <support/bytevectorhash.h>
       7 | +#include <random.h>
    


    practicalswift commented at 6:37 AM on September 10, 2018:
    src/support/bytevectorhash.cpp:6:1: warning: #includes are not sorted properly [llvm-include-order]
    
  14. in src/crypto/siphash.cpp:25 in f4608359e4 outdated
      20 | +    v[0] = 0x736f6d6570736575ULL ^ k0;
      21 | +    v[1] = 0x646f72616e646f6dULL ^ k1;
      22 | +    v[2] = 0x6c7967656e657261ULL ^ k0;
      23 | +    v[3] = 0x7465646279746573ULL ^ k1;
      24 | +    count = 0;
      25 | +    tmp = 0;
    


    practicalswift commented at 6:41 AM on September 10, 2018:

    Initialize count and tmp using default member initializers or in the member initializer list of the constructor?


    jimpo commented at 6:39 PM on September 10, 2018:

    This is a move-only commit. I'd rather not modify the contents of the code here.


    practicalswift commented at 7:55 PM on September 10, 2018:

    Makes sense!

  15. laanwj commented at 8:11 PM on September 10, 2018: member

    @ryanofsky can you take a look here please? according to @jimpo this was your suggestion

  16. DrahtBot cross-referenced this on Sep 15, 2018 from issue Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift
  17. DrahtBot cross-referenced this on Oct 4, 2018 from issue Faster Input Deduplication Algorithm by JeremyRubin
  18. in src/support/bytevectorhash.h:9 in f4608359e4 outdated
       0 | @@ -0,0 +1,23 @@
       1 | +// Copyright (c) 2018 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
       6 | +#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
       7 | +
       8 | +/**
       9 | + * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
    


    promag commented at 9:37 PM on October 8, 2018:

    This may be used as the hash.. ?

  19. in src/support/bytevectorhash.h:15 in f4608359e4 outdated
      10 | + * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
      11 | + * instance of SipHash-2-4 is used.
      12 | + */
      13 | +class ByteVectorHash
      14 | +{
      15 | +private:
    


    promag commented at 9:41 PM on October 8, 2018:

    Drop private.


    jimpo commented at 1:34 AM on October 9, 2018:

    I think it's more clear to leave it explicit even if the default for class is private.

  20. in src/txmempool.h:18 in f4608359e4 outdated
      14 | @@ -15,6 +15,7 @@
      15 |  
      16 |  #include <amount.h>
      17 |  #include <coins.h>
      18 | +#include <crypto/siphash.h>
    


    promag commented at 9:50 PM on October 8, 2018:

    Can be removed?


    jimpo commented at 3:57 PM on October 10, 2018:

    It's used in SaltedTxidHasher

  21. in src/util.cpp:9 in f4608359e4 outdated
       5 | @@ -6,6 +6,7 @@
       6 |  #include <util.h>
       7 |  
       8 |  #include <chainparamsbase.h>
       9 | +#include <crypto/siphash.h>
    


    promag commented at 9:56 PM on October 8, 2018:

    Can be removed?

  22. in src/support/bytevectorhash.h:8 in f4608359e4 outdated
       0 | @@ -0,0 +1,23 @@
       1 | +// Copyright (c) 2018 The Bitcoin Core developers
       2 | +// Distributed under the MIT software license, see the accompanying
       3 | +// file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 | +
       5 | +#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
       6 | +#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
       7 | +
       8 | +/**
    


    promag commented at 9:57 PM on October 8, 2018:

    Missing includes <stdint.h> and <vector>.

  23. in src/support/bytevectorhash.h:13 in f4608359e4 outdated
       8 | +/**
       9 | + * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
      10 | + * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
      11 | + * instance of SipHash-2-4 is used.
      12 | + */
      13 | +class ByteVectorHash
    


    promag commented at 9:58 PM on October 8, 2018:

    nit, final.

  24. promag commented at 10:37 PM on October 8, 2018: member

    @jimpo can you improve PR description of the advantages of this change (beside linking @ryanofsky suggestion)?

    Do you have numbers to support this change?

    Left some nits.

  25. in src/support/bytevectorhash.h:10 in f4608359e4 outdated
       5 | +#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
       6 | +#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
       7 | +
       8 | +/**
       9 | + * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
      10 | + * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
    


    ryanofsky commented at 12:58 AM on October 9, 2018:

    In commit "blockfilter: Use unordered_set instead of set in blockfilter." (f4608359e4e643c6e197df822e03226146e37a49)

    The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.

    I'd suggest:

    1. Mentioning in this comment that operator() overloads for new types could be added here in the future.
    2. Changing the name of the class from ByteVectorHash to something like RandomizedSipHash to avoid tying it to std::vector.
    3. Maybe replacing std::vector with Span.

    jimpo commented at 4:07 PM on October 10, 2018:

    I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.

  26. ryanofsky approved
  27. ryanofsky commented at 2:07 AM on October 9, 2018: contributor

    utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps more places in the future. I left a suggestion below, but also think this change looks good as it is.

  28. jimpo force-pushed on Oct 10, 2018
  29. jimpo force-pushed on Oct 10, 2018
  30. etscrivner commented at 10:21 PM on October 10, 2018: contributor

    utACK cfabca8749a132220e649d4563fb876afc21ceec

  31. sipa commented at 9:38 AM on October 11, 2018: member

    Just one not comment: I always imagined the things in 'support' as low level features that are independently usable, and find it a bit strange to put something there that depends on random and crypto/siphash. Perhaps others have a different understanding, as I don't think it was ever written down somewhere what it was for, but I would just put it in src/.

  32. jimpo commented at 7:37 PM on October 11, 2018: contributor

    @sipa I'd prefer to start creating more directory structure. How would you feel about a util/ directory instead and we can put it in there?

  33. sipa commented at 8:01 PM on October 12, 2018: member

    @jimpo Sounds great, but perhaps something to do independently? I'd prefer to not start a new convention, and then have it linger in a half finished state if there's unclarity about how to move forward.

  34. ryanofsky commented at 6:00 PM on October 19, 2018: contributor

    utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.

  35. jimpo cross-referenced this on Oct 23, 2018 from issue Move util files to directory by jimpo
  36. laanwj referenced this in commit bccb4d29a8 on Nov 5, 2018
  37. Extract CSipHasher to it's own file in crypto/ directory.
    This is a move-only commit with the exception of changes to includes.
    4fb789e9b2
  38. blockfilter: Use unordered_set instead of set in blockfilter. fef5adcc33
  39. jimpo force-pushed on Nov 5, 2018
  40. jimpo commented at 11:12 PM on November 5, 2018: contributor

    @sipa I moved bytevectorhash to util/ now.

  41. laanwj commented at 2:36 PM on November 6, 2018: member

    utACK fef5adcc331c4d7b92b71e03fc8a73343a865599

  42. laanwj merged this on Nov 6, 2018
  43. laanwj closed this on Nov 6, 2018

  44. laanwj referenced this in commit 880bc728b4 on Nov 6, 2018
  45. jimpo deleted the branch on Nov 6, 2018
  46. kwvg referenced this in commit 904da1fca5 on Jun 15, 2021
  47. kwvg referenced this in commit 7c564a4316 on Jun 15, 2021
  48. kwvg referenced this in commit a4ee94238b on Jun 16, 2021
  49. kwvg cross-referenced this on Jun 16, 2021 from issue partial bitcoin#15638, #21966, #16889, merge #14555, #20499, #14074, #17073: util refactoring by kwvg
  50. kwvg referenced this in commit abf1c91bf2 on Jun 25, 2021
  51. kwvg referenced this in commit f90ceb395a on Jun 25, 2021
  52. kwvg referenced this in commit fa92d7eb88 on Jun 26, 2021
  53. kwvg referenced this in commit 67bb702341 on Jun 26, 2021
  54. kwvg referenced this in commit 9a5a285cfd on Jun 27, 2021
  55. kwvg referenced this in commit 680319643f on Jun 27, 2021
  56. UdjinM6 referenced this in commit 7d664c7c53 on Jun 27, 2021
  57. bitcoin locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:54 UTC