refactor: introduce single-separator split helper (boost::split replacement) #22953

pull theStack wants to merge 3 commits into bitcoin:master from theStack:202109-refactor-boost_split_replacement_single_sep changing 14 files +97 −71
  1. theStack commented at 12:17 PM on September 11, 2021: contributor

    This PR adds a simple string split helper SplitString that takes use of the spanparsing Split function that was first introduced in #13697 (commit fe8a7dcd78cfeedc9a7c705e91384f793822912b). This enables to replace most calls to boost::split, in the cases where only a single separator character is used. Note that while previous attempts to replace boost::split were controversial (e.g. #13751), this one has a trivial implementation: it merely uses an internal helper (that is unit tested and in regular use with output descriptiors) and converts its result from spans to strings. As a drawback though, not all boost::split instances can be tackled.

    As a possible optimization, one could return a vector of std::string_views (available since C++17) instead of strings, to avoid copies. This would need more carefulness on the caller sites though, to avoid potential lifetime issues, and it's probably not worth it, considering that none of the places where strings are split are really performance-critical.

  2. fanquake added the label Refactoring on Sep 11, 2021
  3. in src/util/string.h:23 in e332fcd463 outdated
      15 | @@ -15,6 +16,15 @@
      16 |  #include <string>
      17 |  #include <vector>
      18 |  
      19 | +[[nodiscard]] inline std::vector<std::string> SplitString(const std::string& str, char sep)
      20 | +{
      21 | +    std::vector<Span<const char>> span_strs = spanparsing::Split(str, sep);
      22 | +    std::vector<std::string> result;
      23 | +    for (const auto& span_str : span_strs)
    


    MarcoFalke commented at 1:37 PM on September 11, 2021:

    nit: missing {. Also, could move the impl to the cpp file?

        for (const auto& span_str : span_strs) {
    

    theStack commented at 6:22 PM on September 11, 2021:

    Thanks for the review, done (was surprised to see that the string.cpp is actually empty on master).

  4. MarcoFalke commented at 1:38 PM on September 11, 2021: member

    Seems ok to use the already existing split function.

  5. theStack force-pushed on Sep 11, 2021
  6. theStack commented at 6:33 PM on September 11, 2021: contributor

    Force-pushed with changes as suggested by MarcoFalke (https://github.com/bitcoin/bitcoin/pull/22953#discussion_r706614179).

    Pinging @practicalswift as someone who has always shown interest in potential boost replacements and string processing helpers, with lots of contributions in that area. Maybe you feel like taking a look :)

  7. in src/util/string.cpp:8 in e3c7fe4418 outdated
       2 | @@ -3,3 +3,14 @@
       3 |  // file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 |  
       5 |  #include <util/string.h>
       6 | +#include <util/spanparsing.h>
       7 | +
       8 | +[[nodiscard]] std::vector<std::string> SplitString(const std::string& str, char sep)
    


    MarcoFalke commented at 9:32 AM on September 12, 2021:

    Is the annotation needed in the cpp file as well? At least lock annotations are only supposed to go to the declaration.


    theStack commented at 9:53 AM on September 12, 2021:

    Good point. Quickly greping the [[nodiscard]] annotation in our codebase shows that there are barely any occurences in .cpp files (only for static functions). Removed it.

  8. theStack force-pushed on Sep 12, 2021
  9. kiminuo commented at 10:27 AM on September 13, 2021: contributor

    I would still add a unit test even though there are tests for the internal helper. Maybe it's a little bit over the top so feel free to ignore my suggestion.

    (Something like this possibly: https://github.com/kiminuo/bitcoin/commit/ac6dd5a921407c3957a7c15a6a5a82af826decb6)

  10. MarcoFalke commented at 10:34 AM on September 13, 2021: member

    Any reason the tests aren't fixed up as well?

    src/test/fuzz/script_assets_test_minimizer.cpp:    boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));
    

    Also, if you add a unit test, might as extend the string fuzz target.

  11. kiminuo commented at 10:42 AM on September 13, 2021: contributor

    There non-tests occurrences too:

    src\rpc\server.cpp:
      410          std::vector<std::string> vargNames;
      411:         boost::algorithm::split(vargNames, argNamePattern, boost::algorithm::is_any_of("|"));
      412          auto fr = argsIn.end();
    
    src\test\transaction_tests.cpp:
      72      std::vector<std::string> words;
      73:     boost::algorithm::split(words, strFlags, boost::algorithm::is_any_of(","));
      74  
    
    src\test\fuzz\script_assets_test_minimizer.cpp:
      133      std::vector<std::string> words;
      134:     boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));
      135  
    

    If I'm not missing something.

    edit: Ah, I'm not aware what the difference is between boost::algorithm::split and boost::split. Seems like none?

  12. theStack force-pushed on Sep 13, 2021
  13. theStack commented at 11:19 AM on September 13, 2021: contributor

    I would still add a unit test even though there are tests for the internal helper. Maybe it's a little bit over the top so feel free to ignore my suggestion.

    I agree that it's a good idea to add a unit test.

    (Something like this possibly: kiminuo@ac6dd5a)

    Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. "test: add unit tests for SplitString helper") and put the test into src/test/util_tests.cpp, instead of creating a new file? I'm happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

    Any reason the tests aren't fixed up as well? ...

    There non-tests occurrences too: ...

    Oh, I only greped for boost::split, not being aware that there is also boost::algorithm::split, which seems to be a synonym. Force-pushed with changes to tackle also the remaining occurences. Also found that there were some unnecessary boost includes in src/torcontrol.cpp that I forgot to remove in the previous push.

  14. kiminuo commented at 11:28 AM on September 13, 2021: contributor

    (Something like this possibly: kiminuo@ac6dd5a)

    Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. "test: add unit tests for SplitString helper") and put the test into src/test/util_tests.cpp, instead of creating a new file? I'm happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

    Modified here: https://github.com/kiminuo/bitcoin/commit/4516727e02feb1338285068541e869c36d8b1368. Anyway, feel free to modify the commit in whatever way you like :)

  15. MarcoFalke commented at 11:47 AM on September 13, 2021: member

    The fuzz test I mentioned: (might not compile)

    diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp
    index 0c1b45b86c..e5a09c0013 100644
    --- a/src/test/fuzz/string.cpp
    +++ b/src/test/fuzz/string.cpp
    @@ -127,6 +127,7 @@ FUZZ_TARGET(string)
             int64_t amount_out;
             (void)ParseFixedPoint(random_string_1, fuzzed_data_provider.ConsumeIntegralInRange<int>(0, 1024), &amount_out);
         }
    +    (void)SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>());
         {
             (void)Untranslated(random_string_1);
             const bilingual_str bs1{random_string_1, random_string_2};
    
  16. theStack commented at 11:54 AM on September 13, 2021: contributor

    The fuzz test I mentioned: (might not compile)

    Oh, I didn't see that in your previous comment. It compiles, I added another commit 👌

  17. in src/util/string.cpp:13 in 25915e4772 outdated
       8 | +std::vector<std::string> SplitString(const std::string& str, char sep)
       9 | +{
      10 | +    std::vector<Span<const char>> span_strs = spanparsing::Split(str, sep);
      11 | +    std::vector<std::string> result;
      12 | +    for (const auto& span_str : span_strs) {
      13 | +        result.push_back(std::string(span_str.begin(), span_str.end()));
    


    martinus commented at 7:14 PM on September 13, 2021:

    You could use emplace_back(span_str.begin(), span_str.end()). That way no temporary string is created and moved. It's also shorter :)


    theStack commented at 2:28 PM on September 14, 2021:

    Good idea, thanks! Done.

  18. in src/util/string.cpp:8 in 25915e4772 outdated
       2 | @@ -3,3 +3,14 @@
       3 |  // file COPYING or http://www.opensource.org/licenses/mit-license.php.
       4 |  
       5 |  #include <util/string.h>
       6 | +#include <util/spanparsing.h>
       7 | +
       8 | +std::vector<std::string> SplitString(const std::string& str, char sep)
    


    martinus commented at 7:26 PM on September 13, 2021:

    Not sure if that's nicer, but another way to implement this would be to make spanparsing::Splita templated function, where the template argument is the vector's output type. If I'm not mistaken this should work for Span, string, string_view.Then it would be possible to directly call e.g. spanparsing::Split<std::string>(x) without the need for a wrapper function.


    theStack commented at 2:32 PM on September 14, 2021:

    Seems like a nice way to solve this! Unfortunately, I fear my C++ template skills are currently too lousy to come up with this. Will give it a try on the weekend. That said, I'm also happy to close this PR for another one with a better solution :)


    martinus commented at 5:28 PM on September 14, 2021:

    I think moving the code into the header and changing it into something like that should work:

    template <typename T = Span<const char>>
    std::vector<T> Split(const Span<const char>& sp, char sep)
    {
        std::vector<T> ret;
        auto it = sp.begin();
        auto start = it;
        while (it != sp.end()) {
            if (*it == sep) {
                ret.emplace_back(start, it);
                start = it + 1;
            }
            ++it;
        }
        ret.emplace_back(start, it);
        return ret;
    }
    

    So the code is practically identical, only the return take is now a template argument. Note that I'm typing this on my phone because I don't have access to a computer the next few days so this probably won't compile :)


    MarcoFalke commented at 9:41 AM on September 20, 2021:

    If you decide to do that, it would be possible to define an alias like so:

    constexpr auto SplitString = &spanparsing::Split<std::string>;
    

    theStack commented at 12:46 PM on September 20, 2021:

    @martinus: Thanks, I will try that out in a bit! @MarcoFalke: That's a good idea with the alias, then I only need to change the implementation and keep the rest as it is now.

  19. theStack force-pushed on Sep 14, 2021
  20. fanquake commented at 2:39 AM on September 16, 2021: member

    Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can't also nuke anything from test/lint/lint-includes.sh

  21. laanwj commented at 7:41 PM on September 16, 2021: member

    Concept ACK. I don't think this boost use is very bad (it's a header-only library) but if we can replace it with functionality we already have internally, who not.

    Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can't also nuke anything from test/lint/lint-includes.sh

    Well, split is still used in some places (httprpc.cpp for example), this only replaces the single-separator case. It will take some more work to get rid of all boost/algorithm/string.hpp use, if that's worthwhile.

  22. theStack force-pushed on Sep 20, 2021
  23. theStack commented at 2:42 PM on September 20, 2021: contributor

    Force-pushed the alternative variant as suggested by @martinus, changing spanparsing::Split to a templated function and defining an alias SplitString as suggested by @MarcoFalke (see #22953 (review) and #22953 (review)).

  24. in src/test/util_tests.cpp:2397 in ed24bf5989 outdated
    2121 | +
    2122 | +    // Case-sensitivity of the separator.
    2123 | +    {
    2124 | +        std::vector<std::string> result = SplitString("AAA", 'a');
    2125 | +        BOOST_CHECK_EQUAL(result.size(), 1);
    2126 | +        BOOST_CHECK_EQUAL(result[0], "AAA");
    


    MarcoFalke commented at 3:02 PM on September 20, 2021:

    weird issue on macos:

    test/util_tests.cpp:2087: error: in "util_tests/test_SplitString": check result[0] == "" has failed [ != ]
    test/util_tests.cpp:2095: error: in "util_tests/test_SplitString": check result[1] == "" has failed [ != ]
    test/util_tests.cpp:2104: error: in "util_tests/test_SplitString": check result[2] == "" has failed [ != ]
    test/util_tests.cpp:2111: error: in "util_tests/test_SplitString": check result[0] == "abc" has failed [abc != abc]
    test/util_tests.cpp:2119: error: in "util_tests/test_SplitString": check result[1] == "b" has failed [b != b]
    test/util_tests.cpp:2126: error: in "util_tests/test_SplitString": check result[0] == "AAA" has failed [AAA != AAA]
    

    martinus commented at 5:02 PM on September 20, 2021:

    I think the issue here is that the conversion of the string literals to Span<const char> works so that the \0 is part of the span. So for "" the size of the span is actually 1 with content "\0", while for std::string the size would be 0. The tests should work fine for std::string("") as the argument


    theStack commented at 5:34 PM on September 20, 2021:

    Ah, good point, that also explains why only the last element of each result vector failed the test. Thanks, fixed.

  25. theStack force-pushed on Sep 20, 2021
  26. in src/util/string.h:19 in 4eaebd78ba outdated
      15 | @@ -15,6 +16,8 @@
      16 |  #include <string>
      17 |  #include <vector>
      18 |  
      19 | +constexpr auto SplitString = &spanparsing::Split<std::string>;
    


    martinus commented at 5:16 AM on September 21, 2021:

    I think it would be safer to use a more restrictive interface for SplitString, because it's not really intuitive how SplitString behaves with string literals. Something like

    inline std::vector<std::string> SplitString(std::string_view str, char sep) {
        return spanparsing::Split<std::string>(str, sep);
    }
    

    Other than that, almost-ACK 8949509, code-reviewed and tested


    theStack commented at 10:45 AM on September 21, 2021:

    Thanks, used your suggested interface/implementation and also add the [[nodiscard]] attribute.

  27. in src/test/util_tests.cpp:2124 in 8949509e00 outdated
    2119 | +        BOOST_CHECK_EQUAL(result[1], "b");
    2120 | +    }
    2121 | +
    2122 | +    // Case-sensitivity of the separator.
    2123 | +    {
    2124 | +        std::vector<std::string> result = SplitString(std::string("AAA"), 'a');
    


    MarcoFalke commented at 8:56 AM on September 21, 2021:

    It seems dangerous to offer a function that silently does the wrong thing when passed the "wrong" type


    theStack commented at 10:46 AM on September 21, 2021:

    Agree. This is fixed now by using an explicit interface instead of an alias, as suggested by martinus in #22953 (review).

  28. MarcoFalke dismissed
  29. MarcoFalke commented at 8:56 AM on September 21, 2021: member

    Not a fan of forcing devs to remember to construct a std::string manually each time before calling the function.

  30. theStack force-pushed on Sep 21, 2021
  31. MarcoFalke requested review from MarcoFalke on Sep 21, 2021
  32. martinus commented at 5:51 AM on September 23, 2021: contributor

    Windows build error seems unrelated, in wallet_address_types.py:

    OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
    
  33. hebasto commented at 6:27 AM on September 23, 2021: member

    See #18548

    On Thu, 23 Sep 2021 at 08:54, Martin Leitner-Ankerl < @.***> wrote:

    Windows build error seems unrelated, in wallet_address_types.py:

    OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/22953#issuecomment-925523479, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH3PXPQDTK3VEUUU4R2KEY3UDK6JVANCNFSM5D246WWQ .

    --

    -- Hennadii Stepanov

  34. theStack force-pushed on Sep 27, 2021
  35. theStack commented at 10:38 AM on September 27, 2021: contributor

    Rebased on master to (hopefully) get rid of the Windows CI failure.

  36. practicalswift commented at 2:01 PM on September 29, 2021: contributor

    Concept ACK

  37. DrahtBot cross-referenced this on Nov 18, 2021 from issue Remove strtol in torcontrol by MarcoFalke
  38. DrahtBot commented at 8:54 AM on November 18, 2021: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24764 (Modernize util/strencodings and util/string: string_view and optional by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  39. DrahtBot added the label Needs rebase on Nov 25, 2021
  40. theStack force-pushed on Nov 25, 2021
  41. theStack commented at 1:08 PM on November 25, 2021: contributor

    Rebased on master.

  42. DrahtBot removed the label Needs rebase on Nov 25, 2021
  43. DrahtBot added the label Needs rebase on Dec 3, 2021
  44. theStack force-pushed on Dec 7, 2021
  45. theStack commented at 6:51 PM on December 7, 2021: contributor

    Rebased on master again.

  46. DrahtBot removed the label Needs rebase on Dec 7, 2021
  47. DrahtBot cross-referenced this on Jan 11, 2022 from issue Add defaults to vDeployments to avoid uninitialized variables by ajtowns
  48. kiminuo commented at 10:08 AM on January 17, 2022: contributor

    @theStack Is there a bug or does the PR need another rebase to pass all tests?

  49. theStack force-pushed on Jan 17, 2022
  50. theStack commented at 4:59 PM on January 17, 2022: contributor

    @kiminuo: Thanks for noticing, indeed a rebaster on master was needed since #17631 (commit ef7c8228fd5cf45526518ae2bd5ebdd483e65525) introduced two other single-separator uses of boost::split in ./src/rest.cpp. Done!

  51. DrahtBot cross-referenced this on Jan 17, 2022 from issue rest: Use query parameters to control resource loading by stickies-v
  52. DrahtBot cross-referenced this on Jan 19, 2022 from issue rest: Use query parameters to control resource loading by stickies-v
  53. DrahtBot cross-referenced this on Jan 29, 2022 from issue Enforce Taproot script flags whenever WITNESS is set by MarcoFalke
  54. DrahtBot cross-referenced this on Mar 2, 2022 from issue refactor: Split ArgsManager out of util/system by Empact
  55. fanquake added this to the milestone 24.0 on Mar 11, 2022
  56. DrahtBot cross-referenced this on Mar 19, 2022 from issue deploymentstatus: move g_versionbitscache global to ChainstateManager by ajtowns
  57. DrahtBot added the label Needs rebase on Mar 25, 2022
  58. fanquake cross-referenced this on Mar 25, 2022 from issue refactor: remove unused boost include in bitcoin-util.cpp by fanquake
  59. MarcoFalke referenced this in commit 7878c8655c on Mar 25, 2022
  60. theStack force-pushed on Mar 25, 2022
  61. theStack commented at 7:14 PM on March 25, 2022: contributor

    Rebased on master.

  62. DrahtBot removed the label Needs rebase on Mar 25, 2022
  63. sidhujag referenced this in commit 63c27b9e74 on Apr 2, 2022
  64. fanquake cross-referenced this on Apr 2, 2022 from issue build: prune Boost headers in depends by fanquake
  65. DrahtBot cross-referenced this on Apr 4, 2022 from issue Modernize util/strencodings and util/string: `string_view` and `optional` by sipa
  66. DrahtBot added the label Needs rebase on Apr 6, 2022
  67. fanquake cross-referenced this on Apr 6, 2022 from issue refactor: Get rid of `boost/algorithm/string.hpp` by hebasto
  68. refactor: introduce single-separator split helper `SplitString`
    This helper uses spanparsing::Split internally and enables to replace
    all calls to boost::split where only a single separator is passed.
    
    Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
    Co-authored-by: MarcoFalke <falke.marco@gmail.com>
    9cc8e876e4
  69. test: add unit tests for `SplitString` helper 4fad7e46d9
  70. fuzz: add `SplitString` fuzz target a62e84438d
  71. theStack force-pushed on Apr 11, 2022
  72. theStack commented at 9:28 PM on April 11, 2022: contributor

    Rebased on master again (there was a conflict in src/rest.cpp after #24098 has been merged).

  73. DrahtBot removed the label Needs rebase on Apr 11, 2022
  74. fanquake commented at 8:48 AM on April 25, 2022: member

    @martinus want to take another look here?

  75. martinus commented at 5:49 AM on April 26, 2022: contributor

    Code review ACK a62e84438d27ee6213219fe2c233e58814fcbb5d. Ran all tests. I also like that with boost::split it was not obvious that the resulting container was cleared, and with SplitString API that's obvious.

    After that PR there is only one usage of boost::split left in core_read.cpp, that would be a nice followup PR :slightly_smiling_face:

  76. martinus approved
  77. fanquake merged this on Apr 26, 2022
  78. fanquake closed this on Apr 26, 2022

  79. theStack deleted the branch on Apr 26, 2022
  80. sidhujag referenced this in commit 3e0e7fb140 on Apr 26, 2022
  81. martinus cross-referenced this on May 3, 2022 from issue refactor: replace remaining boost::split with SplitString by martinus
  82. fanquake referenced this in commit bde5836f99 on May 4, 2022
  83. sidhujag referenced this in commit ed600cecaa on May 4, 2022
  84. kwvg referenced this in commit 94a225686c on Dec 21, 2022
  85. kwvg referenced this in commit 3133a4ceb0 on Dec 26, 2022
  86. kwvg referenced this in commit 35268285ff on Jan 3, 2023
  87. kwvg referenced this in commit eceda6a82f on Jan 13, 2023
  88. kwvg referenced this in commit 25a35284f0 on Jan 15, 2023
  89. kwvg referenced this in commit 135bfdec57 on Jan 19, 2023
  90. PastaPastaPasta referenced this in commit 2d7a448fdc on Jan 19, 2023
  91. bitcoin locked this on Apr 26, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:53 UTC