rpc: allow dumptxoutset to dump human-readable data #18689

Pull request: pierreN wants to merge 1 commit into bitcoin:master from pierreN:feature-utxo-ascii, changing 6 files (+126 −30)
  1. pierreN commented at 7:40 PM on April 17, 2020: contributor

    Adds optional arguments to dumptxoutset. If any of them are present, a human-readable file is written to disk instead of the compact binary serialized form currently in use. This does not change the current default behavior of dumptxoutset.


    Thanks to the upcoming assumeutxo feature (#15605), we now have a dumptxoutset RPC (#16899) that can write the whole UTXO set to disk. However, the current format, although compact, is not easily readable by standard tools (e.g. for someone who would like to study the UTXO set). Moreover, as far as I know, this binary format may change in the future.

    Providing power users with an easy way to get a human-readable dump of the UTXOs would be a useful feature. It would also replace hackish third-party tools that may have side effects.

    On my machine (slow SSD):

    • dumping the whole UTXO set in the original binary format (about 4 GB) takes around 1 min 40 s
    • dumping the whole set in ASCII form uses less than 9 GB and 3 min 30 s (of course, file size and time depend on which ASCII fields you write to disk; you can select them via the format argument).

    Thanks!

  2. hebasto commented at 8:03 PM on April 17, 2020: member

    What are possible/expected use cases?

  3. MarcoFalke commented at 8:09 PM on April 17, 2020: member

    Concept ACK

  4. DrahtBot added the label RPC/REST/ZMQ on Apr 17, 2020
  5. DrahtBot added the label Tests on Apr 17, 2020
  6. DrahtBot commented at 10:57 PM on April 17, 2020: contributor


    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.


    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21850 (Remove GetDataDir(net_specific) function by kiminuo)
    • #21526 (validation: UpdateTip/CheckBlockIndex assumeutxo support by jamesob)
    • #20664 (Add scanblocks RPC call by jonasschnelli)
    • #20295 (rpc: getblockfrompeer by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. DrahtBot cross-referenced this on Apr 18, 2020 from issue rpc: remove deprecated CRPCCommand constructor by MarcoFalke
  8. DrahtBot cross-referenced this on Apr 18, 2020 from issue rpc: add extensive file checks for dumptxoutset and dumpwallet by brakmic
  9. pierreN commented at 2:16 AM on April 18, 2020: contributor

    @hebasto the most common use case will be for users to easily study the whole UTXO set in only a few minutes.

    With this PR it should be really easy to dump any format you want via the format parameter (type of data, number of occurrences and order are all respected). Also, just adding one element to the ascii_types vector allows you to dump any new type of ASCII data.
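The table-driven mechanism described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code: the `Utxo` struct is a hypothetical stand-in for Bitcoin Core's `COutPoint`/`Coin` pair, and `kFieldTypes`, `ResolveFields` and `FormatRow` are hypothetical names. The requested list is resolved once against a name → callback table, so field order and repetitions carry straight through to each output row.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the (outpoint, coin) pair the RPC iterates over.
struct Utxo {
    std::string txid;
    uint32_t vout;
    uint32_t height;
    bool coinbase;
    int64_t value; // satoshis
};

// Each field name maps to a callback that renders that field as text.
using FieldCb = std::function<std::string(const Utxo&)>;

// Supporting a new field only requires one more entry in this table.
const std::vector<std::pair<std::string, FieldCb>> kFieldTypes{
    {"txid", [](const Utxo& u) { return u.txid; }},
    {"vout", [](const Utxo& u) { return std::to_string(u.vout); }},
    {"height", [](const Utxo& u) { return std::to_string(u.height); }},
    {"coinbase", [](const Utxo& u) { return std::string(u.coinbase ? "1" : "0"); }},
    {"value", [](const Utxo& u) { return std::to_string(u.value); }},
};

// Resolve the requested names once, keeping their order and any repetitions.
std::vector<FieldCb> ResolveFields(const std::vector<std::string>& names)
{
    const std::unordered_map<std::string, FieldCb> lookup(kFieldTypes.begin(), kFieldTypes.end());
    std::vector<FieldCb> out;
    for (const auto& name : names) {
        auto it = lookup.find(name);
        if (it == lookup.end()) throw std::invalid_argument("unknown field: " + name);
        out.push_back(it->second);
    }
    return out;
}

// Render the selected fields of one UTXO, joined by the separator.
std::string FormatRow(const std::vector<FieldCb>& fields, const Utxo& u, const std::string& sep)
{
    std::string row;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) row += sep;
        row += fields[i](u);
    }
    return row;
}
```

Resolving names up front means an unknown field is rejected before any output is written, rather than midway through the dump.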

    For example, you can now trivially plot a graph of when current UTXOs were created (took 3 min 30 s on my machine):

    $ bitcoin-cli dumptxoutset utxos.dat '["height","value"]' false ' '
    $ awk '{ map[$1] += $2 } END { for (height in map) print height, map[height] }' utxos.dat | sort -n > plot.dat
    $ gnuplot -e "plot 'plot.dat' w l; pause -1;"
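For readers who prefer not to use awk, the per-height aggregation above can be sketched in C++. It assumes each line of the dump holds exactly two space-separated integers, `height value`, with no header line (which the `false` argument in the command above appears to disable); `SumByHeight` is a hypothetical helper name. Because `std::map` keeps its keys ordered, iterating the result already yields heights in ascending order, which replaces the `sort -n` step.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sum UTXO values (satoshis) per creation height from "height value" lines.
// Lines that do not start with two integers are skipped.
std::map<int64_t, int64_t> SumByHeight(std::istream& in)
{
    std::map<int64_t, int64_t> totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int64_t height, value;
        if (fields >> height >> value) totals[height] += value;
    }
    return totals;
}
```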
    


    Another example: summing the value of all unspent coinbase outputs in the UTXO set (took 2 min 30 s):

    $ bitcoin-cli dumptxoutset utxos.dat '["coinbase","value"]' false ' '
    $ awk '$1 == 1 { sum += $2 } END { print sum/100000000, "btc unspent coinbase" }' utxos.dat
    1.76077e+06 btc unspent coinbase
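The coinbase total can likewise be computed in C++. This sketch assumes lines of the form `coinbase value`, where `coinbase` is `1` or `0` and `value` is in satoshis (the awk command divides by 100000000, i.e. one BTC). `SumCoinbaseSats` is a hypothetical name. Accumulating satoshis as a 64-bit integer and converting only at the end avoids the floating-point rounding visible in awk's default `%.6g` output above.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

constexpr int64_t COIN = 100000000; // satoshis per BTC

// Sum the values (satoshis) of lines whose first field is 1, i.e. coinbase outputs.
int64_t SumCoinbaseSats(std::istream& in)
{
    int64_t total = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int coinbase;
        int64_t value;
        if (fields >> coinbase >> value && coinbase == 1) total += value;
    }
    return total;
}
```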
    

    Or really anything else the user wants. With some more work, it could also make it easier to track some indicators (such as bootstrapping/syncing SOPR).

    Another example: bitcoin file archaeology. Since methods that etch data into scriptPubKeys leave those outputs in the UTXO set (provably unspendable OP_RETURN outputs are the exception, as they are never added to it), you can retrieve a superset of the TXIDs of such "files stored in the blockchain" via this RPC call.

  10. MarcoFalke cross-referenced this on Apr 25, 2020 from issue [Feature] Unspent Transaction Outputs Index by kyledrake
  11. DrahtBot cross-referenced this on Apr 29, 2020 from issue rpc: Do not advertise dumptxoutset as a way to flush the chainstate by MarcoFalke
  12. DrahtBot added the label Needs rebase on Apr 30, 2020
  13. pierreN force-pushed on May 1, 2020
  14. pierreN force-pushed on May 1, 2020
  15. pierreN commented at 5:09 AM on May 1, 2020: contributor

    rebased cd20cb8

  16. DrahtBot removed the label Needs rebase on May 1, 2020
  17. brakmic commented at 1:50 PM on May 1, 2020: contributor

    ACK cd20cb886deb0ef91ab89c66bbfb511e89eb77ee

    Built, run and tested on macOS Catalina 10.15.4

    ./test/functional/rpc_dumptxoutset.py
    2020-05-01T13:40:08.556000Z TestFramework (INFO): Initializing test directory /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y
    2020-05-01T13:40:11.072000Z TestFramework (INFO): no_option
    2020-05-01T13:40:11.113000Z TestFramework (INFO): all_data
    2020-05-01T13:40:11.216000Z TestFramework (INFO): partial_data_1
    2020-05-01T13:40:11.304000Z TestFramework (INFO): partial_data_order
    2020-05-01T13:40:11.369000Z TestFramework (INFO): partial_data_double
    2020-05-01T13:40:11.447000Z TestFramework (INFO): no_header
    2020-05-01T13:40:11.537000Z TestFramework (INFO): separator
    2020-05-01T13:40:11.617000Z TestFramework (INFO): all_options
    2020-05-01T13:40:11.748000Z TestFramework (INFO): Stopping nodes
    2020-05-01T13:40:12.313000Z TestFramework (INFO): Cleaning up /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y on exit
    2020-05-01T13:40:12.313000Z TestFramework (INFO): Tests successful
    
    ./src/bitcoin-cli -regtest dumptxoutset dump.dat '["txid", "vout"]' false ':'
    {
      "coins_written": 407,
      "base_hash": "01ba165996f7a7899e56b37584398adb892a5df7566b95e8de457ab588784740",
      "base_height": 407,
      "path": "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
    }
    
    cat "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
    208c48f15ed2971709d81da915b72255e50b9251c558dc45981632ed6e4cd300:0
    338e1fde4b86e2daaba2bd7cb4f8d77e600f47e7814645aafb480f56f4f41103:0
    e73b0564bd56d359bd8df64fa3b9fd8586c3ff0430081aff1f97a9600c834403:0
    23ff11ec2801f1c4838fc19863f7fa8d9283ac29e644180a6eaba160fd2a9c03:0
    03ba026c466ab490a19b0aa8a39abeeccc6cff24d4a24a34d5f4304ae21e5304:0
    98e8ceec62fb6442acaa939461482a46d7c03968082430661854815571eca204:0
    4b84d555d8a19ab3cc38152e446fdbd059ec535ab67806a61628f238e495ff04:0
    [...snip...]
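Because the separator is an arbitrary string (the code under review defaults it to `","`), a consumer of such a dump may need to split on a multi-character delimiter. A minimal sketch, using the hypothetical helper `SplitFields`, that turns a row such as the `txid:vout` lines above back into fields:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split `line` on every occurrence of `sep`, which may be longer than one character.
std::vector<std::string> SplitFields(const std::string& line, const std::string& sep)
{
    assert(!sep.empty()); // find("") always matches at `start`, so the loop would never end
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (auto pos = line.find(sep); pos != std::string::npos; pos = line.find(sep, start)) {
        fields.push_back(line.substr(start, pos - start));
        start = pos + sep.size();
    }
    fields.push_back(line.substr(start)); // text after the last separator
    return fields;
}
```

This assumes no field value contains the separator itself, which holds for hex TXIDs and integers but may not for other field types.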
    
  18. DrahtBot cross-referenced this on May 7, 2020 from issue assumeutxo by jamesob
  19. luke-jr referenced this in commit 8cf4bf7651 on Jun 9, 2020
  20. in src/rpc/blockchain.cpp:2298 in cd20cb886d outdated
    2294 | +    const std::string separator = request.params[3].isNull() ? "," : request.params[3].get_str();
    2295 | +    std::vector<std::pair<std::string, cb_t>> requested;
    2296 | +    if (!is_compact) {
    2297 | +        const auto& arr = request.params[1].get_array();
    2298 | +        const std::unordered_map<std::string, cb_t> ascii_map(std::begin(ascii_types), std::end(ascii_types));
    2299 | +        for(auto i = 0; i < arr.size(); ++i) {
    


    luke-jr commented at 11:51 PM on June 9, 2020:

    auto doesn't really work in this context...

    warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
    

    pierreN commented at 8:57 AM on June 14, 2020:

    Ha, funny that the compiler can emit the warning but doesn't pick a matching type for i. I guess it just deduces int from the literal 0.

    Thanks for catching this, I've just updated the branch.
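For reference, `auto i = 0` deduces `int` from the literal, while `size()` returns an unsigned `size_t`, hence `-Wsign-compare`. A minimal illustration of the usual fix (the exact change pushed to the branch is not shown in this thread), using a `std::vector` as a stand-in for the RPC array and a hypothetical `FilterKnown` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Keep only the requested names that appear in `known`, preserving order.
std::vector<std::string> FilterKnown(const std::vector<std::string>& arr, const std::set<std::string>& known)
{
    std::vector<std::string> out;
    // With `for (auto i = 0; ...)`, i would be an int and `i < arr.size()`
    // would compare signed with unsigned; a size_t index matches size().
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (known.count(arr[i])) out.push_back(arr[i]);
    }
    return out;
}
```

When the index itself is not needed and the container supports it, a range-based `for (const auto& name : arr)` sidesteps the question entirely.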

  21. luke-jr referenced this in commit 6c0bf8881c on Jun 10, 2020
  22. pierreN force-pushed on Jun 14, 2020
  23. DrahtBot cross-referenced this on Jun 16, 2020 from issue Replace boost::filesystem with std::filesystem by kiminuo
  24. DrahtBot cross-referenced this on Jul 8, 2020 from issue Use operator/ in fs::absolute to prepare for C++17 by kiminuo
  25. DrahtBot cross-referenced this on Jul 13, 2020 from issue Log RPC parameters (arguments) if -debug=rpcparams by LarryRuane
  26. DrahtBot cross-referenced this on Aug 26, 2020 from issue validation: UTXO snapshot activation by jamesob
  27. jamesob commented at 1:48 AM on August 26, 2020: member

    Cool, I'll take a look in the next few days.

  28. DrahtBot cross-referenced this on Aug 31, 2020 from issue Assert that RPCArg names are equal to CRPCCommand ones (blockchain,rawtransaction) by MarcoFalke
  29. DrahtBot added the label Needs rebase on Sep 22, 2020
  30. jamesob commented at 3:02 PM on November 19, 2020: member

    Concept ACK - will review soon.

  31. MarcoFalke removed the label Tests on Nov 20, 2020
  32. luke-jr referenced this in commit c0f780cda5 on Nov 25, 2020
  33. in src/rpc/blockchain.cpp:2297 in 82046cf7fa outdated
    2293 | +    const bool show_header = request.params[2].isNull() || request.params[2].get_bool();
    2294 | +    const std::string separator = request.params[3].isNull() ? "," : request.params[3].get_str();
    2295 | +    std::vector<std::pair<std::string, cb_t>> requested;
    2296 | +    if (!is_compact) {
    2297 | +        const auto& arr = request.params[1].get_array();
    2298 | +        const std::unordered_map<std::string, cb_t> ascii_map(std::begin(ascii_types), std::end(ascii_types));
    


    luke-jr commented at 11:58 PM on November 26, 2020:

    I'm not sure if it's a compiler bug or PR bug, but ascii_types is invalid here when compiled with GCC 9.3.0.

    ==48759== Thread 22 b-httpworker.3:
    ==48759== Invalid read of size 8
    ==48759==    at 0x4BC68C: __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > >::__normal_iterator(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > const* const&) (stl_iterator.h:807)
    ==48759==    by 0x4BC637: std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > >::begin() const (stl_vector.h:818)
    ==48759==    by 0x4BA12F: decltype (({parm#1}.begin)()) std::begin<std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > >(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > const&) (range_access.h:59)
    ==48759==    by 0x48EC97: dumptxoutset()::$_37::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (blockchain.cpp:2692)
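The trace shows the invalid read happening inside the RPC handler lambda (`dumptxoutset()::$_37::operator()`). One plausible cause, offered here as an unconfirmed assumption rather than a diagnosis from this thread, is a lambda that captures a function-local table by reference and is invoked after the enclosing function has returned. The sketch below (hypothetical names, not the PR's code) shows the safe pattern of capturing by value:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical analogue of a function that builds and returns an RPC handler.
std::function<std::size_t()> MakeHandler()
{
    const std::vector<std::string> field_types{"txid", "vout", "height", "coinbase", "value"};
    // Capturing by reference ([&]) would leave the handler referring to a vector
    // that is destroyed when MakeHandler() returns; reading it later is undefined behavior.
    // Capturing by value copies the table into the closure, so it lives as long as the handler.
    return [field_types]() { return field_types.size(); };
}
```

Making the table a `static const` object or a namespace-scope constant would avoid the lifetime problem as well.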
    
  34. luke-jr referenced this in commit 05d8ff8877 on Nov 30, 2020
  35. graymauser cross-referenced this on Mar 3, 2021 from issue btcposbal2csv.py doesn't extract segwit by Jolly-Pirate
  36. in src/rpc/blockchain.cpp:2257 in 82046cf7fa outdated
    2252 | +        // add any other desired items here
    2253 | +    };
    2254 | +
    2255 | +    std::vector<RPCArg> ascii_args;
    2256 | +    std::transform(std::begin(ascii_types), std::end(ascii_types), std::back_inserter(ascii_args),
    2257 | +            [](const std::pair<std::string, cb_t>& t) { return RPCArg{t.first, RPCArg::Type::STR, RPCArg::Optional::OMITTED, "Info to write for a given UTXO"}; });
    


    benthecarman commented at 10:09 PM on March 6, 2021:

    It'd be nice if these were more descriptive. It's unclear exactly what the serialization is for these args.

  37. MarcoFalke cross-referenced this on Apr 13, 2021 from issue How to convert dumptxoutset rpc result to human readable JSON by Stonica
  38. MarcoFalke added the label Up for grabs on Apr 13, 2021
  39. MarcoFalke commented at 6:42 PM on April 13, 2021: member

    Still needs rebase

  40. DrahtBot removed the label Needs rebase on May 6, 2021
  41. rpc: allow dumptxoutset to dump human-readable data 65d0697fe3
  42. pierreN force-pushed on May 6, 2021
  43. pierreN commented at 9:39 PM on May 6, 2021: contributor

    Sorry for the delay of a few months. I have a bit more time now and will try to follow through with this PR.

    I'll update the branch in a few days (I was syncing when my old SSD died).

  44. DrahtBot cross-referenced this on May 7, 2021 from issue Remove `GetDataDir(net_specific)` function by kiminuo
  45. DrahtBot cross-referenced this on May 7, 2021 from issue validation: UpdateTip/CheckBlockIndex assumeutxo support by jamesob
  46. DrahtBot cross-referenced this on May 7, 2021 from issue Add scanblocks RPC call by jonasschnelli
  47. DrahtBot cross-referenced this on May 7, 2021 from issue rpc: getblockfrompeer by Sjors
  48. Sjors commented at 1:41 PM on May 21, 2021: member

    Consider moving this functionality to the new bitcoin-util instead. You could add a command that converts the binary format to a human-readable one.

  49. DrahtBot added the label Needs rebase on May 24, 2021
  50. DrahtBot commented at 9:40 AM on May 24, 2021: contributor


    🐙 This pull request conflicts with the target branch and needs rebase.


  51. in src/rpc/blockchain.cpp:2611 in 65d0697fe3
    2607 | @@ -2564,7 +2608,7 @@ static RPCHelpMan dumptxoutset()
    2608 |      };
    2609 |  }
    2610 |  
    2611 | -UniValue CreateUTXOSnapshot(NodeContext& node, CChainState& chainstate, CAutoFile& afile)
    2612 | +UniValue CreateUTXOSnapshot(const bool is_compact, const bool show_header, const std::string& separator, NodeContext& node, CChainState& chainstate, CAutoFile& afile, const std::vector<std::pair<std::string, coinascii_cb_t>>& requested)
    


    luke-jr commented at 6:17 AM on October 11, 2021:

    IMO it'd be nicer to avoid the two mutually-exclusive bools. Maybe a good case for an enum class?
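A minimal sketch of that suggestion: replace `is_compact` and `show_header` with a single scoped enum, so that combinations that make no sense (such as compact output with a header) cannot be expressed. The enum, its values and both helper functions are hypothetical names, not taken from the PR.

```cpp
#include <cassert>

// One value per valid output mode; "compact with header" is simply not representable.
enum class SnapshotFormat {
    COMPACT,        // binary serialization, the only form snapshot loading handles
    TEXT_HEADER,    // human-readable, first line names the fields
    TEXT_NO_HEADER, // human-readable, data rows only
};

// Map the existing RPC inputs onto the enum.
SnapshotFormat ToSnapshotFormat(bool fields_requested, bool show_header)
{
    if (!fields_requested) return SnapshotFormat::COMPACT;
    return show_header ? SnapshotFormat::TEXT_HEADER : SnapshotFormat::TEXT_NO_HEADER;
}

bool WritesHeader(SnapshotFormat fmt)
{
    switch (fmt) {
    case SnapshotFormat::TEXT_HEADER:
        return true;
    case SnapshotFormat::COMPACT:
    case SnapshotFormat::TEXT_NO_HEADER:
        return false;
    }
    return false; // unreachable for valid enum values; silences -Wreturn-type
}
```

A caller that needs the loadable binary form, such as the unit test discussed below, would then pass `SnapshotFormat::COMPACT` explicitly instead of a positional `true`/`false`.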

  52. in src/test/validation_chainstatemanager_tests.cpp:186 in 65d0697fe3
     182 | @@ -183,7 +183,7 @@ CreateAndActivateUTXOSnapshot(NodeContext& node, const fs::path root, F malleati
     183 |      FILE* outfile{fsbridge::fopen(snapshot_path, "wb")};
     184 |      CAutoFile auto_outfile{outfile, SER_DISK, CLIENT_VERSION};
     185 |  
     186 | -    UniValue result = CreateUTXOSnapshot(node, node.chainman->ActiveChainstate(), auto_outfile);
     187 | +    UniValue result = CreateUTXOSnapshot(false, false, "", node, node.chainman->ActiveChainstate(), auto_outfile, {});
    


    luke-jr commented at 6:19 AM on October 11, 2021:

    The first false here should be true, as ActivateSnapshot can only handle the binary/compact format.

  53. luke-jr changes_requested
  54. josibake commented at 10:43 AM on December 28, 2021: contributor

    Concept ACK. @pierreN, are you still working on this? I'm happy to try to take it over the finish line if you're not.

  55. jamesob commented at 7:10 PM on January 3, 2022: member

    re-Concept ACK and at a high-level the code looks pretty good. Nice job on the tests. In need of a rebase though.

  56. w0xlt cross-referenced this on Jan 29, 2022 from issue rpc: allow dumptxoutset to dump human-readable data by w0xlt
  57. luke-jr referenced this in commit 17d3ed773f on Feb 8, 2022
  58. luke-jr referenced this in commit 6511adb193 on Feb 8, 2022
  59. luke-jr referenced this in commit 648dc4dcd7 on Feb 8, 2022
  60. luke-jr referenced this in commit 61f3da3c04 on Feb 8, 2022
  61. luke-jr commented at 9:04 PM on February 8, 2022: member

    If you decide to revive this PR, I've done an extensive rebase at https://github.com/bitcoin/bitcoin/compare/master...luke-jr:rpc_dumptxoutset_hr (leave off the last commit), rebasing it on top of (but not compatible with) #24202, and splitting up the different functionality across multiple commits.

  62. prusnak cross-referenced this on Mar 21, 2022 from issue rpc: dumptxoutset as sqlite file by prusnak
  63. MarcoFalke commented at 9:11 AM on March 22, 2022: member

    I think this was picked up in #24202 (https://github.com/bitcoin/bitcoin/pull/24202), so this can be closed?

  64. MarcoFalke removed the label Up for grabs on Mar 22, 2022
  65. dunxen cross-referenced this on Apr 26, 2022 from issue rpc: Add sqlite format option for dumptxoutset by dunxen
  66. fanquake closed this on May 12, 2022

  67. theStack cross-referenced this on Apr 6, 2023 from issue contrib: add tool to convert compact-serialized UTXO set to SQLite database by theStack
  68. bitcoin locked this on May 12, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:54 UTC