Build txindex in parallel with validation #13033

pull jimpo wants to merge 12 commits into bitcoin:master from jimpo:txindex-refactor-take2 changing 18 files +757 −86
  1. jimpo commented at 12:11 AM on April 20, 2018: contributor

    I'm re-opening #11857 as a new pull request because the last one stopped loading for people


    This refactors the tx index code to be in it's own class and get built concurrently with validation code. The main benefit is decoupling and moving the txindex into a separate DB. The primary motivation is to lay the groundwork for other indexers that might be desired (such as the compact filters). The basic idea is that the TxIndex spins up its own thread, which first syncs the txindex to the current block index, then once in sync the BlockConnected ValidationInterface hook writes new blocks.

    DB changes

    At the suggestion of some other developers, the txindex has been split out into a separate database. A data migration runs at startup on any nodes with a legacy txindex. Currently the migration blocks node initialization until complete.

    Open questions

    • Should the migration of txindex data from the old DB to the new DB block in init or should it happen in a background thread? The downside to backgrounding it is that getrawtransaction would return an error message saying the txindex is syncing while the migration is running.

    Impact

    In a sample size n=1 test where I synced nodes from scratch, the average time Index writing was 3.36ms in master and 1.72ms in this branch. The average time between UpdateTip log lines for sequential blocks between 400,000 and IBD end on mainnet was 0.297204s in master and 0.286134s in this branch. Most likely this is just variance in IBD times, but I can try with some more trials if people want.

  2. jimpo cross-referenced this on Apr 20, 2018 from issue Build tx index in parallel with validation by jimpo
  3. fanquake added the label UTXO Db and Indexes on Apr 20, 2018
  4. fanquake added this to the "Blockers" column in a project

  5. Sjors commented at 9:49 AM on April 20, 2018: member

    If anyone wants to read the old PR and has difficulty loading the page, it helps to use another browser session that's logged out.

    Only comment of mine I'd like to export to this PR is the suggestion to backport "the migration code that removes txindex without requiring a reindex would be useful for folks who regret having set txindex=1 on a node with slow hardware."

    Tested 3272b17637c5, including kill -9ing the migration from legacy to the new db.

  6. in doc/release-notes-pr11857.md:1 in 3272b17637 outdated
       0 | @@ -0,0 +1,11 @@
       1 | +Transaction index changes
    


    promag commented at 2:24 PM on April 24, 2018:

    Rename file? PR changed.

  7. in src/index/txindex.cpp:1 in 3272b17637 outdated
       0 | @@ -0,0 +1,308 @@
       1 | +// Copyright (c) 2017 The Bitcoin Core developers
    


    promag commented at 2:25 PM on April 24, 2018:

    nit, 2018? (and 2 more files)

  8. in src/rpc/rawtransaction.cpp:51 in 3272b17637 outdated
      47 | @@ -47,6 +48,8 @@ void TxToJSON(const CTransaction& tx, const uint256 hashBlock, UniValue& entry)
      48 |      TxToUniv(tx, uint256(), entry, true, RPCSerializationFlags());
      49 |  
      50 |      if (!hashBlock.IsNull()) {
      51 | +        LOCK(cs_main);
    


    promag commented at 2:46 PM on April 24, 2018:

    In order to avoid locking and lookup here, how about adding the argument CBlockIndex* pindex to TxToJSON? (I know this is only called from getrawtransaction).


    jimpo commented at 5:56 PM on April 24, 2018:

    I'm pretty against adding more to this PR. Seems like a good follow-on cleanup item.


    promag commented at 6:17 PM on April 24, 2018:

    Fair.

  9. in src/index/txindex.h:79 in 3272b17637 outdated
      74 | +    ///
      75 | +    /// @param[in]   tx_hash  The hash of the transaction to be returned.
      76 | +    /// @param[out]  block_hash  The hash of the block the transaction is found in.
      77 | +    /// @param[out]  tx  The raw transaction itself.
      78 | +    /// @return  true if transaction is found, false otherwise
      79 | +    bool FindTx(const uint256& tx_hash, uint256& block_hash, CTransactionRef& tx) const;
    


    promag commented at 3:07 PM on April 24, 2018:

    nit, LookupTransaction.

  10. in src/index/txindex.h:73 in 3272b17637 outdated
      68 | +    /// the current state of the block chain. This only blocks if the index has gotten in sync once
      69 | +    /// and only needs to process blocks in the ValidationInterface queue. If the index is catching
      70 | +    /// up from far behind, this method does not block and immediately returns false.
      71 | +    bool BlockUntilSyncedToCurrentChain();
      72 | +
      73 | +    /// Look up a raw transaction by hash.
    


    promag commented at 3:07 PM on April 24, 2018:

    Drop raw?


    jimpo commented at 6:05 PM on April 24, 2018:

    Your wish is my command.

  11. in src/index/txindex.h:77 in 3272b17637 outdated
      72 | +
      73 | +    /// Look up a raw transaction by hash.
      74 | +    ///
      75 | +    /// @param[in]   tx_hash  The hash of the transaction to be returned.
      76 | +    /// @param[out]  block_hash  The hash of the block the transaction is found in.
      77 | +    /// @param[out]  tx  The raw transaction itself.
    


    promag commented at 3:07 PM on April 24, 2018:

    Drop raw?

  12. in src/index/txindex.cpp:293 in 3272b17637 outdated
     288 | +void TxIndex::Start()
     289 | +{
     290 | +    // Need to register this ValidationInterface before running Init(), so that
     291 | +    // callbacks are not missed if Init sets m_synced to true.
     292 | +    RegisterValidationInterface(this);
     293 | +    if (!Init()) {
    


    promag commented at 3:34 PM on April 24, 2018:

    I don't understand this case, should exit instead?


    jimpo commented at 6:02 PM on April 24, 2018:

    My thinking is that logging and running without the TxIndex is better than FatalError, but I could see it the other way too. Probably a good idea to add a log line at the very least.


    promag commented at 8:30 PM on April 24, 2018:

    I'd say running without the desired functionality should exit.


    sipa commented at 10:20 PM on April 24, 2018:

    I agree with @promag. If the daemon can't provide the service that is requested, it should fail early. Otherwise you may just complicate debugging.


    jimpo commented at 11:22 PM on April 24, 2018:

    Changed to FatalError.

  13. in src/rpc/rawtransaction.cpp:252 in 3272b17637 outdated
     256 | -        for (const auto& tx : setTxids) {
     257 | -            const Coin& coin = AccessByTxid(*pcoinsTip, tx);
     258 | -            if (!coin.IsSpent()) {
     259 | -                pblockindex = chainActive[coin.nHeight];
     260 | -                break;
     261 | +        LOCK(cs_main);
    


    promag commented at 3:43 PM on April 24, 2018:

    Instead of nesting, repeat the lock:

    if (!request.params[1].isNull()) {
        hashBlock = uint256S(request.params[1].get_str());
        LOCK(cs_main);
        ...
    } else {
        LOCK(cs_main);
        ...
    }
    
  14. promag commented at 3:46 PM on April 24, 2018: member

    Tested ACK 3272b17 (round 1).

  15. jimpo force-pushed on Apr 24, 2018
  16. in src/index/txindex.cpp:111 in 40eabf6a71 outdated
     106 | +                LogPrintf("Syncing txindex with block chain from height %d\n", pindex->nHeight);
     107 | +                last_log_time = current_time;
     108 | +            }
     109 | +
     110 | +            if (last_locator_write_time + SYNC_LOCATOR_WRITE_INTERVAL < current_time) {
     111 | +                WriteBestBlock();
    


    sipa commented at 10:06 PM on April 24, 2018:

    As this is called in the middle of ThreadSync(), shouldn't this be writing a locator for pindex rather than chainActive?


    jimpo commented at 11:13 PM on April 24, 2018:

    Yeah, wow, oops. Good catch.

  17. jimpo force-pushed on Apr 24, 2018
  18. in src/index/txindex.cpp:88 in 8f1bfe1488 outdated
      83 | +
      84 | +        int64_t last_log_time = 0;
      85 | +        int64_t last_locator_write_time = 0;
      86 | +        while (true) {
      87 | +            if (m_interrupt) {
      88 | +                WriteBestBlock();
    


    sipa commented at 11:40 PM on April 24, 2018:

    This function takes an argument now.

  19. jimpo force-pushed on Apr 25, 2018
  20. Sjors cross-referenced this on Apr 25, 2018 from issue Interpret absense of prune= as prune=1 if there are pruned blocks by Sjors
  21. jimpo force-pushed on Apr 25, 2018
  22. jimpo force-pushed on Apr 25, 2018
  23. jimpo force-pushed on Apr 25, 2018
  24. [db] Create separate database for txindex.
    The new TxIndexDB class will be used by a future commit in this
    change set.
    0cb8303241
  25. [db] Migration for txindex data to new, separate database. c88bcec93f
  26. [index] Create new TxIndex class.
    The TxIndex will be responsible for building the transaction index
    concurrently with the main validation thread by implementing
    ValidationInterface. This does not process blocks concurrently yet.
    34d68bf3a3
  27. [index] TxIndex initial sync thread.
    TxIndex starts up a background thread to get in sync with the block
    index before blocks are processed through the ValidationInterface.
    94b4f8bbb9
  28. [index] Allow TxIndex sync thread to be interrupted. 70d510d93c
  29. [index] TxIndex method to wait until caught up.
    In order to preserve getrawtransaction RPC behavior, there needs to be
    a way for a thread to ensure the transaction index is in sync with the
    current state of the blockchain.
    f90c3a62f5
  30. [init] Initialize and start TxIndex in init code. 8181db88f6
  31. [validation] Replace tx index code in validation code with TxIndex. e0a3b80033
  32. [index] Move disk IO logic from GetTransaction to TxIndex::FindTx. a03f804f2a
  33. [rpc] Public interfaces to GetTransaction block until synced.
    Now that the transaction index is updated asynchronously, in order to
    preserve the current behavior of public interfaces, the code blocks
    until the transaction index is caught up with the current state of the
    blockchain.
    6d772a3d44
  34. [test] Simple unit test for TxIndex. ed77dd6b30
  35. [doc] Include txindex changes in the release notes. 9b2704777c
  36. jimpo force-pushed on Apr 25, 2018
  37. sipa commented at 1:20 AM on April 26, 2018: member

    utACK 9b2704777ceeca48d57ce058ae91674c7764b143

    Also compared with a re-merge of master with 806b2f1 (which had utACK from @ryanofsky and @Sjors), with a re-merge of master with 523dd76 (which had utACK from @TheBlueMatt), and with a re-merge of master with ea8be45 (which had Tested ACK from @jamesob) to find the only differences are:

    • Adding an abstracted-out function TxIndex::WriteBestBlock, which is invoked additionally after interrupting the background sync and periodically.
    • Checks in TxIndex::SetBestChain to see if callbacks are received after BlockConnected.
    • Push down cs_main locks in gettxoutproof.
    • Simplification of TxIndexDB::ReadBestBlock.
    • Changes in comments and logging
  38. sipa merged this on Apr 26, 2018
  39. sipa closed this on Apr 26, 2018

  40. sipa referenced this in commit a07e8caa5d on Apr 26, 2018
  41. ryanofsky cross-referenced this on Apr 26, 2018 from issue Refactor: separate wallet from node by ryanofsky
  42. ryanofsky cross-referenced this on Apr 26, 2018 from issue Make LOCK, LOCK2, TRY_LOCK work with CWaitableCriticalSection by ryanofsky
  43. jamesob commented at 4:50 AM on April 26, 2018: member

    :tada: nice work, @jimpo.

  44. Bushstar cross-referenced this on Apr 26, 2018 from issue commits from bitcoin/master by Bushstar
  45. fanquake removed this from the "Blockers" column in a project

  46. jimpo cross-referenced this on May 1, 2018 from issue RPC: Improve error messages on RPC endpoints that use GetTransaction by jimpo
  47. jimpo deleted the branch on May 1, 2018
  48. mgrychow cross-referenced this on Aug 11, 2018 from issue refactor: Removal of circular dependency between index/txindex, validation and index/base by mgrychow
  49. sipa added the label Needs release notes on Aug 14, 2018
  50. mgrychow cross-referenced this on Aug 23, 2018 from issue Utxoscriptindex by mgrychow
  51. marcinja cross-referenced this on Aug 24, 2018 from issue Add address-based index (attempt 4?) by marcinja
  52. laanwj cross-referenced this on Aug 30, 2018 from issue TODO for release notes 0.17.0 by laanwj
  53. Sjors cross-referenced this on Sep 8, 2018 from issue Allow txindex in prune mode by jonasschnelli
  54. promag cross-referenced this on Oct 28, 2018 from issue Bus error (core dumped) crash on Fedora (txindex migration?) by Rspigler
  55. fanquake removed the label Needs release note on Mar 20, 2019
  56. UdjinM6 referenced this in commit 2924424924 on May 21, 2021
  57. UdjinM6 referenced this in commit f3ce0156a1 on May 25, 2021
  58. UdjinM6 referenced this in commit 7ff6515c88 on May 25, 2021
  59. UdjinM6 referenced this in commit 0f9f133f78 on Jun 5, 2021
  60. UdjinM6 referenced this in commit bcc8b35194 on Jun 5, 2021
  61. jnewbery cross-referenced this on Aug 3, 2021 from issue Remove txindex migration code by jnewbery
  62. Zero-1729 cross-referenced this on Aug 4, 2021 from issue Remove txindex migration code by MarcoFalke
  63. bitcoin locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:55 UTC