Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 #18071

pull JeremyRubin wants to merge 3 commits into bitcoin:master from JeremyRubin:refactoring-hashers changing 3 files +52 −19
  1. JeremyRubin commented at 8:21 AM on February 5, 2020: contributor

    These are a couple common refactoring changes in both Taproot and CTV that I think can be merged as is. Other than that the two PRs should largely be conflict free (except for caching, which is a relatively trivial fix).

    These essentially just allow us to get access to the single SHA256 version of Get{Prevouts,Sequence,Outputs} hash which are used both in the Taproot and CTV spec. They use the single hash versions because they are cheaper to compute, which reduces validation overhead. These refactors are non-functional, and just expose the ability to get the single hash when needed.

    The names of Get*Hash are renamed to Get*SHA256 to avoid confusion with the value returned previously.

  2. JeremyRubin requested review from sipa on Feb 5, 2020
  3. fanquake added the label Refactoring on Feb 5, 2020
  4. DrahtBot commented at 10:14 AM on February 5, 2020: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19601 (Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 (Alternative to #18071) by JeremyRubin)
    • #17977 (Implement BIP 340-342 validation (Schnorr/taproot/tapscript) by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. JeremyRubin force-pushed on Feb 5, 2020
  6. JeremyRubin renamed this:
    [WIP] Refactoring CHashWriter & moving some hashed fields around
    Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256
    on Feb 5, 2020
  7. JeremyRubin marked this as ready for review on Feb 5, 2020
  8. JeremyRubin force-pushed on Feb 7, 2020
  9. JeremyRubin force-pushed on Feb 7, 2020
  10. DrahtBot cross-referenced this on Feb 11, 2020 from issue Implement BIP 340-342 validation (Schnorr/taproot/tapscript) by sipa
  11. prestwich commented at 6:44 PM on February 17, 2020: none

    utACK

  12. DrahtBot cross-referenced this on Mar 22, 2020 from issue Return BlockValidationState from ProcessNewBlock if CheckBlock/AcceptBlock fails by jnewbery
  13. DrahtBot cross-referenced this on Mar 22, 2020 from issue Refactor: Initialize PrecomputedTransactionData in CheckInputScripts by jnewbery
  14. DrahtBot added the label Needs rebase on Apr 16, 2020
  15. JeremyRubin force-pushed on Apr 21, 2020
  16. DrahtBot removed the label Needs rebase on Apr 21, 2020
  17. DrahtBot cross-referenced this on May 2, 2020 from issue The Zero Allocations project by jb55
  18. DrahtBot cross-referenced this on May 16, 2020 from issue bloom: use Span instead of std::vector for `insert` and `contains` [ZAP3] by jb55
  19. DrahtBot cross-referenced this on Jun 19, 2020 from issue Simplify hash.h interface using Spans by sipa
  20. Sjors commented at 9:30 AM on July 24, 2020: member

    Concept ACK on getting this refactor in separate from #17977 (Taproot) and #19055 (MuHash). Will review later.

  21. in src/hash.h:141 in 3f2b537fa8 outdated
     160 |       */
     161 |      inline uint64_t GetCheapHash() {
     162 | -        unsigned char result[CHash256::OUTPUT_SIZE];
     163 | -        ctx.Finalize(result);
     164 | -        return ReadLE64(result);
     165 | +        uint256 result = GetHash();
    


    Sjors commented at 10:22 AM on July 24, 2020:

    Shouldn't GetCheapHash be calling GetSHA256? I.e. "Cheap" refers to a single rather than a double hash.


    JeremyRubin commented at 4:53 PM on July 24, 2020:
  22. Sjors cross-referenced this on Jul 24, 2020 from issue Add MuHash3072 implementation by fjahr
  23. in src/script/interpreter.cpp:1262 in 09b832b57b outdated
    1258 | @@ -1259,33 +1259,33 @@ class CTransactionSignatureSerializer
    1259 |  };
    1260 |  
    1261 |  template <class T>
    1262 | -uint256 GetPrevoutHash(const T& txTo)
    1263 | +uint256 GetPrevoutSHA256(const T& txTo)
    


    Sjors commented at 10:45 AM on July 24, 2020:

    Might as well rename these to plural prevouts and sequences: GetPrevoutsSHA256 and GetSequencesSHA256.


    JeremyRubin commented at 5:57 PM on July 24, 2020:

    done.

  24. in src/script/interpreter.cpp:1303 in 09b832b57b outdated
    1301 | -        hashSequence = GetSequenceHash(txTo);
    1302 | -        hashOutputs = GetOutputsHash(txTo);
    1303 | +        hashPrevouts = GetPrevoutSHA256(txTo);
    1304 | +        hashSequence = GetSequenceSHA256(txTo);
    1305 | +        hashOutputs = GetOutputsSHA256(txTo);
    1306 | +        SHA256Uint256(hashPrevouts);
    


    Sjors commented at 10:47 AM on July 24, 2020:

    Would prefer to do this as a one-liner: hashPrevouts = SHA256Uint256(GetPrevoutSHA256(txTo)), as you do below.


    JeremyRubin commented at 4:54 PM on July 24, 2020:

    The point of not being a one liner is to make it easier for future work to just add a line of copying.

  25. Sjors commented at 10:50 AM on July 24, 2020: member

    09b832b could mention in the description that several proposals require us to "get access to the single SHA256".

    Should we rename GetHash() to GetDoubleSha256()?

  26. jnewbery commented at 12:11 PM on July 24, 2020: member

    There are some changes in #17977 to these commits, which should be added here if we merge this separately.

  27. JeremyRubin force-pushed on Jul 24, 2020
  28. JeremyRubin commented at 5:58 PM on July 24, 2020: contributor

    @Sjors requested changes made where applicable. @jnewbery could you be more specific? I scanned over the diffs and am not sure what you're referring to.

  29. JeremyRubin commented at 8:29 PM on July 24, 2020: contributor

    Ah those changes are intentional for SHA256Uint256, since we use a r-value version at a callsite I figure it's better to have an explicit version that does that.

    In this version, I make the r-value version return a hash (forcing lifetime consumption of it's parameter) and make the reference version overwrite it's parameter and return void. It should be harder to misuse this API as the types are more clear in the places we use it.

    In any case, if there is consensus preferring the approach currently in #17977, we'd want to minimally see changes to the Taproot version to both:

    1. change the parameter names for consistency.
    2. make the SHA256Uint256 take a const uint256& instead of a uint256&.

    I think those are all the differences, aside from the precomputed caches (which would be inappropriate to modify in this PR)? If there are other changes, please let me know.

  30. fjahr approved
  31. fjahr commented at 1:45 PM on July 25, 2020: contributor

    Code review ACK b0ac6eb86f98258d14800a8f913f977902aed856

    Micro-nit: the last commit has a typo in the commit message: Prvouts.

  32. Sjors commented at 9:02 AM on July 27, 2020: member

    It's still unresolved if the CheapHash function can use GetHash (instead of GetSHA256) without breaking Addrman's bucketing. See #19055 (review)

  33. jnewbery commented at 10:35 AM on July 27, 2020: member

    If the intention here is to have function signatures that are different from #17977, then I'm NACKish on this, since it would create additional rebase work for the author and reviewers of that PR.

  34. fjahr commented at 12:34 PM on July 27, 2020: contributor

    It's still unresolved if the CheapHash function can use GetHash (instead of GetSHA256) without breaking Addrman's bucketing. See #19055 (comment)

    I think that consideration can be left for a follow-up. It is a further behavior change that makes sense to review separately to make sure nothing breaks. And this change here is valuable enough without it for #17977 etc. So after it is merged the other PRs can be rebased and the CheapHash discussion can go on without blocking anything.

  35. JeremyRubin commented at 5:41 PM on July 27, 2020: contributor

    I agree with @fjahr, it's best if this PR can make no functional changes. Am happy to make a follow-up PR that can be reviewed independently switching cheaphash to a single sha256 (or as sipa points out, even siphash). @Sjors is that OK? @jnewbery this PR has been open for half a year, and is a part of my review of the Taproot code. The below should be about the only diff required as a result of interface changes, which mostly have to be patched into 41d08f5d77f52bec0e31bb081d85fff2d67d0467. This code is more clear because m_*_hash is not touched once it has been set.

    Regardless of this PR, the function signatures should be changed to a const ref. It's not clear to me that it's defined behavior to pass an r-value as a non-const ref, which is done in the taproot code currently (hence the interface changes here -- clearly no UB). If there is consensus that the interfaces should not change, preferring the approach in #17977, I can refactor this PR to be similar.

    edit: I guess it's not using UB... I could have sworn somewhere that the input to SHA256Uint256 was a non const...

    diff --git a/src/script/interpreter.cpp b/src/script/interpreter.cpp
    index 9ec960d17..003c873d9 100644
    --- a/src/script/interpreter.cpp
    +++ b/src/script/interpreter.cpp
    @@ -1408,12 +1408,12 @@ void PrecomputedTransactionData::Init(const T& txTo, std::vector<CTxOut> spent_o
     
         // Cache is calculated only for transactions with witness
         if (txTo.HasWitness()) {
    -        m_prevouts_hash = GetPrevoutHash(txTo);
    -        hashPrevouts = SHA256Uint256(m_prevouts_hash);
    -        m_sequences_hash = GetSequenceHash(txTo);
    -        hashSequence = SHA256Uint256(m_sequences_hash);
    -        m_outputs_hash = GetOutputsHash(txTo);
    -        hashOutputs = SHA256Uint256(m_outputs_hash);
    +        hashPrevouts = m_prevouts_hash = GetPrevoutsSHA256(txTo);
    +        SHA256Uint256(hashPrevouts);
    +        hashSequence = m_sequences_hash = GetSequencesSHA256(txTo);
    +        SHA256Uint256(hashSequence);
    +        hashOutputs = m_outputs_hash = GetOutputsSHA256(txTo);
    +        SHA256Uint256(hashOutputs);
    
  36. JeremyRubin cross-referenced this on Jul 27, 2020 from issue Refactoring CHashWriter & Get{Prevouts,Sequence,Outputs}Hash to SHA256 (Alternative to #18071) by JeremyRubin
  37. in src/hash.cpp:83 in b0ac6eb86f outdated
      76 | @@ -77,3 +77,14 @@ void BIP32Hash(const ChainCode &chainCode, unsigned int nChild, unsigned char he
      77 |      num[3] = (nChild >>  0) & 0xFF;
      78 |      CHMAC_SHA512(chainCode.begin(), chainCode.size()).Write(&header, 1).Write(data, 32).Write(num, 4).Finalize(output);
      79 |  }
      80 | +
      81 | +void SHA256Uint256(uint256& hash_io)
      82 | +{
      83 | +    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
    


    instagibbs commented at 8:35 PM on July 30, 2020:

    nano nit: hash_io.size()


    JeremyRubin commented at 8:55 PM on July 30, 2020:

    We do this a ton of places in the codebase -- might be nice for someone to add a Write/Finalize interface that is uint256 aware and fix these all at once?


    sipa commented at 9:14 PM on July 30, 2020:

    See #19326 (which only does it for the hash.h ones and not the crypto/* ones, but certainly makes sense as a follow-up there too).

  38. in src/hash.cpp:88 in b0ac6eb86f outdated
      83 | +    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
      84 | +}
      85 | +
      86 | +uint256 SHA256Uint256(uint256&& hash_io)
      87 | +{
      88 | +    CSHA256().Write(hash_io.begin(), 32).Finalize(hash_io.begin());
    


    instagibbs commented at 8:35 PM on July 30, 2020:

    nano nit: hash_io.size()

  39. instagibbs approved
  40. instagibbs commented at 8:38 PM on July 30, 2020: member

    I have a mild preference for this PR. Both seem correct.

    code review ACK https://github.com/bitcoin/bitcoin/pull/18071/commits/b0ac6eb86f98258d14800a8f913f977902aed856

    Nano nits can be ignored.

  41. luke-jr approved
  42. luke-jr commented at 12:59 AM on July 31, 2020: member

    utACK

  43. Sjors commented at 5:32 PM on July 31, 2020: member

    I agree with @fjahr, it's best if this PR can make no functional changes. Am happy to make a follow-up PR that can be reviewed independently switching cheaphash to a single sha256 (or as sipa points out, even siphash). @Sjors is that OK?

    That makes no sense; cheaphash uses a single sha256. This PR changes it to a double hash, so it's a functional change. Unless I'm reading it wrong.

  44. JeremyRubin commented at 5:42 PM on July 31, 2020: contributor

    @Sjors yes you are reading it wrong. The type of ctx changes from double hashed writer to a single hashed writer.

  45. DrahtBot added the label Needs rebase on Aug 3, 2020
  46. Add single sha256 call to CHashWriter 97fe971b11
  47. Add SHA256Uint256 helper functions 425ba916d8
  48. JeremyRubin force-pushed on Aug 3, 2020
  49. JeremyRubin force-pushed on Aug 3, 2020
  50. Refactor Get{Prevout,Sequence,Outputs}Hash to Get{Prevouts,Sequences,Outputs}SHA256.
    Several proposals (Taproot, MuHash, CTV) require access to the single
    hash.
    03b1cf8e93
  51. JeremyRubin force-pushed on Aug 3, 2020
  52. DrahtBot removed the label Needs rebase on Aug 3, 2020
  53. JeremyRubin commented at 7:54 PM on August 3, 2020: contributor

    I've rebased #18071 following the hash.h span changes (but not #19601 -- I can rebase that one if there is a preference for that approach).

    pre-rebase summary: @jnewbery nack-ish (#17977 conflicts may prefer #19601 approach?) @instagibbs code review ACK (slightly prefers #18071 over #19601) @fjahr code review ACK (micro nit fixed) @luke-jr utack @prestwich utack @Sjors concept-ack

    Please check the rebase diff at your convenience! @sipa it would be good if you could weigh in on if this is acceptable or if #19601 is better.

    I think since merge of the span.h changes causes #17977 to need to rebase, it might be a decent time to rebase onto this branch without too much extra work.

  54. in src/hash.h:201 in 425ba916d8 outdated
     193 | @@ -194,6 +194,12 @@ uint256 SerializeHash(const T& obj, int nType=SER_GETHASH, int nVersion=PROTOCOL
     194 |      return ss.GetHash();
     195 |  }
     196 |  
     197 | +/** Single-SHA256 a 32-byte input (represented as uint256).
     198 | + *  hash_io is read and written over with the result.
     199 | + */
     200 | +void SHA256Uint256(uint256& hash_io);
     201 | +uint256 SHA256Uint256(uint256&& hash_io);
    


    sipa commented at 8:04 PM on August 3, 2020:

    What is the point of having these functions instead of a single uint256 SHA256Uint256(const uint256& input) ?


    JeremyRubin commented at 12:36 AM on August 4, 2020:

    One benefit is that both versions operate completely in-place on their arguments (otherwise we're reading from a pointer, writing to the stack, then writing to a pointer). Another benefit is that it is slightly harder to misuse this because when you are operating on a l-value there is no return value, which otherwise you might forget to assign.


    sipa commented at 7:33 PM on August 4, 2020:

    That could be solved using NODISCARD instead. I don't think saving a super brief 32 bytes on the stack is even going to be be observable.

  55. Sjors commented at 9:11 AM on August 4, 2020: member

    Ah, I missed the switch from CHash256 (double) to CSHA256 (single). I wrote a test, which this PR indeed doesn't break:

    BOOST_AUTO_TEST_CASE(getcheaphash)
    {
        CHashWriter ss(SER_DISK, CLIENT_VERSION);
        ss << 1;
        BOOST_CHECK_EQUAL(ss.GetCheapHash(), 0x8D07CCE5F258F741ULL);    
    }
    
  56. JeremyRubin commented at 9:04 PM on August 7, 2020: contributor

    Since #19601 seems to be preferred by @sipa, @practicalswift, and @jnewbery, going to close #18071.

  57. JeremyRubin closed this on Aug 7, 2020

  58. fanquake referenced this in commit f8462a6d27 on Aug 25, 2020
  59. sidhujag referenced this in commit 16d841beaa on Aug 25, 2020
  60. bitcoin locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:54 UTC