SHA256 implementations based on Intel SHA Extensions #13386

pull sipa wants to merge 3 commits into bitcoin:master from sipa:201806_shani changing 7 files +464 −32
  1. sipa commented at 11:32 PM on June 3, 2018: member

    Based on #13191.

    This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

    In addition to #13191, two extra implementations are provided:

    • (a) A variable-length SHA256 implementation using SHA extensions.
    • (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

    Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:

    • Using generic C++ code (pre-#10821): 6.1ms
    • Using SSE4 (master, #10821): 4.6ms
    • Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
    • Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
    • Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

    Benchmarks for 32-byte SHA256 on the same system:

    • Using SSE4 (master, #10821): 190ns
    • Using SHA-NI (this PR): 53ns

    Benchmarks for 1000000-byte SHA256 on the same system:

    • Using SSE4 (master, #10821): 2.5ms
    • Using SHA-NI (this PR): 0.51ms
  2. sipa cross-referenced this on Jun 3, 2018 from issue Specialized double-SHA256 with 64 byte inputs with SSE4.1 and AVX2 by sipa
  3. in src/crypto/sha256.cpp:541 in 1773155bb2 outdated
     539 | +        TransformD64 = TransformD64Wrapper<sha256_sse4::Transform>;
     540 | +        ret = "sse4";
     541 | +    }
     542 | +
     543 | +#if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
     544 | +    if (have_sse4) {
    


    kallewoof commented at 12:19 AM on June 4, 2018:

    What about

            ret = "sse4";
    #if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
            TransformD64_4way = sha256d64_sse41::Transform_4way;
            ret += ",sse41";
    #endif
        }
    

    ?


    sipa commented at 12:35 AM on June 4, 2018:

    Done.

  4. sipa force-pushed on Jun 4, 2018
  5. fanquake added the label Validation on Jun 4, 2018
  6. ghost commented at 9:08 AM on June 4, 2018: none

    @sipa

    Great works as usual!

    I just came cross this thread

    https://github.com/armfazh/flo-shani-aesni/blob/master/README.md

    I hope you will have time to look at what they did!

  7. DrahtBot commented at 12:41 PM on June 4, 2018: contributor

    Needs rebase

  8. promag commented at 12:45 PM on June 4, 2018: member

    Concept ACK, nice numbers.

  9. sipa force-pushed on Jun 4, 2018
  10. sipa commented at 4:54 PM on June 4, 2018: member

    Rebased. @Kick1986 Nice, I'll have a look.

  11. DrahtBot cross-referenced this on Jun 5, 2018 from issue Enable double-SHA256-for-64-byte code on 32-bit x86 by sipa
  12. DrahtBot commented at 12:49 AM on June 5, 2018: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->Note to reviewers: This pull request conflicts with the following ones:

    • #13442 (Convert the 1-way SSE4 SHA256 code from asm to intrinsics by sipa)
    • #13203 (Add POWER8 ASM for 4-way SHA256 by TheBlueMatt)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  13. Empact commented at 9:34 AM on June 5, 2018: member

    For clang version, looks like they were added in 3.4, but never noted in the release notes. Source: went from commit date[1] to release date[2] to file in release version[3]. Did not check for every intrinsic used. [1] https://github.com/llvm-mirror/clang/commit/b83f5a77335ca8a5b41ba4e17aa8d4bb6248c1e4#diff-c4f203a0f202bf56c364a657b0aecae9 [2] http://releases.llvm.org/ [3] https://github.com/llvm-mirror/clang/blob/release_34/lib/Headers/shaintrin.h

  14. in src/crypto/sha256_shani.cpp:87 in bb80ab2596 outdated
      86 | +{
      87 | +    __m128i m0, m1, m2, m3, s0, s1, so0, so1;
      88 | +
      89 | +    /* Load state */
      90 | +    s0 = _mm_loadu_si128((const __m128i*)s);
      91 | +    s1 = _mm_loadu_si128((const __m128i*)(s + 4));
    


    theuni commented at 8:33 PM on June 5, 2018:

    I think these could be _mm_load_si128 if s[] was alignas(16).

  15. in src/Makefile.am:35 in bb80ab2596 outdated
      31 | @@ -32,6 +32,7 @@ LIBBITCOIN_UTIL=libbitcoin_util.a
      32 |  LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
      33 |  LIBBITCOIN_CRYPTO_SSE41=crypto/libbitcoin_crypto_sse41.a
      34 |  LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
      35 | +LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a
    


    theuni commented at 8:49 PM on June 5, 2018:

    These are starting to get out of hand. I think we should take @TheBlueMatt's suggestion and treat LIBBITCOIN_CRYPTO as a collection of these helpers. That way we can just add $(LIBBITCOIN_CRYPTO) everywhere, and that will pull in the cpu-specific libs as well. Something like:

    ...
    LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
    LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
    LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a
    LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_AVX2)
    LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_SHANI)
    ...
    

    Then the cpu-specific ones can be dropped from the LDADDs all over the place. I'm happy to do up a patch on top of this if you'd like.


    sipa commented at 12:43 AM on June 6, 2018:

    @theuni Actually, feel like PRing that as a separate PR, before this one goes in? Then @TheBlueMatt and I can both rebase ours on top of yours and not conflict with each other.

  16. theuni commented at 9:03 PM on June 5, 2018: member

    concept ACK.

    I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

  17. theuni cross-referenced this on Jun 6, 2018 from issue crypto: cleanup sha256 build by theuni
  18. in src/crypto/sha256_shani.cpp:12 in bb80ab2596 outdated
       8 | +
       9 | +#ifdef ENABLE_SHANI
      10 | +
      11 | +#include <stdint.h>
      12 | +#if defined(_MSC_VER)
      13 | +#include <immintrin.h>
    


    DesWurstes commented at 4:12 AM on June 7, 2018:

    I believe including immintrin.h is enough for both platforms. x86intrin includes immintrin.h, and immintrin.h includes everything that is needed: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/immintrin.h

    EDIT: Tested on Linux GCC and Clang, including immintrin.h is enough for all platforms.


    sipa commented at 4:59 PM on June 11, 2018:

    Thanks, fixed.

  19. laanwj referenced this in commit 70a03c635b on Jun 11, 2018
  20. DrahtBot added the label Needs rebase on Jun 11, 2018
  21. sipa force-pushed on Jun 11, 2018
  22. sipa commented at 4:52 PM on June 11, 2018: member

    Rebased.

  23. DrahtBot removed the label Needs rebase on Jun 11, 2018
  24. DrahtBot commented at 7:00 PM on June 12, 2018: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->Needs rebase

  25. DrahtBot added the label Needs rebase on Jun 12, 2018
  26. sipa force-pushed on Jun 12, 2018
  27. sipa commented at 7:23 PM on June 12, 2018: member

    Rebased.

  28. DrahtBot removed the label Needs rebase on Jun 12, 2018
  29. in src/crypto/sha256_shani.cpp:14 in b318acd6ee outdated
       9 | +#ifdef ENABLE_SHANI
      10 | +
      11 | +#include <stdint.h>
      12 | +#include <immintrin.h>
      13 | +
      14 | +#include "crypto/common.h"
    


    theuni commented at 8:48 PM on June 13, 2018:

    Linter is yelling about include style here.

  30. sipa force-pushed on Jun 14, 2018
  31. DrahtBot cross-referenced this on Jun 14, 2018 from issue For AVX2 code, also check for AVX, XSAVE, and OS support by sipa
  32. DrahtBot cross-referenced this on Jun 14, 2018 from issue Convert the 1-way SSE4 SHA256 code from asm to intrinsics by sipa
  33. DrahtBot cross-referenced this on Jun 15, 2018 from issue WIP [bench] CCoinsView(Cache): measure various scenarios by Sjors
  34. DrahtBot cross-referenced this on Jun 15, 2018 from issue [Tests] Make p2p_segwit easier to debug by jnewbery
  35. DrahtBot cross-referenced this on Jun 15, 2018 from issue Make SER_GETHASH implicit for CHashWriter and SerializeHash by Empact
  36. DrahtBot cross-referenced this on Jun 15, 2018 from issue Wallet: correctly deprecate accounts in getbalance, re-add minconf / include-watch-only by jonasschnelli
  37. DrahtBot cross-referenced this on Jun 15, 2018 from issue gui: Drop qt4 support by laanwj
  38. DrahtBot cross-referenced this on Jun 15, 2018 from issue Make sure LC_ALL=C is set in all shell scripts by practicalswift
  39. DrahtBot cross-referenced this on Jun 15, 2018 from issue rpc: have verifytxoutproof check the number of txns in proof structure by instagibbs
  40. DrahtBot cross-referenced this on Jun 15, 2018 from issue [WIP] support new multisig template in wallet for Solver, signing, and signature combining by instagibbs
  41. DrahtBot added the label Needs rebase on Jun 24, 2018
  42. DrahtBot commented at 2:19 PM on June 24, 2018: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->Needs rebase

  43. [Refactor] CPU feature detection logic for SHA256 268400d318
  44. sipa force-pushed on Jun 24, 2018
  45. sipa commented at 5:56 PM on June 24, 2018: member

    Rebased after #13471 merge.

    Also split up the CPU feature detection logic change into its own commit.

    I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

    That was addressed in #13438.

  46. DrahtBot removed the label Needs rebase on Jun 24, 2018
  47. theuni commented at 7:51 PM on June 25, 2018: member
    
    utACK otherwise.
    
  48. Add SHA256 implementation using using Intel SHA intrinsics 4c935e2eee
  49. Use immintrin.h everywhere for intrinsics 66b2cf1ccf
  50. sipa force-pushed on Jun 26, 2018
  51. theuni commented at 6:56 PM on June 26, 2018: member

    Thanks! utACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30.

  52. jb55 commented at 3:32 AM on June 27, 2018: contributor

    Not sure what compelled me to do this, and it's probably overkill... but...

    Tested ACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30 with 100k rounds of quickcheck at various optimization levels, but only with the non-two way transform for now.

  53. DesWurstes commented at 11:54 AM on June 27, 2018: contributor

    Just a nit from older pull requests: Now that it has a custom CPUID function https://github.com/bitcoin/bitcoin/blob/d96bdd78307bc5469cb8a4d5ca0e6cbc21fe4073/src/crypto/sha256.cpp#L534-L538 including <cpuid.h> is not necessary now:

    https://github.com/bitcoin/bitcoin/blob/d96bdd78307bc5469cb8a4d5ca0e6cbc21fe4073/src/crypto/sha256.cpp#L12-L16

    Thank you for your awesome contributions!

  54. gmaxwell commented at 8:06 AM on July 7, 2018: contributor

    ACK

  55. laanwj commented at 7:10 PM on July 9, 2018: member

    utACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30 tested that build passes on FreeBSD+OpenBSD

  56. laanwj merged this on Jul 9, 2018
  57. laanwj closed this on Jul 9, 2018

  58. laanwj referenced this in commit 3a3eabef40 on Jul 9, 2018
  59. Bushstar cross-referenced this on Jul 13, 2018 from issue commits from bitcoin/master by Bushstar
  60. sipa cross-referenced this on Aug 8, 2018 from issue Assembly optimisations are compiled even with --disable-asm by luke-jr
  61. codablock referenced this in commit cd16312712 on Oct 1, 2019
  62. codablock referenced this in commit dda04a300e on Oct 1, 2019
  63. codablock referenced this in commit adfd2e1a83 on Oct 1, 2019
  64. codablock referenced this in commit 1b2252c28c on Oct 1, 2019
  65. sickpig cross-referenced this on Oct 25, 2019 from issue [port] Specialized double-SHA256 with 64 byte inputs with SSE4.1 and AVX2 by sickpig
  66. barrystyle referenced this in commit 84d549d17b on Jan 22, 2020
  67. barrystyle referenced this in commit 9be0431b4c on Jan 22, 2020
  68. bitcoin locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:55 UTC