Add full UTF-8 support to RPC #7892

pull laanwj wants to merge 5 commits into bitcoin:master from laanwj:2016_04_i18n_unicode_rpc changing 14 files +209 −44
  1. laanwj commented at 1:50 PM on April 16, 2016: member

    (the univalue patch should be taken upstream - the RPC test and doc change should end up here. Posting it here for visibility and because this fixes an ancient issue: #2127. I think this is important to many non-English users)

    This adds full UTF-8 support for both on input and output through JSON.

    bitcoin-cli usage

    Before:

    $ src/bitcoin-cli -datadir=/store/tmp/testbtc getnewaddress "рыба"
    1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
    $ src/bitcoin-cli -datadir=/store/tmp/testbtc listaccounts
    {
    ...  "\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u0080\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u008b\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b1\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b0": 0.00000000
    }
    $ src/bitcoin-cli -datadir=/store/tmp/testbtc getaccount 1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
    рыба
    

    After:

    $ src/bitcoin-cli -regtest getaccountaddress "рыба"
    mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2
    $ src/bitcoin-cli -regtest listaccounts
    {
    ...
      "рыба": 0.00000000
    }
    $ src/bitcoin-cli -regtest getaccount mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2                                       
    рыба
    
    

    GUI

    This also affects the GUI debug console.

    Before:

    i18n_before

    When strings are passed directly (such as for getaccount's return argument) it works fine, but when they go through JSON formatting/parsing, it fails.

    After:

    i18n_after

    RPC test

    A test for this new functionality has been added to the wallet.py test.

    See the commit message of the first commit for details on what exactly had to be changed in univalue.

  2. laanwj added the label RPC/REST/ZMQ on Apr 16, 2016
  3. jonasschnelli commented at 5:51 PM on April 16, 2016: contributor

    Nice! Concept ACK.

    Not sure how we should treat the UniValue changes. Should we try to keep the "upstream" repository bitcoin/univalue in sync?

  4. MarcoFalke commented at 5:59 PM on April 16, 2016: member

    Concept ACK. Tests look good.

    "upstream" repository bitcoin/univalue in sync

    The patch goes to jgarzik/univalue as bitcoin/univalue is only for bitcoin related patches. (Still in the end we need to merge jgarzik/univalue into bitcoin/univalue because we use this as subtree.)

  5. laanwj cross-referenced this on Apr 18, 2016 from issue Handle UTF-8 by laanwj
  6. laanwj force-pushed on Apr 18, 2016
  7. laanwj force-pushed on Apr 28, 2016
  8. gmaxwell commented at 7:48 AM on May 17, 2016: contributor

    Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

  9. in qa/rpc-tests/wallet.py:None in 0ef6abf32d outdated
       0 | @@ -1,4 +1,6 @@
       1 |  #!/usr/bin/env python2
       2 | +# coding=utf-8
       3 | +# ^^^^^^^^^^^^ TODO remove when supporting only Python3
    


    MarcoFalke commented at 8:05 AM on May 17, 2016:

    Needs rebase.

  10. laanwj commented at 10:25 AM on May 18, 2016: member

    Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

    I don't think that's possible with just unicode?

    ANSI escape sequences are a whole different story. In JSON objects they'll be escaped, but when printing strings (say, for help, when there is a single string result) everything is let through. Not sure whether this should be changed, but this would be a change local to bitcoin-cli, not the unicode framework. I'd suggest doing so in a separate pull.

    Nothing new here though: It's always been possible to pass any kind of crap characters from the server (just look at my OP) to bitcoin-cli, this just makes the round-trip sane and conform to UTF-8.

  11. dcousens commented at 1:03 PM on May 19, 2016: contributor

    concept ACK, once-over utACK (did not cross-verify UTF-8 impl.).

    Could probably use more tests for that JSONUTF8Writer...

  12. laanwj commented at 8:49 AM on May 20, 2016: member

    Could probably use more tests for that JSONUTF8Writer... ing

    There are few direct tests for it, but all the other tests that parses JSON with strings in it is testing it. What would be nice is doing some fuzzing using https://github.com/laanwj/univalue/tree/2015_11_unifuzz and see that everything is accepted is valid utf-8.

  13. gmaxwell commented at 4:10 PM on May 25, 2016: contributor

    @pstratem Any interest in fuzzing this?

  14. sipa commented at 4:20 PM on May 25, 2016: member

    Needs rebase.

  15. laanwj force-pushed on Jun 9, 2016
  16. laanwj commented at 6:21 AM on June 9, 2016: member

    Rebased, done and removed the Python3 TODOs

  17. Squashed 'src/univalue/' changes from 2740c4f..f32df99
    f32df99 Merge branch '2016_04_unicode' into bitcoin
    280b191 Merge remote-tracking branch 'jgarzik/master' into bitcoin
    c9a716c Handle UTF-8
    bed8dd9 Version 1.0.2.
    5e7985a Merge pull request #14 from laanwj/2015_11_escape_plan
    
    git-subtree-dir: src/univalue
    git-subtree-split: f32df99e96d99ab49e5eeda16cac93747d388245
    60ab9b2006
  18. Merge commit '60ab9b200654ef0914459711cf2b22be16be3dc2' 63151521fd
  19. test: add ensure_ascii setting to AuthServiceProxy
    Add a setting ensure_ascii to AuthServiceProxy. This setting,
    defaulting to True (backwards compatible),
    is passed through to json.dumps. If set to False, non-ASCII characters
    >0x80 are not escaped. This is useful for testing server
    input processing, as well as slightly more bandwidth friendly in case of
    heavy unicode usage.
    a406fcb6ca
  20. test: test utf-8 for labels in wallet 6bbb4ef399
  21. doc: Mention full UTF-8 support in release notes 7982fce64c
  22. laanwj force-pushed on Jun 10, 2016
  23. laanwj commented at 1:23 PM on June 10, 2016: member

    Ok:

    This should be ready now.

  24. gmaxwell commented at 10:26 AM on June 15, 2016: contributor

    ACK

  25. laanwj merged this on Jun 16, 2016
  26. laanwj closed this on Jun 16, 2016

  27. laanwj cross-referenced this on Jun 16, 2016 from issue Unicode comment strings doesn't handled correctly in "move" API calls to bitcoind. by HandleX
  28. laanwj referenced this in commit 9c3d0fab36 on Jun 16, 2016
  29. laanwj commented at 10:23 AM on June 16, 2016: member

    It looks like the build system doesn't detect changes to univalue source files and will not automatically rebuild the library: if you get RPC test failures concerning unicode, please build from a clean tree

  30. MarcoFalke commented at 10:26 AM on June 16, 2016: member
    make clean
    make distclean
    

    should take care of this, I assume.

  31. laanwj commented at 10:29 AM on June 16, 2016: member

    Yes, make clean should be enough.

  32. kyuupichan cross-referenced this on Apr 30, 2017 from issue [port] bring src/univalue up to core/master by kyuupichan
  33. codablock referenced this in commit adfca782a0 on Sep 16, 2017
  34. codablock referenced this in commit d2d4dfda2e on Sep 19, 2017
  35. codablock referenced this in commit 297aa47f0b on Dec 27, 2017
  36. codablock referenced this in commit 79ad5f7682 on Dec 28, 2017
  37. dagurval cross-referenced this on Jan 30, 2018 from issue Add full UTF-8 support to RPC by dagurval
  38. andvgal referenced this in commit 25dd406c61 on Jan 6, 2019
  39. bitcoin locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-20 06:55 UTC