From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.
This PR adds two C++17 compatible macros: u8string and u8path. The reason that I use macros is that developers might not want to pass second parameter utf8 everytime when they call string or create path objects.
I tried to find all string methods and convert them to u8string except for external api calls. Also tried to convert explicit or implicit path to u8path. Tell me if you find something that I missed.
Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.
Empact
commented at 11:24 PM on June 10, 2018:
member
ken2812221
commented at 12:20 AM on June 11, 2018:
contributor
Can we use path.imbued.locale instead for this?
We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add boost::locale dependency in order to use boost locale generator.
ken2812221 renamed this: [WIP, bugfix] Add u8path and u8string to boost to fix #13103 [bugfix] Add u8path and u8string to boost to fix #13103 on Jun 12, 2018
fanquake requested review from theuni on Jun 12, 2018
ken2812221 force-pushed on Jun 12, 2018
theuni
commented at 8:27 PM on June 12, 2018:
member
I'm not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I'm just not familiar with it.
Maybe start by upstreaming it and see where the discussion goes?
ken2812221
commented at 8:32 PM on June 12, 2018:
contributor
Firstly because it would mean that only depends-builds can build for Windows
#11911 (Free CDBEnv instances when not in use by ryanofsky)
#11625 (WIP: Add BitcoinApplication & RPCConsole tests by ryanofsky)
#10973 (Refactor: separate wallet from node by ryanofsky)
#10102 ([experimental] Multiprocess bitcoin by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P
Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.
ken2812221
commented at 10:10 PM on June 15, 2018:
I just thought that pass the second parameter at every .string() and path() is pretty annoying and easy to forget. If most developers don't want to use macro function, I could do a scripted-diff.
I'll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path for string as well, and hide g_utf8
ken2812221 renamed this: [bugfix] Add u8path and u8string to fix #13103 [bugfix] Add u8path and u8string to fix encoding issue for Windows on Jul 17, 2018
laanwj added this to the milestone 0.17.0 on Jul 19, 2018
DrahtBot added the label Needs rebase on Jul 20, 2018
ken2812221 force-pushed on Jul 20, 2018
fanquake removed the label Needs rebase on Jul 21, 2018
fanquake
commented at 4:57 AM on July 21, 2018:
member
@theuni We probably want your thoughts here again.
For reference nothing has happened at the upstream PR, however that is less relevant now that we aren't patching Boost.
DrahtBot cross-referenced this on Jul 22, 2018 from issue PSBT key path cleanups by sipa
Sjors
commented at 12:15 PM on July 26, 2018:
member
Now you can use any walletname you want even if you are using English language setting
Thanks, that worked.
Can you change some of the functional tests to use unicode characters? I think wallet_multiwallet.py and wallet_labels.py would be good examples, since they use RPC arguments as well as filenames. I renamed label e in labels tests which worked fine on macOS. The multiwallet test was less happy when I renamed w3: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128). @jnewbery thoughts?
I also tested on macOS and that (still) works fine.
<img width="529" alt="schermafbeelding 2018-07-26 om 13 35 04" src="https://user-images.githubusercontent.com/10217/43260095-bd8235dc-90d8-11e8-9ea6-a3a159525205.png">
ken2812221
commented at 6:08 PM on July 28, 2018:
contributor
Update: Now we can specify any characters in -datadir.
ken2812221 closed this on Jul 28, 2018
ken2812221 reopened this on Jul 28, 2018
Sjors
commented at 1:26 PM on July 30, 2018:
member
Some Travis builds are unhappy:
<img width="1215" alt="schermafbeelding 2018-07-30 om 15 26 10" src="https://user-images.githubusercontent.com/10217/43400037-ebf6e2a0-940c-11e8-84b8-37f1d5ac85f7.png">
ken2812221
commented at 2:17 PM on July 30, 2018:
contributor
@Sjors Those needs upstream changes bitcoin-core/leveldb#18, you can see travis result on #13787.
ken2812221 force-pushed on Jul 31, 2018
Make FileLock support utf8 for Windowsc3e0340da8
Use _wfopen and _wreopen on Windows2d86754743
Make leveldb works with utf8087792e3c0
Use Unicode version of MoveFileEx4723aed698
Add fsbridge::i/ofstream class014aa3e61a
Change fs/std::i/ofstream to fsbridge::i/ofstream2fa7d421d1
Empact
commented at 4:43 AM on August 3, 2018:
member
If you call CommandLineToArgvW within gArgs.ParseParameters instead, you can avoid touching src/bitcoind.cpp and src/bench/bench_bitcoin.cpp, at least. There may be other opportunities for minimization.
ken2812221
commented at 7:17 AM on August 3, 2018:
contributor
If you call CommandLineToArgvW within gArgs.ParseParameters
It would break the ParseParameters in tests if we do that.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2026-05-19 06:54 UTC