Unsigned char fix & fix undefined phexdigits[255] #1122

dlitz commented at 4:52 AM on April 18, 2012: contributor

This pull request contains two fixes:

Fix test failures on Linux on ARM, which is a platform where "char" is unsigned ("char" is signed on x86). This causes IsHex() to erroneously return true, and Debian doesn't automatically move new versions of bitcoin to its testing distribution, due to the build failures on arm.
Fix a bug where the statically-defined phexdigits is missing an entry, because there's a newline with no comma between "-1" and "-1".

Cheers,

Dwayne

Fix phexdigits[255] is undefined. a6fa147c8d

Fix bugs on 'unsigned char' platforms.

In ISO C++, the signedness of 'char' is undefined.  On some platforms (e.g.
ARM), 'char' is an unsigned type, but some of the code relies on 'char' being
signed (as it is on x86).  This is indicated by compiler warnings like this:

 bignum.h: In constructor 'CBigNum::CBigNum(char)':
 bignum.h:81:59: warning: comparison is always true due to limited range of data type [-Wtype-limits]

 util.cpp: In function 'bool IsHex(const string&)':
 util.cpp:427:28: warning: comparison is always false due to limited range of data type [-Wtype-limits]

In particular, IsHex erroneously returned true regardless of the input
characters, as long as the length of the string was a positive multiple of 2.

Note: For testing, it's possible using GCC to force char to be unsigned by
adding the -funsigned-char parameter to xCXXFLAGS.

8c8e8c2e93

laanwj commented at 5:25 AM on April 18, 2012: member

The 'signed char' doesn't really make me happy. We use 'char' with the implicit assumption that it's signed all over the code base. Many programs do this. We make worse assumptions about data types (see #888). Changing this assumptions will require quite some thought especially in serialize.h.

Why not just pass -fsigned-char for now?

ACK on adding an entry/comma, good catch

dlitz commented at 5:46 AM on April 18, 2012: contributor

Assuming that 'char' is signed is a bug; ISO C and C++ leave it undefined. The fact that Satoshi's prototype was a little bit sloppy isn't much of a reason to never fix these bugs.

Is -fsigned-char portable? (Does clang support it?) What happens when you link against a library that defines char differently?

Since I've already put in the effort of finding these and submitting a patch to fix them, why wouldn't you accept the patch? These bugs aren't going to get fixed, otherwise.

laanwj commented at 5:50 AM on April 18, 2012: member

Yes, clang supports that option. So does MSVC2010. Linking against a library that defines char otherwise should not be a problem as the in-memory representation is the same.

I agree that it's no reason to never fix these, but have you checked all other places where char is used? Are you sure that unsigned chars won't break on some obscure serialization cases? (that are currently not covered by unit tests?)

If not, we can not support compiling with unsigned chars.

laanwj commented at 6:02 AM on April 18, 2012: member

Eventually the fix would be to forbid the usage of plain char completely, and define our own typedefs for signed and unsigned char (or use posix uint8_t / int8_t, even better as we assume 8-bitness?), and force the use of those everywhere.

Edit: but don't get me wrong, I'm not opposed to accepting this. I'm just wary that it might not solve the entire issue.

dlitz commented at 1:41 PM on April 18, 2012: contributor

Actually, not every use of char needs to be changed. Cases where char refers to the characters of a human-readable string should be left as-is (since things like open use const char *).

I searched for every instance of char, and skipped anything that looked like it was just ordinary string handling. GCC was helpful, because it gave warnings where >= 0 checks were being done on an unsigned char variable. Admittedly, I used my intuition to make that call. I'm more familiar with C than I am with C++, and I'm not extremely familiar with the Bitcoin codebase. You're right that this might not solve the entire issue, but I think it's a useful step.

It might be useful for other developers to use -funsigned-char on x86 and see if they get the results they expect. I tried that and it seems to work fine, but I didn't exercise it too hard. bitcoind still segfaults after a while when downloading the block chain when I emulate armel using qemu-user, but it also complains about some unsupported syscall number, so I suspect the crash has more to do with qemu than with bitcoind. (Unfortunately, I don't have real ARM hardware to test on at the moment.)

As for serialization, I wouldn't expect too many problems, because the difference between signed char and unsigned char is mostly irrelevant except for comparisons, or when casting to larger types like int. Cases where a character is used as the index for a look-up table already work, because x86's char is signed by default (so it would have already been broken if it wasn't declared properly).

My motivation for this patch is to get the latest bitcoind package into Debian testing. Right now, they're stuck at 0.3.24 due to some build issues on some architectures, so I've been submitting bug reports with the necessary patches to try to get this resolved.

If you want, I can check this more carefully over the weekend. Pointers to potentially problematic areas not covered by the test suite would be helpful, in that case.

laanwj commented at 7:04 AM on April 19, 2012: member

I don't have specific suggestions. If you check(ed) all occurrences of char it should be OK.

sipa commented at 2:08 PM on April 20, 2012: member

All changes to unsigned seem safe to me, and changes to signed will not influence any current;y supported platform. ACK.

luke-jr commented at 3:20 PM on April 20, 2012: member

I concur with sipa on a read-over basis, but comparing the object files produced (on x86) reveal some disturbing (to me, as someone who isn't familiar with x86 assembly) differences...

The following object files are changed: bitcoinrpc, checkpoints, main, script, Checkpoints_tests, miner_tests

Here is a diff of the assembly in script: http://paste.pocoo.org/show/584682/

laanwj commented at 3:29 PM on April 20, 2012: member

@luke-jr: a lot of those seem different function names to calls? (but same binary representation)

If you want a fair comparison you should remove the ',' from phexdigit, so that the array will stay 255 bytes. (don't know if you did that already)

Also gcc tends to be quite weird, in that small unrelated changes can seemingly randomly change how certain optimizations are done.

Edit: I'm trying with -O0 and can confirm there are minor changes to the generated code, though, even with just commit 8c8e8c2e:

addrman.o: match bitcoinrpc.o: base_uint<256u>::SetHex(char const_) differs checkpoints.o: base_uint<256u>::SetHex(char const_) differs crypter.o: match db.o: match init.o: match irc.o: match key.o: match keystore.o: match main.o: base_uint<256u>::SetHex(char const*) differs net.o: match netbase.o: match noui.o: match protocol.o: match rpcdump.o: match script.o: difference in CScript::DecodeOP_N(opcodetype) and CScript::EncodeOP_N(int) util.o: match version.o: match wallet.o: match walletdb.o: match

A change in SetHex isn't so surprising, but in DecodeOP?!?

0000000000000000 <CScript::DecodeOP_N(opcodetype)>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 20             sub    $0x20,%rsp
   8:   89 7d ec                mov    %edi,-0x14(%rbp)
   b:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  12:   00 00 
  14:   48 89 45 f8             mov    %rax,-0x8(%rbp)
  18:   31 c0                   xor    %eax,%eax
  1a:   83 7d ec 00             cmpl   $0x0,-0x14(%rbp)
  1e:   75 07                   jne    27 <CScript::DecodeOP_N(opcodetype)+0x27>
  20:   b8 00 00 00 00          mov    $0x0,%eax
  25:   eb 2f                   jmp    56 <CScript::DecodeOP_N(opcodetype)+0x56>
  27:   8b 45 ec                mov    -0x14(%rbp),%eax
  2a:   83 f8 50                cmp    $0x50,%eax
  2d:   7e 08                   jle    37 <CScript::DecodeOP_N(opcodetype)+0x37>
  2f:   8b 45 ec                mov    -0x14(%rbp),%eax
  32:   83 f8 60                cmp    $0x60,%eax
  35:   7e 19                   jle    50 <CScript::DecodeOP_N(opcodetype)+0x50>
  37:   b9 00 00 00 00          mov    $0x0,%ecx
-  3c:  ba c0 01 00 00          mov    $0x1c0,%edx
+  3c:  ba c2 01 00 00          mov    $0x1c2,%edx
  41:   be 00 00 00 00          mov    $0x0,%esi
  46:   bf 00 00 00 00          mov    $0x0,%edi
  4b:   e8 00 00 00 00          callq  50 <CScript::DecodeOP_N(opcodetype)+0x50>
  50:   8b 45 ec                mov    -0x14(%rbp),%eax
  53:   83 e8 50                sub    $0x50,%eax
  56:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
  5a:   64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
  61:   00 00 
  63:   74 05                   je     6a <CScript::DecodeOP_N(opcodetype)+0x6a>
  65:   e8 00 00 00 00          callq  6a <CScript::DecodeOP_N(opcodetype)+0x6a>
  6a:   c9                      leaveq 
  6b:   c3                      retq

It's minimal and I have no idea what it means (the change in CScript::EncodeOP_N(int) is similar) .

luke-jr commented at 3:51 PM on April 20, 2012: member

I was only comparing the tip of this pullreq.

dlitz commented at 4:07 PM on April 20, 2012: contributor

@luke-jr That's weird. On my machine, I only see these differences for script.o: http://paste.pocoo.org/show/584704/

laanwj commented at 4:24 PM on April 20, 2012: member

The callq 50 <CScript::DecodeOP_N(opcodetype)+0x50> must be the call to __assert_fail (called if eax<=0x50 or >0x60, which matches the opcode >= OP_1 && opcode <= OP_16 expression ). So a different argument is passed when the assert fails, we can live with that, I guess.

Edit: objdump -rCd confirms this suspicion. The argument that differs is the line number. Pretty boring :) Edit.2: the change in SetHex is simply using a different register (edx/eax) in the zero-extend instruction,

 fb:    0f b6 80 00 00 00 00    movzbl 0x0(%rax),%eax
      fe: R_X86_64_32S  base_uint<256u>::SetHex(char const*)::phexdigit
102:    89 c2                   mov    %eax,%edx

becomes:

 fb:    0f b6 90 00 00 00 00    movzbl 0x0(%rax),%edx
      fe: R_X86_64_32S  base_uint<256u>::SetHex(char const*)::phexdigit

ACK

sipa commented at 4:55 PM on April 20, 2012: member

ACK

laanwj referenced this in commit 00b9c0f4b2 on Apr 20, 2012

laanwj merged this on Apr 20, 2012

laanwj closed this on Apr 20, 2012

dlitz commented at 4:04 PM on April 21, 2012: contributor

Thanks, guys!

coblee referenced this in commit a23ba6c46b on Jul 17, 2012

suprnurd referenced this in commit 9ed18fdcbe on Dec 5, 2017

lateminer referenced this in commit 21a057f29c on Jan 22, 2019

lateminer referenced this in commit 641b1d6bbe on Dec 25, 2019

bitcoin locked this on Sep 8, 2021