Recently I was involved in testing an API that had been working for years but required some refactoring for future use cases. The API is used in an automotive ECU powered by ARM processors. Nothing really special or difficult, but…
The API took data and its SHA-256 digest, both represented by std::string. The input was then validated internally by calculating the SHA-256 of the data and comparing both hashes. The calculated hash was represented by std::vector<uint8_t>. Surprisingly, the test cases were failing, even though the content was exactly the same!
std::string user_input = "\x9f\x68\x12\x0a";
std::vector<uint8_t> calculated_input {0x9f, 0x68, 0x12, 0x0a};
assert(user_input.size() == calculated_input.size()); // passes everywhere
assert(std::equal(std::begin(user_input), std::end(user_input),
                  std::begin(calculated_input)));     // fires on x86, passes on armv8
As the API was surely working, my suspicions turned towards the test code. But after checking it a dozen times, I couldn't find anything wrong. Then I realized that the test code was compiled and executed in a development environment on x86, while the API runs on an armv8 target. std::string stores and manipulates a sequence of character-like objects defined by character traits. Whether the underlying char (or its variant) type is signed or unsigned depends on the platform and compiler. On most x86 GNU/Linux and Microsoft systems, char is signed. On ARM or PowerPC, in turn, it is typically unsigned. The suspicion was confirmed after running the test cases on the target.
So what really happened? Comparing the content of a std::string with a std::vector<uint8_t> means comparing a (possibly signed) char with a uint8_t value. Both types are narrower than int, so before the comparison the compiler applies integral promotion and converts both operands to int. If the value of an individual character in the std::string exceeds the range of signed char (here: 127 = 0x7F), it is stored as a negative value and promotes to a negative int, while the corresponding uint8_t always promotes to a non-negative int, so the two can never compare equal. And if such a negative value is instead converted to an unsigned type, it wraps around to some huge unsigned number. Either way, std::equal reports a mismatch.
signed char x = 127;
std::cout << "x = " << (int)x << std::endl; // x = 127
signed char y = 128; // 128 does not fit: wraps to -128 on two's-complement targets
std::cout << "y = " << (int)y << std::endl; // y = -128
unsigned int z = y;  // negative value converted to unsigned: wraps modulo 2^32
std::cout << "z = " << z << std::endl;      // z = 4294967168
How can we fix the code and make it work independently of the processor architecture? A simple predicate solves the issue and guarantees that the compared values have the same unsigned type:
assert(std::equal(std::begin(user_input), std::end(user_input),
                  std::begin(calculated_input),
                  [](uint8_t lhs, uint8_t rhs) { return lhs == rhs; }));
In conclusion: beware of signed and unsigned value comparisons, keep potential overflows in mind, and care about code portability.