| Summary: | expressions in include/archive.h (XZ magic) are undefined on big-endian architectures | ||
|---|---|---|---|
| Product: | Busybox | Reporter: | Michael Tokarev <mjt+busybox> |
| Component: | Other | Assignee: | unassigned |
| Status: | RESOLVED FIXED | ||
| Severity: | minor | CC: | busybox-cvs |
| Priority: | P5 | ||
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| Host: | Target: | ||
| Build: | |||
(In reply to comment #0) > This place in particular: > > XZ_MAGIC1a = 256 * (256 * (256 * 0xfd + '7') + 'z') + 'X', > > it depends on the signedness of char datatype (all char constants here). How exactly it depends on signedness of char? > Compiler produces the following warning: > > In file included from archival/ar.c:31:0: > include/archive.h:17:20: warning: integer overflow in expression [-Woverflow] > CC archival/bbunzip.o The warning appears because 0xfd,'7','z','X' as 32-bit big endian constant is 0xfd377a58, which is negative if interpreted as signed value. IOW: in the last multiply step "256 * (...)" arithmetic overflow occurs: positive value 0xfd377a gets multiplied by 0x100 and result overflows - becomes negative value 0xfd377a00. > Why not use computed numbers for all these magic numbers instead of > expressions, like > > XZ_MAGIC1a = 0xFD377A58UL Because it's harder to verify that the magic is correct. Does XZ_MAGIC1a = 256 * (unsigned)(256 * (256 * 0xfd + '7') + 'z') + 'X' work on big endian? I just checked (finally come to it) -- well, yes, the proposed
XZ_MAGIC1a = 256 * (unsigned)(256 * (256 * 0xfd + '7') + 'z') + 'X'
does work on big endian (I tried on mips machine), as in, gcc does not warn about integer overflow anymore in this place. But the result is that XZ_MAGIC1a gets promoted to unsigned long long instead of int or unsigned int, ie, internally it is interpreted as 64bit number. So for the following program:
enum foo {
XZ_MAGIC1a = ((0xfd * 256 + '7') * 256 + 'z') * 256 + 'X',
XZ_MAGIC1b = 256 * (unsigned)(256 * (256 * 0xfd + '7') + 'z') + 'X',
};
#include <stdio.h>
int main() {
printf("1=%08x 2=%08x\n", XZ_MAGIC1a, XZ_MAGIC1b);
return 0;
}
it produces this:
test.c:2:49: warning: integer overflow in expression [-Woverflow]
test.c: In function 'main':
test.c:8:2: warning: format '%x' expects argument of type 'unsigned int', but argument 3 has type 'long long int' [-Wformat]
So I added another cast:
XZ_MAGIC1b = (unsigned)(256 * (unsigned)(256 * (256 * 0xfd + '7') + 'z') + 'X'),
but the result is still interpreted as long long int -- ie, gcc gives exactly the same warning.
Note that 0xfd377a58, when used in enum, is also promoted to unsigned long long. And note also that the original version, while gcc warns about it, actually works correctly.
I think the best will be to use a #define with this direct value (0xfd377a58), which is still much easier to read and verify than any of the expressions used in this file, -- especially if there's a comment around telling that 0x37 is '7', 0x7a is 'z' and 0x58 is 'X'. And it will continue to work even if on a target platform some other - non-ascii - encoding is used, like ebcdic, since the (binary) input will still contain the same 0x7a, 0x58 codes, even if 'z' and 'X' will have different codes on the target platform.
Thanks,
/mjt
Fixed in git: commit b8ff9357d579913a5c699d89b4126d859590eea3 Author: Denys Vlasenko <vda.linux@googlemail.com> Date: Mon Dec 5 04:54:14 2011 +0100 suppress a "integer overflow in expression" waring on big endian |
This place in particular: XZ_MAGIC1a = 256 * (256 * (256 * 0xfd + '7') + 'z') + 'X', it depends on the signedness of char datatype (all char constants here). Compiler produces the following warning: In file included from archival/ar.c:31:0: include/archive.h:17:20: warning: integer overflow in expression [-Woverflow] CC archival/bbunzip.o There's a debian bugreport about this, see http://bugs.debian.org/635370 . Why not use computed numbers for all these magic numbers instead of expressions, like XZ_MAGIC1a = 0xFD377A58UL here and everywhere else around, like suggested in this debian bugreport? This makes things significantly more readable than it is now, both for little- and for big-endian. Thanks, /mjt