Bug 15036 - [PATCH] Awk: fix bitwise functions when operating with large numbers
Summary: [PATCH] Awk: fix bitwise functions when operating with large numbers
Status: NEW
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-08 17:45 UTC by Carlos Ibáñez
Modified: 2022-10-08 17:45 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:


Attachments
awk-bitwiseop-fix-arch64.patch (3.34 KB, patch)
2022-10-08 17:45 UTC, Carlos Ibáñez

Description Carlos Ibáñez 2022-10-08 17:45:50 UTC
Created attachment 9376
awk-bitwiseop-fix-arch64.patch

Hi there!

While working on a small awk program on an arm64 machine, I found that bitwise operations are broken when operating on large numbers.

Looking under the hood, I found that Busybox awk numbers are doubles [1], whereas bitwise operations are performed on unsigned longs [2].

The problem:
  - a double can represent every integer exactly only up to 2^53
  - unsigned long is 32 bits wide (max 2^32 - 1) on 32-bit archs
  - unsigned long is typically 64 bits wide (max 2^64 - 1) on 64-bit archs

So the result of an unsigned long bitwise operation is stored in a double.

This means data is lost on 64-bit archs with 64-bit unsigned longs whenever the result is greater than 2^53. For example, even a simple compl(0) on arm64 or x86-64 Linux gives an unexpected result:

  awk 'BEGIN{print compl(0)%4}'

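It returns 0 instead of 3. With a 64-bit unsigned long, compl(0) is 2^64 - 1, which rounds to 2^64 when stored in a double, and 2^64 is a multiple of 4.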

But it works on GNU Awk. Why?

Well, apparently all gawk bitwise operations return their result through a function called make_integer [3], which in turn calls adjust_uint [4], a function that fixes the issue described above.

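adjust_uint basically truncates values wider than 53 bits (such as a 64-bit unsigned long) from the left: it clears the high-order bits that a double cannot represent exactly and preserves the low-order bits, so the result converts to a double without rounding.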

So I went ahead and shamelessly copied adjust_uint into Busybox Awk and it worked!

And here I am submitting a patch with the changes adapted to Busybox :)

This adaptation includes:
  - Replacing uintmax_t with unsigned long in adjust_uint and its count_trailing_zeros helper, since the result of bitwise operations in Busybox is an unsigned long
  - Replacing GCC __builtin_ctzll (unsigned long long) with GCC __builtin_ctzl (unsigned long)
  - Including float.h for the FLT_RADIX macro
  - Removing some macros that adapt adjust_uint on platforms where gawk numbers are long doubles
  - Renaming some macros and their mentions in the original gawk comments

Cheers,
Carlos


[1] - https://git.busybox.net/busybox/tree/editors/awk.c?id=c8c1fcdba163f264a503380bc63485aacd09214c#n123
[2] - https://git.busybox.net/busybox/tree/editors/awk.c?id=c8c1fcdba163f264a503380bc63485aacd09214c#n1048
[3] - https://git.savannah.gnu.org/cgit/gawk.git/tree/builtin.c?id=d434cb3ce61e0cc5e26180da914f1a58223897a2#n3565
[4] - https://git.savannah.gnu.org/cgit/gawk.git/tree/floatcomp.c?id=d434cb3ce61e0cc5e26180da914f1a58223897a2#n91