Bug 12506

Summary: printf in awk fails to print zeros
Product: Busybox
Reporter: Yuri <yuri>
Component: Other
Assignee: unassigned
Status: NEW ---
Severity: normal
CC: busybox-cvs, wolf+busybox
Priority: P5
Version: 1.31.x
Target Milestone: ---
Hardware: All
OS: Linux
Host:
Target:
Build:

Description Yuri 2020-01-25 22:43:11 UTC
In an attempt to transfer a binary file to the box over the terminal connection, I converted the binary to an ASCII file containing decimal character codes. However, awk fails to recover the content because it drops zero bytes (and maybe other characters):

In BusyBox v1.31.0 zeros are lost:
> # (echo 0; echo 1; echo 2) | awk '{ printf("%c",$0); }' | hexdump 
> 0000000 0201                                   
> 0000002


As a comparison, on FreeBSD all bytes are recovered:
> $ (echo 0; echo 1; echo 2) | awk '{ printf("%c",$0); }' | hd
> 00000000  00 01 02                                          |...|
> 00000003


There is no practical benefit to losing zeros. This can only hurt operations in an embedded system where BusyBox is typically used.
Comment 1 wolf+busybox 2020-01-25 23:25:20 UTC
printf seems easy enough to fix, but the same issue is also present in sprintf, which would require more substantial changes to get right.
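To illustrate why the printf side looks easy, here is a minimal C sketch. It is not BusyBox's actual code, and the helper names emit_as_cstring and emit_with_length are hypothetical. It assumes the formatted text ends up in a char buffer: if that buffer is then treated as a NUL-terminated string, a "%c" with code 0 produces an empty string and nothing is written, whereas using the length reported by snprintf keeps the byte.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one character code with "%c" and write the
 * result as a NUL-terminated C string. For code 0 the buffer starts with
 * the terminator, so strlen() is 0 and no byte reaches the output.
 * Returns the number of bytes written. */
static size_t emit_as_cstring(FILE *out, int code)
{
    char buf[8];
    snprintf(buf, sizeof buf, "%c", code);
    size_t len = strlen(buf);            /* 0 when code == 0 */
    return fwrite(buf, 1, len, out);
}

/* Hypothetical helper: same formatting, but trust the length returned by
 * snprintf() instead of searching for a terminator, so a zero byte is
 * written like any other byte. Returns the number of bytes written. */
static size_t emit_with_length(FILE *out, int code)
{
    char buf[8];
    int len = snprintf(buf, sizeof buf, "%c", code);
    if (len < 0)
        return 0;                        /* formatting error */
    return fwrite(buf, 1, (size_t)len, out);
}
```

Calling emit_with_length(stdout, 0) writes a single zero byte, while emit_as_cstring(stdout, 0) writes nothing. For sprintf the result has to be stored as a string value rather than written out, which is presumably why that case needs deeper changes.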
Comment 3 wolf+busybox 2020-01-26 11:01:54 UTC
One thing worth noting is that the current behaviour is, strictly speaking, POSIX compliant.
Comment 4 Yuri 2020-01-26 11:53:07 UTC
(In reply to wolf+busybox from comment #3)

Interesting. BSD awk also claims to be POSIX compliant:
>     The awk utility is compliant with the IEEE Std 1003.1-2008 (“POSIX.1”)
>     specification, except awk does not support {n,m} pattern matching.

But I would favor practicality over technical standard compliance.
Comment 5 wolf+busybox 2020-01-26 17:31:02 UTC
(In reply to Yuri from comment #4)

> Interesting. BSD awk also claims to be POSIX compliant:

POSIX (on awk) just states that when printf should print \000, the behaviour is
undefined. So both implementations are compliant.

> But I would favor practicality over technical standard compliance.

Sure, why not. At least for printf it is very easy to resolve. sprintf would
require larger changes. And it could be confusing for printf to allow \000 and
sprintf to not allow it.

I'm curious what conclusion the BusyBox people will reach regarding this one.
Comment 6 Yuri 2020-01-26 17:42:37 UTC
Does awk allow zeros in strings, and handle such strings properly?

STL's std::string, for example, does allow zeros in the middle of strings and handles them properly. In contrast, shells don't allow zeros in strings.

If awk generally doesn't handle zeros properly, these become two separate issues. One is printing a zero to the output; the other is fixing zero handling in strings. IMO, just fix them one by one.
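For the second issue, the difference between the two kinds of string can be sketched in C. This is only an illustration under the assumption that string values are held as NUL-terminated char pointers; the struct counted_str and the function cs_concat are hypothetical names, not taken from any awk implementation. A counted string carries its length explicitly, much like std::string, so a zero byte is ordinary data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly, so a zero
 * byte inside the data is not mistaken for the end of the string. */
struct counted_str {
    char  *data;
    size_t len;
};

/* Concatenate two counted strings into a newly allocated one.
 * On allocation failure the result has data == NULL and len == 0.
 * The caller frees result.data. */
static struct counted_str cs_concat(struct counted_str a, struct counted_str b)
{
    struct counted_str r = { NULL, 0 };

    r.data = malloc(a.len + b.len + 1);
    if (r.data == NULL)
        return r;

    memcpy(r.data, a.data, a.len);
    memcpy(r.data + a.len, b.data, b.len);
    r.len = a.len + b.len;
    r.data[r.len] = '\0';   /* convenience terminator, not counted in len */
    return r;
}
```

Concatenating the 3-byte string "a\0b" with "c" this way gives a result whose len is 4, while strlen() on the same data stops at the first zero and reports 1. Any code path that still relies on strlen() or similar would need to switch to the stored length, which is the larger part of the work.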