Bug 14336

Summary: busybox sed differs from GNU sed with respect to NUL (0x00)
Product: Busybox
Reporter: Christoph Anton Mitterer <calestyo>
Component: Other
Assignee: unassigned
Status: NEW ---
Severity: normal
CC: busybox-cvs
Priority: P5
Version: 1.30.x
Target Milestone: ---
Hardware: All
OS: All
Host:
Target:
Build:

Description Christoph Anton Mitterer 2021-11-07 23:44:35 UTC
Hey.

Not sure whether this is a "bug" or just something POSIX leaves undefined (I'm not really sure whether POSIX says anything about sed and NUL)... at least this time it doesn't seem to be a configure option.

I've noticed that busybox sed and GNU sed behave differently with respect to 0x00:

It seems that GNU sed leaves any 0x00 (as well as other "binary" characters) in the current line and takes it into account when matching.
busybox sed OTOH doesn't do this, but seems to terminate the string at such a 0x00.


Example Files:
$ hd test-with-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 0a 62 61 7a  |foo.bar.zer..baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-lone-0x00
00000000  66 6f 6f 0a 62 61 72 0a  00 0a 62 61 7a 0a 7a 65  |foo.bar...baz.ze|
00000010  72 00 0a 65 6e 64 0a                              |r..end.|
00000017
$ hd test-with-0x02-and-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 02 00 0a 62 61 7a  |foo.bar.ze...baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-0x00-followed-by-alpha
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 6f 6f 0a 62  |foo.bar.zer.oo.b|
00000010  61 7a 0a 7a 65 72 00 74  74 0a 65 6e 64 0a        |az.zer.tt.end.|
0000001e
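The four files can be recreated from the hexdumps above with printf, which makes the issue easy to reproduce on another machine. This is a sketch assuming a POSIX printf (e.g. the bash or dash builtin), whose \ddd octal escapes in the format string emit raw bytes; the file names are the ones used in this report.

```shell
#!/bin/sh
# Recreate the four test files byte for byte from the hexdumps.
# In the printf format, \000 emits a NUL byte (0x00) and \002 emits 0x02.
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n' > test-with-0x00
printf 'foo\nbar\n\000\nbaz\nzer\000\nend\n' > test-with-lone-0x00
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n' > test-with-0x02-and-0x00
printf 'foo\nbar\nzer\000oo\nbaz\nzer\000tt\nend\n' > test-with-0x00-followed-by-alpha
```

Running `hd` on each file should then show the same bytes as above (sizes 0x1a, 0x17, 0x1a and 0x1e).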


GNU sed:
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
00000000  7a 65 72 00 0a                                    |zer..|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
00000000  00 0a                                             |..|
00000002
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
00000000  7a 65 02 00 0a                                    |ze...|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
00000000  7a 65 72 00 6f 6f 0a                              |zer.oo.|
00000007

(Note that GNU sed's -z option is NOT used.)


busybox sed:
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
$


So it seems that busybox sed only does the matching up to the 0x00 (which is perhaps treated as a string terminator), while GNU sed matches all the way to the end of the line (\n).
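One way to narrow this down would be to check whether the bytes after the 0x00 are dropped when the line is read, or kept and only ignored by the matcher: print the line unchanged with `-n p` and compare it with the input. This is a hypothetical check, not something from the original report; the SED variable and the nul-line file name are made up for illustration, and no busybox output is claimed here.

```shell
#!/bin/sh
# Hypothetical check: does a sed keep the bytes that follow a 0x00 in a line?
# SED defaults to "busybox sed"; run with SED=sed to try GNU sed instead.
SED=${SED:-busybox sed}

# One line with 0x00 followed by ordinary letters, like line 3 of
# test-with-0x00-followed-by-alpha.
printf 'zer\000oo\n' > nul-line

# With -n and p, sed prints each line unchanged, so the output should be
# byte-identical to the input unless something after the 0x00 is lost.
# $SED is deliberately unquoted so "busybox sed" splits into two words.
if $SED -n p nul-line | cmp -s - nul-line; then
    echo "identical: bytes after 0x00 are kept"
else
    echo "different: output is not byte-identical to the input"
fi
```

Assuming busybox is on PATH, "identical" would suggest the line is read intact and only the regex match stops at the 0x00, while "different" would suggest the line itself is cut at the 0x00.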


Thought it was worth bringing this to your attention.


Cheers,
Chris.