Bug 14336 - busybox sed differs from GNU sed with respect to NUL (0x00)
Status: NEW
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other
Version: 1.30.x
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
Reported: 2021-11-07 23:44 UTC by Christoph Anton Mitterer
Modified: 2021-11-07 23:44 UTC

Description Christoph Anton Mitterer 2021-11-07 23:44:35 UTC
Hey.

Not sure whether this is a "bug" or just something not defined by POSIX (I'm not really sure whether POSIX says anything with respect to sed and NUL). At least it doesn't seem to be a configure option this time.

I've noticed differing behaviour between busybox' sed and GNU sed with respect to 0x00:

It seems that GNU sed leaves any 0x00 (as well as other "binary" characters) in the current line and respects it when matching.
busybox' sed, OTOH, doesn't do this but seems to terminate the string at the first such 0x00.


Example Files:
$ hd test-with-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 0a 62 61 7a  |foo.bar.zer..baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-lone-0x00
00000000  66 6f 6f 0a 62 61 72 0a  00 0a 62 61 7a 0a 7a 65  |foo.bar...baz.ze|
00000010  72 00 0a 65 6e 64 0a                              |r..end.|
00000017
$ hd test-with-0x02-and-0x00
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 02 00 0a 62 61 7a  |foo.bar.ze...baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a
$ hd test-with-0x00-followed-by-alpha
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 6f 6f 0a 62  |foo.bar.zer.oo.b|
00000010  61 7a 0a 7a 65 72 00 74  74 0a 65 6e 64 0a        |az.zer.tt.end.|
0000001e
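
For anyone wanting to reproduce this, the four files can be recreated from the hexdumps above. This is a minimal sketch that assumes a POSIX printf (octal escapes in the format string) and writes into the current directory:

```shell
# Recreate the four example files byte for byte.
# Octal escapes: \000 = NUL (0x00), \002 = 0x02.
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n'     > test-with-0x00
printf 'foo\nbar\n\000\nbaz\nzer\000\nend\n'        > test-with-lone-0x00
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n'  > test-with-0x02-and-0x00
printf 'foo\nbar\nzer\000oo\nbaz\nzer\000tt\nend\n' > test-with-0x00-followed-by-alpha

# Sanity check: the sizes should be 26, 23, 26 and 30 bytes,
# matching the final offsets (0x1a, 0x17, 0x1a, 0x1e) in the hexdumps.
wc -c test-with-0x00 test-with-lone-0x00 \
      test-with-0x02-and-0x00 test-with-0x00-followed-by-alpha
```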


GNU sed:
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
00000000  7a 65 72 00 0a                                    |zer..|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
00000000  00 0a                                             |..|
00000002
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
00000000  7a 65 02 00 0a                                    |ze...|
00000005
$ sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
00000000  7a 65 72 00 6f 6f 0a                              |zer.oo.|
00000007

(Note that GNU sed's -z option is NOT used.)


busybox' sed:
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-lone-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x02-and-0x00 | hd
$ busybox sed -n '0,/[^[:alnum:][:space:][:punct:]]/{/[^[:alnum:][:space:][:punct:]]/p}' test-with-0x00-followed-by-alpha | hd
$


So it seems that busybox' sed only matches up to the 0x00 (which is perhaps used as a string terminator), while GNU sed continues all the way to the end of the line (\n).
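
A smaller probe for the string-terminator guess could look like the sketch below. It asks a given sed to match text that only occurs after a 0x00 on the same line. The function name probe_nul and its two messages are made up for illustration, and the probe only tests the hypothesis above; it says nothing about why a particular sed behaves the way it does.

```shell
# probe_nul SED_COMMAND...
# Hypothetical helper: feeds the line "zer<NUL>oo" to the given sed and
# checks whether the pattern /oo/, which occurs only after the NUL, matches.
probe_nul() {
    # Count the bytes sed prints; zero bytes means the line did not match.
    n=$(printf 'zer\000oo\n' | "$@" -n '/oo/p' | wc -c | tr -d ' ')
    if [ "$n" -gt 0 ]; then
        echo "matches past NUL"
    else
        echo "stops at NUL"
    fi
}

# Probe whatever sed is first in PATH.
probe_nul sed
```

Running `probe_nul busybox sed` would probe busybox the same way. If the terminator guess is right, that call should report "stops at NUL", but this has not been verified here.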


Thought it's worth bringing this to your attention.


Cheers,
Chris.