Bug 5084 - Small differences between Gnu and BusyBox sed
Status: RESOLVED WORKSFORME
Alias: None
Product: Busybox
Classification: Unclassified
Component: Standard Compliance
Version: 1.19.x
Hardware: PC Linux
Importance: P5 minor
Target Milestone: ---
Assignee: unassigned
Reported: 2012-04-12 00:09 UTC by dubiousjim
Modified: 2016-04-24 14:51 UTC

Description dubiousjim 2012-04-12 00:09:27 UTC
BusyBox 1.19.3, built against uClibc 0.9.32, on i686 Linux

(1) Without -r, both BusyBox sed and Gnu sed reject an unmatched \( and an unmatched \) as invalid patterns. So do their grep implementations without -E.

Similarly, Gnu sed with -r rejects an unmatched ( and an unmatched ) as invalid patterns, and so do both egrep implementations.

Like everyone else, BusyBox sed with -r rejects an unmatched ( as invalid.
But it diverges in treating an unmatched ) as a literal character.
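
A quick way to check item (1) on a given system is to let each tool compile the pattern and look only at whether it reports an error. This is a minimal sketch, assuming the sed and grep found on the PATH are the implementations under test. The helper name try_pattern is made up for this illustration, and the results will vary with the sed/grep build and libc in use.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command accepts a pattern.
# Usage: try_pattern LABEL COMMAND [ARGS...]
try_pattern() {
    label=$1
    shift
    # The sample line contains both '(' and ')', so a pattern taken
    # literally still matches and grep exits 0 rather than 1.
    if printf 'a)b(c\n' | "$@" >/dev/null 2>&1; then
        echo "$label: accepted"
    else
        echo "$label: rejected"
    fi
}

try_pattern 'sed -r, unmatched (' sed -r 's/(/X/'
try_pattern 'sed -r, unmatched )' sed -r 's/)/X/'
try_pattern 'grep -E, unmatched (' grep -E '('
try_pattern 'grep -E, unmatched )' grep -E ')'
```

To test a BusyBox binary specifically, the same helper can be called with ./busybox sed or ./busybox grep as the command.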


(2) With:

$ printf 'abc\n' | sed -n 's/x*/=&=/gp'

Gnu sed will replace against four zero-length matches. BusyBox sed will only replace against the first.
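
The difference in item (2) can be made visible by counting the "==" markers in the output, since each replaced zero-length match of x* leaves one behind. The following is a sketch only, assuming the sed on the PATH is the one under test; count_empty_matches is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count how many "==" markers a string contains.
# Each zero-length match replaced by 's/x*/=&=/g' shows up as one "==".
count_empty_matches() {
    printf '%s\n' "$1" | awk '{ n += gsub(/==/, "") } END { print n + 0 }'
}

out=$(printf 'abc\n' | sed -n 's/x*/=&=/gp')
echo "output: $out"
echo "zero-length matches replaced: $(count_empty_matches "$out")"
```

When all four matches are replaced the output is ==a==b==c== and the count is 4; if only the first were replaced, the output would be ==abc and the count 1.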


(3) BusyBox sed accepts multiple adjacent quantifiers without error (for example, "a**"), with or without the -r flag. Gnu sed with the -r flag behaves the same; without the -r flag it rejects such patterns as invalid. I'm pointing this difference out, in case anyone wants to know, but I'm not sure it's desirable to change BusyBox's existing behavior here.


(4) In Basic Regex settings (grep without -E, sed without -r), Gnu and BusyBox treat quantifiers (*, \?, \+, \{1\}) in bad positions (such as the start of a pattern) as literal expressions. (An exception is that Gnu sed rejects \{1\} in those positions as invalid. Gnu grep and BusyBox grep and sed treat it as literal.)
In Extended Regex settings (egrep, sed with -r), Gnu and BusyBox instead silently drop the quantifiers from the pattern. However, the way they drop "{1}" is different: Gnu drops just the leading "{", leaving behind a literal "1}". BusyBox drops the whole phrase "{1}".
Here, too, I'm pointing this difference out, in case anyone wants to know, but I'm not sure it's desirable to change BusyBox's existing behavior.
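
The cases in item (4) can be told apart by choosing input on which the different readings of the pattern give different results. A rough sketch, assuming the sed on the PATH is the one under test; show_subst is a hypothetical helper, and the outcomes named in the comments are the ones described in this report, not guaranteed results.

```shell
#!/bin/sh
# Hypothetical helper: run a sed substitution on a sample line and show the
# result, or note that sed rejected the expression.
# Usage: show_subst INPUT SED_ARGS...
show_subst() {
    input=$1
    shift
    if result=$(printf '%s\n' "$input" | sed "$@" 2>/dev/null); then
        echo "sed $*: '$input' -> '$result'"
    else
        echo "sed $*: rejected"
    fi
}

# Basic regex: a leading '*' taken literally turns '*a' into 'X'.
show_subst '*a' 's/*a/X/'
# Extended regex: a dropped leading '*' leaves it in the output ('*X').
show_subst '*a' -r 's/*a/X/'
# Extended regex, leading interval: dropping only '{' gives 'X',
# dropping the whole '{1}' gives '1}X'.
show_subst '1}a' -r 's/{1}a/X/'
```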
Comment 1 Denys Vlasenko 2016-04-24 14:51:36 UTC
(In reply to dubiousjim from comment #0)
> $ printf 'abc\n' | sed -n 's/x*/=&=/gp'
> Gnu sed will replace against four zero-length matches. BusyBox sed will only replace against the first.

Works for me in recent versions of bbox

(3)/(4) appear to depend on the regex implementation in libc. For example:

system sed:

$ printf 'abc\n' | sed -n 's/x**/=&=/gp'
sed: -e expression #1, char 12: Invalid preceding regular expression

bbox sed, uclibc:

$ printf 'abc\n' | ./busybox sed -n 's/x**/=&=/gp'
==a==b==c==

bbox sed, glibc:

$ printf 'abc\n' | ./busybox sed -n 's/x**/=&=/gp'
sed: bad regex 'x**': Invalid preceding regular expression