| Summary: | sed: s-command with "semi-special" delimiters get wrong behaviour | ||
|---|---|---|---|
| Product: | Busybox | Reporter: | Christoph Anton Mitterer <calestyo> |
| Component: | Standard Compliance | Assignee: | unassigned |
| Status: | RESOLVED FIXED | ||
| Severity: | major | CC: | busybox-cvs |
| Priority: | P5 | ||
| Version: | 1.30.x | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | All | ||
| Host: | Target: | ||
| Build: | |||
|
Description
Christoph Anton Mitterer
2022-01-21 16:06:09 UTC
Fixed in git.

Just for confirmation:
- Since the tests are only about s-commands, does your commit also fix the context addresses?
- The test you added for \& handling was just about adding the missing test, right? The behaviour itself already worked before?!
- I assume these functions were just for BRE, right?!
- Does that commit also "fix" (well, POSIX is ambiguous, IMO, so it's actually more an "align with GNU sed") the following two:

a) \1 in replacement:
Consider:
    s1\(x\)1\11
which, depending on what "literal" means, could be either effectively:
    s/\(x\)/\1/
(BusyBox sed seems to do this:
    $ printf '%s\n' 'oxo' | busybox sed 's1\(x\)1\11'
    oxo
    $ printf '%s\n' 'owo' | busybox sed 's1\(x\)1\11'
    owo
)
or:
    s/\(x\)/1/
(GNU sed seems to do this:
    $ printf '%s\n' 'oxo' | sed 's1\(x\)1\11'
    o1o
    $ printf '%s\n' 'owo' | sed 's1\(x\)1\11'
    owo
)

b) \1 in BRE (but not ERE, where \1 isn't defined for the RE part)
Consider:
    s1\(xo\)\11X1
which, depending on what "literal" means, could be either effectively:
    s/\(xo\)\1/X/
(BusyBox sed seems to do this:
    $ printf '%s\n' 'xoxo' | busybox sed 's1\(xo\)\11X1'
    X
    $ printf '%s\n' 'xo1' | busybox sed 's1\(xo\)\11X1'
    xo1
)
or:
    s/\(xo\)1/X/
(GNU sed seems to do this:
    $ printf '%s\n' 'xoxo' | sed 's1\(xo\)\11X1'
    xoxo
    $ printf '%s\n' 'xo1' | sed 's1\(xo\)\11X1'
    X
)

I just went through my list of test cases in
https://www.austingroupbugs.net/view.php?id=1551#c5612 :

It seems busybox now (with the patch) behaves as GNU, except for one case:

GNU:
    $ printf '%s\n' 'oxo' | sed 's1\(x\)1\11'
    o1o
    $ printf '%s\n' 'owo' | sed 's1\(x\)1\11'
    owo

BusyBox with patch:
    $ printf '%s\n' 'oxo' | ./busybox sed 's1\(x\)1\11'
    oxo
    $ printf '%s\n' 'owo' | ./busybox sed 's1\(x\)1\11'
    owo

So in the replacement, the bug doesn't seem to be fixed yet. Same case: the \1 is first "un-delimitered" but then still counted as \1, though the \ should have already been removed because of the "un-delimitering".

Thus reopening.
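The cases quoted above can be re-run against any sed binary with a small helper. This is only a sketch: `check_case` is a hypothetical function name, not part of BusyBox or GNU sed, and the expected values in the commented invocations are simply the GNU sed outputs quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: run one sed script on one input line and compare
# the result with an expected output.
# Usage: check_case SCRIPT INPUT EXPECTED SED_COMMAND...
check_case() {
    script=$1 input=$2 expected=$3
    shift 3   # the remaining words are the sed command, e.g.: busybox sed
    actual=$(printf '%s\n' "$input" | "$@" "$script")
    if [ "$actual" = "$expected" ]; then
        printf 'ok:   %s on %s\n' "$script" "$input"
    else
        printf 'FAIL: %s on %s: got %s, expected %s\n' \
            "$script" "$input" "$actual" "$expected"
        return 1
    fi
}

# Example invocations (expected values = GNU sed outputs quoted in this report):
#   check_case 's1\(x\)1\11'    oxo   o1o   ./busybox sed
#   check_case 's1\(x\)1\11'    owo   owo   ./busybox sed
#   check_case 's1\(xo\)\11X1'  xoxo  xoxo  ./busybox sed
#   check_case 's1\(xo\)\11X1'  xo1   X     ./busybox sed
```

Passing the sed command as trailing words lets the same call work for `sed`, `busybox sed` or `./busybox sed` without any quoting tricks.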
I hadn't seen the 2nd commit, f12fb1e4092900f26f7f8c71cde44b1cd7d26439, when testing. That also fixes the case from comment #3.

Now, BusyBox sed seems to behave identically to GNU sed in all the cases I had given in:
https://www.austingroupbugs.net/view.php?id=1551#c5612

In particular, it also seems to consider "un-delimitered" delimiters that are also special characters as "still special" (or at least I tried that with '.'), which is, while IMO not clearly defined by POSIX, identical to the behaviour of GNU sed (see https://www.austingroupbugs.net/view.php?id=1551#c5648 for test cases).

Thus closing again. Thanks. |
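To check whether two sed implementations agree on a case such as the '.' one mentioned above, a differential probe is enough. This is a sketch under assumptions: `same_output` is a hypothetical helper, and the script `s.a\.b.X.` is an illustration made up here, not one of the test cases from the linked note. If the "un-delimitered" `.` is still treated as special, both `a.b` and `axb` would be rewritten to `X`; if it is treated as a literal dot, only `a.b` would be.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two sed commands produce the same output
# for one script and one input line.
# Usage: same_output SCRIPT INPUT 'SED_A' 'SED_B'
same_output() {
    script=$1 input=$2 sed_a=$3 sed_b=$4
    # $sed_a and $sed_b are deliberately unquoted so that a value such as
    # "busybox sed" is split into command and argument.
    out_a=$(printf '%s\n' "$input" | $sed_a "$script")
    out_b=$(printf '%s\n' "$input" | $sed_b "$script")
    [ "$out_a" = "$out_b" ]
}

# Example invocations (illustrative script, see the note above):
#   same_output 's.a\.b.X.' 'a.b' 'sed' './busybox sed' && echo same
#   same_output 's.a\.b.X.' 'axb' 'sed' './busybox sed' && echo same
```

Comparing the two binaries directly avoids hard-coding an expected result for behaviour that POSIX does not clearly define.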