| Summary: | sed and awk mishandle \b \< \B | ||
|---|---|---|---|
| Product: | Busybox | Reporter: | dubiousjim |
| Component: | Standard Compliance | Assignee: | unassigned |
| Status: | NEW --- | ||
| Severity: | minor | CC: | busybox-cvs |
| Priority: | P5 | ||
| Version: | 1.19.x | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Host: | Target: | ||
| Build: | |||
It looks like this does affect BusyBox (e)grep too, but I agree that the error seems to be inside the regex library itself:
phil@geespaz:busybox$ echo 'azz bz c d' | ./busybox egrep -o '\b[a-z]'
a
z
z
b
z
c
d
phil@geespaz:busybox$ echo 'azz bz c d' | egrep -o '\b[a-z]'
a
b
c
d
My regex library is:
phil@geespaz:busybox$ nm ./busybox_unstripped | grep regexec
U regexec@@GLIBC_2.3.4
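The extra matches above (z, z, z) are what one would see if each new search were started on the unmatched tail of the line, so that the engine treats the tail's first character as the start of a word. This is only a hypothesis about where the behaviour could come from; nothing in this report confirms it. The sketch below uses Python's `re` module as a stand-in for the C regex library, and the helper names `matches_by_slicing` and `matches_with_offset` are made up for illustration.

```python
import re

# Stand-in for the pattern used with egrep -o above.
WORD_START = re.compile(r"\b[a-z]")


def matches_by_slicing(text):
    """List matches, restarting each search on the unmatched tail.

    Hypothetical model: the engine only ever sees the remainder of the
    line, so it cannot know which character preceded the new start.
    """
    found = []
    rest = text
    while True:
        m = WORD_START.search(rest)
        if m is None:
            return found
        found.append(m.group(0))
        rest = rest[m.end():]


def matches_with_offset(text):
    """List matches, searching the full line from a moving offset.

    The engine still sees the characters before the offset, so \\b is
    evaluated against the real preceding character.
    """
    found = []
    pos = 0
    while True:
        m = WORD_START.search(text, pos)
        if m is None:
            return found
        found.append(m.group(0))
        pos = m.end()


print(matches_by_slicing("azz bz c d"))   # same list as the ./busybox egrep run
print(matches_with_offset("azz bz c d"))  # same list as the system egrep run
```

If the real cause is along these lines, the regex library would be behaving consistently with the input it is given, and the difference would lie in how the caller advances through the line.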
BusyBox 1.19.3, built against uClibc 0.9.32, on i686 Linux.

Since this affects both sed and awk, perhaps it's an issue with uClibc. However, it does not affect BusyBox egrep.

$ printf 'abcd efgh\n' | sed -n 's/\b[a-z]/<&>/pg'
Expected result: <a>bcd <e>fgh
Actual result: <a><b><c><d> <e><f><g><h>

$ printf 'abcd efgh\n' | sed -n 's/\<[a-z]/<&>/pg'
Expected result: <a>bcd <e>fgh
Actual result: <a><b><c><d> <e><f><g><h>

$ printf 'abcd efgh\n' | sed -n 's/\B[a-z]/<&>/pg'
Expected result: a<b><c><d> e<f><g><h>
Actual result: a<b>c<d> e<f>g<h> # misses the c and g

$ printf 'abcd efgh\n' | awk '{gsub(/\<[a-z]/,"<&>"); print $0}'
Expected result: <a>bcd <e>fgh
Actual result: <a><b><c><d> <e><f><g><h>

$ printf 'abcd efgh\n' | awk '{gsub(/\B[a-z]/,"<&>"); print $0}'
Expected result: a<b><c><d> e<f><g><h>
Actual result: a<b>c<d> e<f>g<h> # misses the c and g

The end-of-word elements all give the expected results:

$ printf 'abcd efgh\n' | sed -n 's/[a-z]\b/<&>/pg'
abc<d> efg<h>

$ printf 'abcd efgh\n' | sed -n 's/[a-z]\>/<&>/pg'
abc<d> efg<h>

$ printf 'abcd efgh\n' | sed -n 's/[a-z]\B/<&>/pg'
<a><b><c>d <e><f><g>h

$ printf 'abcd efgh\n' | awk '{gsub(/[a-z]\>/,"<&>"); print $0}'
abc<d> efg<h>

$ printf 'abcd efgh\n' | awk '{gsub(/[a-z]\B/,"<&>"); print $0}'
<a><b><c>d <e><f><g>h
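The pattern of results above (leading `\b` and `\B` wrong, trailing ones right) can be reproduced by a global-substitution loop that hands the regex engine only the remainder of the line after each match. The following is a minimal sketch of that idea, not the BusyBox or uClibc code; it uses Python's `re` module, the function names are hypothetical, and `\b` stands in for `\<` because Python has no `\<` operator.

```python
import re


def sub_all_by_slicing(pattern, text):
    """Wrap every match in <...>, restarting the search on the tail.

    Hypothetical model of a loop that passes only the unprocessed
    remainder to the engine, hiding the preceding character from it.
    """
    regex = re.compile(pattern)
    out = []
    rest = text
    while rest:
        m = regex.search(rest)
        if m is None or m.end() == 0:
            break  # no match, or an empty match this sketch does not handle
        out.append(rest[:m.start()] + "<" + m.group(0) + ">")
        rest = rest[m.end():]
    out.append(rest)
    return "".join(out)


def sub_all_with_context(pattern, text):
    """Same substitution, with the engine seeing the whole line."""
    return re.sub(pattern, lambda m: "<" + m.group(0) + ">", text)


line = "abcd efgh"
for pattern in (r"\b[a-z]", r"\B[a-z]", r"[a-z]\b", r"[a-z]\B"):
    print(pattern, sub_all_by_slicing(pattern, line),
          sub_all_with_context(pattern, line))
```

In this model the slicing variant yields the "Actual result" strings for the leading `\b` and `\B` cases and the expected strings for the trailing ones, which matches the reported asymmetry: a trailing assertion only needs the character after the match, and that character is still visible in the tail.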