Bug 12531

Summary: awk: backslashes not parsed in EREs passed from variables or "string literals"
Product: Busybox Reporter: Martijn Dekker <martijn>
Component: Standard Compliance Assignee: unassigned
Status: NEW ---    
Severity: normal CC: busybox-cvs
Priority: P5    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: All   
Host: Target:
Build:
Attachments: my .config, as required

Description Martijn Dekker 2020-02-03 21:12:15 UTC
Created attachment 8356
my .config, as required

In busybox awk, match(), sub() and gsub() don't parse C-style backslash-escaped special characters in EREs passed from variables or "string literals" (as opposed to /ERE literals/, for which busybox awk behaves correctly).

Below are two test cases. (Note: the double-quoted string literal removes one level of backslash escaping; the ERE parsing in match(), sub() and gsub() should remove another.)


$ echo $'abc\tdef' | awk '{ ere="\\t"; gsub(ere, "TAB"); print; }'
abc     def

Expected output (as on onetrueawk, gawk, mawk, Solaris awk):
abcTABdef


$ awk 'BEGIN { print !match("\n", "^\\n$"); }'
1

Expected output (as on onetrueawk, gawk, mawk, Solaris awk):
0
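
The two levels of unescaping noted above can be checked separately. The following is a minimal sketch, not part of the original report: AWK is a hypothetical environment variable for selecting the awk binary under test, and printf replaces the bash-specific $'...' quoting so that the script runs under any POSIX shell.

```shell
# Hypothetical override: set AWK to the path of the awk binary to test.
AWK=${AWK:-awk}

# Level 1: the string lexer turns "\\t" into two characters, backslash and t.
ere_len=$("$AWK" 'BEGIN { ere = "\\t"; print length(ere) }')
echo "length of ere: $ere_len"

# Level 2: gsub() should parse those two characters as an ERE matching a tab.
# Per the report, conforming awks print abcTABdef here, while the affected
# busybox awk leaves the line unchanged.
result=$(printf 'abc\tdef\n' | "$AWK" '{ ere = "\\t"; gsub(ere, "TAB"); print }')
echo "gsub result: $result"
```

If the length is 2 but the line comes back unchanged, the string literal was unescaped correctly and the fault lies in the ERE parsing step.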

Reference to standard:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_04

Regular Expressions: ..."The awk utility shall make use of the extended regular expression notation (see XBD Extended Regular Expressions) except that it shall allow the use of C-language conventions for escaping special characters within the EREs, as specified in the table in XBD File Format Notation ( '\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v' ) and the following table; these escape sequences shall be recognized both inside and outside bracket expressions."...

RATIONALE: ..."Historical implementations of awk have long supported <backslash>-escape sequences as an extension to extended regular expressions, and this extension has been retained despite inconsistency with other utilities."...