| Summary: | awk: seems to try to falsely expand strings | ||
|---|---|---|---|
| Product: | Busybox | Reporter: | Steffen Nurpmeso <steffen> |
| Component: | Other | Assignee: | unassigned |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | aldot, busybox-cvs |
| Priority: | P2 | ||
| Version: | 1.31.x | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Host: | Target: | ||
| Build: | |||
| Attachments: | Test script for reproduction | ||
I can workaround the bug if i use
if(ia[1] < 0){
++unavail
ia[1] = "(su__ERR_NUMBER_MAX - " unavail ")"
}
instead of
if(ia[1] < 0)
ia[1] = "(su__ERR_NUMBER_MAX - " ++unavail ")"
Can you please provide a small, self-contained reproducer? Also, i think that you should put parentheses around the ++unavail in your original code since there is no defined precenence around string concatenation. The nawk bug is already fixed, heh. Luckily i saved away the script, another version did not reveal the crash there. The final version is also entirely different. I mean hey, i came to bed at half past five in the morning, ok.
But yes i indeed can. Here mawk, mawk-debian, nawk, gawk:
#?0|kent:tmp$ </dev/null awk -v i=1 'BEGIN {print "hey fish " ++i "."}'
hey fish 2.
Compared to
#?0|sdaoden:steffen$ </dev/null awk -v i=1 'BEGIN {print "hey fish " ++i "."}'
01.
Any why has it to be minimal, i never found a test series for busybox (i looked out for one years ago)?
Just asking..
I have asked Aharon Robbins for that "++i should be (++i)" of yours.
He kindly proved me wrong regarding my boolean use of getline() not getline()>0 already, i had to fix that in several places. If he responds to support my prefix/postfix in/decrement has highest priority .. i'll be back :-)
A nice weekend everybody i (who hates web interfaces like this, i wished i could simply respond via email) wish from Germany,
Ciao,
Actually does not seem to work out. I paste it here. Thanks. --- Oh hello i see and try replying! Good evening. Bernhard Reutner-Fischer wrote in <20200606021709.4f5cd8bf@nbbrfq.cc.univie.ac.at>: |On Fri, 05 Jun 2020 19:50:07 +0000 |bugzilla@busybox.net wrote: |> https://bugs.busybox.net/show_bug.cgi?id=12981 ... |> But yes i indeed can. Here mawk, mawk-debian, nawk, gawk: And SunOS 5.9 /usr/bin/xpg4. |That's great, thanks alot for this simplification! Hm. |> #?0|kent:tmp$ </dev/null awk -v i=1 'BEGIN {print "hey fish " ++i "."}' |> hey fish 2. ... |> Any why has it to be minimal, i never found a test series for busybox \ |> (i looked | |I didn't say minimal. I said "small, self-contained". | |In your bug-report, you referenced a |IN="${SRCDIR}"su/gen-errors.h |which you did not attach. Although i could probably construct an input It worked like shown. I gave the necessary command line. Sorry. But please, at least i have stripped 251 lines of perl code before attaching the file. (vim fold.) |that exposes the alleged misbehaviour (or maybe i would not be able to |if it was another kind of bug as we're dealing with here) i would have |to guess. You did not provide a readily testable reproducer for us to |debug. We'd have to guess or suspect the problem to be obscured by an That is not true really. |overly complex, incomplete testcase. This is very discouraging a |situation if you just have a couple of minute to read through a report |while, let's say, you are in the bus on your way home after eight or |thirteen hours of work. Yes, it is good now. |> out for one years ago)? |> Just asking.. | |We indeed have a testsuite. See testsuite/ |as well as |$ make help |$ make check |and, specifically |testsuite/awk.tests | |We've had a testsuite since a long time now. An internal one. We I must have confused something. Indeed testsuite came in Y2K1. |usually test some prominent cases with external testsuites. We used to |(and still some of us do) build our setups with busybox alone so the |most visible stuff supposedly works fine. In order not to regress we |strive to flesh out the internal testsuite. Except for network related |stuff which would usually be rather complicated to regtest in our |current testsuite due to it's simplicity. | |> |> I have asked Aharon Robbins for that "++i should be (++i)" of yours. | |Name dropping usually doesn't work for me for i know nothing, but sure, |if he knows awk land i'm all ears -- whoever that may be.. Please |excuse my ignorance. You know i wondered whether i should response at all because of that pissing stuff. But i did. |> He kindly proved me wrong regarding my boolean use of getline() not \ |> getline()>0 |> already, i had to fix that in several places. If he responds to \ |> support my |> prefix/postfix in/decrement has highest priority .. i'll be back :-) | |looking forward to that. |> |> A nice weekend everybody i (who hates web interfaces like this, i \ |> wished i |> could simply respond via email) wish from Germany, | |web interfaces are great to not forget stuff, but sure, |let's just take this to the list for a wider audience and easier |interaction for both of us. These are in fact almost his words in our thread on the nawk bug, because i do not have a github account and asked him whether private mail would be ok. Does not seem i get a response. |Folks, recap. |So, in this bug-reoprt what we have is: |- there is no awk standard (that i'm aware of at least; SuS is | silent) |- awk has no (well) defined string-concatination precedence |- some (well) documented awk implementations explicitly warn | script-authors to rely on precedence around string concatenation, | like e.g.: | https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html I have never read anything like that in any manual page of all the awks i have or have seen. (On CRUX-Linux there is no info system, no info files.) Many warn due to problematic number / string as in There are no explicit conversions between numbers and strings. To force an expression to be treated as a number, add 0 to it. To force an expression to be treated as a string, concatenate the null string ("") to it. SunOS 5.9 Last change: 7 Jul 2000 7 Over there there is /usr/bin/awk, which only works when passed a script file via -f (where -f - works too), and this one really behaves like the awk you are protecting from moving forward. |---8<--- |Because string concatenation does not have an explicit operator, it \ |is often necessary to ensure that it happens at the right time by using \ |parentheses to enclose the items to concatenate. |---8<--- |- new users of awk usually expect a well resp. clearly defined language | but IMHO we do not face the luxory to deal with that in this context. | Rule of least surprise for the user applies, as usual. Undefined | corners of underspecified tools/behaviour is usually clarified | centrally (by a standard of some sorts to back up claims or document | consensus umong implementors). No such luxory here AFAIK. |- If you write your scripts defensively (i.e. know your stuff, to | phrase it politely) all is well even with current busybox awk (see | below) I think i will in the future not respond to messages coming from you, really. Again, i have never ever read anything about that behaviour in any of the awks i have seen. mawk and debian-mawk (manual date Dec 22 1994) both say Functions can be referenced before they are defined, but the function name and the '(' of the arguments must touch to avoid confusion with concatenation. Hey. I would suspect the rule of least surprise would in a "modern" environment support moving busybox awk to where the others are. |Bad: |$ ./busybox awk -v i=1 'BEGIN {print "fourty " ++i "."}' |01. | |Expected: |$ ./busybox awk -v i=1 'BEGIN {print "fourty " (++i) "."}' |fourty 2. | |Proposed fixes? |Follow up further improvements for awk? Equally robustness improvements \ |to deal with possible similar cases you might make up while testing \ |or debugging stuff? Or maybe you spot some areas that are written way \ |too elaborate and you can save some bytes after having fixed this unspec\ |ified but seemingly widely supported precedence convention? | |> Ciao, | |cheers and have phun, That surely goes in the wrong direction man. :) Ciao and a nice Sunday i wish., --End of <20200606021709.4f5cd8bf@nbbrfq.cc.univie.ac.at> Hello again. Just to add, i got response regarding this awk behaviour from Aharon (Arnold) Robbins (the maintainer of GNU awk and now also nawk), and he says > Also, i think that you should put parentheses around the > ++unavail in your original code since there is no defined > precenence around string concatenation. > > This is for the snippet > > ia[1] = "(su__ERR_NUMBER_MAX - " ++unavail ")" I think he's wrong. The ++unavail would have to be done before converting the result to string for the concatentation. My opinion, of course. Ciao from Germany (what a pity this cannot be driven with email responses, urgh) (In reply to Steffen Nurpmeso from comment #5) > > Also, i think that you should put parentheses around the > > ++unavail in your original code since there is no defined > > precenence around string concatenation. > > > > This is for the snippet > > > > ia[1] = "(su__ERR_NUMBER_MAX - " ++unavail ")" > > I think he's wrong. The ++unavail would have to be done before > converting the result to string for the concatentation. yes, which you can enforce explicitly by adding parenthesis. I looked a bit into our awk implementation yesterday but i'm not yet sure how to fix it properly. We indeed end up calling setvar_i on the lefthand string to set it's number to 0, which clears the string-value of the node. This is of course not what we want, and is the cause of the "01" output. Hi :). One more addition, he responded once again. I say Ciao already here. > Thanks for responding. I copied the above to the busybox bugzilla > thread of this, i hope this is acceptable for you. He (Bernhard > Reutner-Fischer) referred to the gawk info manual even (it is not > in the manual page): > > - some (well) documented awk implementations explicitly warn > script-authors to rely on precedence around string concatenation, > like e.g.: > https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html > ---8<--- > Because string concatenation does not have an explicit operator, > it is often necessary to ensure that it happens at the right > time by using parentheses to enclose the items to concatenate. This is mainly for things like foo bar | getline Is this (foo bar) | getline or foo (bar | getline) ? I stand by my opinion that what you observed in Busybox awk is a bug. "str" ++ i is interpreted as postfix increment of "str". Oops... Fixed in git, thanks! Reopen if not fixed completely, provide examples. |
Created attachment 8491 [details] Test script for reproduction Hello! I have encountered a busybox 1.31.1 awk (AlpineLinux edge) problem that cannot be seen with mawk, mawk-debian nor gawk. nawk segfaults for the script so i cannot say :-) Please find attached a reduced script which must be run like # awk="/bin/busybox awk" CC=gcc TARGET=1 sh ./su-make-errors.sh Sorry for $CC, it can be any ISO C89 compiler i would say. The difference (compared to the other awks running on CRUX-Linux) will be --- .x.gawk 2020-06-04 23:24:13.000905415 +0200 +++ .x.busybox-awk 2020-06-04 23:23:51.210905829 +0200 @@ -2,8 +2,8 @@ #define su__ERR_NUMBER_MAX 0xFFu #define su__ERR_NUMBER_ENUM_C \ su_ERR_NONE = 0,\ - su_ERR_AGA = (su__ERR_NUMBER_MAX - 1),\ - su_ERR_AGA2 = (su__ERR_NUMBER_MAX - 2),\ + su_ERR_AGA = 00),\ + su_ERR_AGA2 = 10),\ These strings come from code which says while(getline dl < dat){ ++cnt split(dl, ia) if(ia[1] < 0) ia[1] = "(su__ERR_NUMBER_MAX - " ++unavail ")" map[ia[2]] = ia[1] print verb "su_ERR_" ia[2] " = " ia[1] ",\134" } close(dat) Ciao!