Created attachment 6246 [details] Examples of corrupted output of ${#var} Given a variable $var, the expression ${#var} is supposed to expand to the length in characters of that variable's contents. As of busybox 1.24.x ash, the expansion of ${#var} is corrupted if $var contains any UTF-8 characters. For each UTF-8 character, one byte of the variable's content is prepended to the number. Bug confirmed in busybox-1.24.1 ash and busybox-current ash. The bug was *not* present in busybox-1.23.2 ash and prior.
Created attachment 6261 [details] Shell script to show bug
Problem exists in sh (with or without bash compatibility) but not in ash
Ignore previous comment. Problem exists in ash (with or without bash compatibility) but not in hush
I have a possible fix but need someone who knows the code to check for any side effects. In file ash.c, line 6703 if (discard) STADJUST(-len, expdest); At this point len is the unicode string length and STADJUST is removing the $var string from the destination. The length of the removal should be the number of bytes in the string not the unicode string length. Changing line 6704 to STADJUST(-strlen (p), expdest); Makes the tests run correctly. I am fairly sure of my analisys but am unsure if this could cause problems in other situations.
An alternative solution, which I think is more robust is as follows: Define a new char * variable named var_start in function varvalue initialising it to the value of expdest : At line 6583 char *var_start = expdest; Then at the end of varvalue, reset expdest to var_start if discard is true : At line 6705 : if (discard) expdest = var_start;
Created attachment 6266 [details] Fix using var_start variable proposed by Hans PUFAL
This regression was introduced by commit d68d1fb, part of a series I submitted in May last year to fix another issue. The proposed patch using the var_start variable doesn't work. Between the time the value of expdest is recorded at the start of the function and used at end, the stack block to which it refers may have been reallocated. Restoring the incorrect value causes the shell to crash. I was able to make this happen with a variable containing ~600 characters. Like Hans I'm also unsure about the other proposed solution. The code in question is rather convoluted: as well as ${#var} it also affects ${var+XXX} and it's not clear to me that it's safe. My suggestion is to discard the unwanted string in the conditional Unicode section while the original value of len is still available: #if ENABLE_UNICODE_SUPPORT if (subtype == VSLENGTH && len > 0) { reinit_unicode_for_ash(); if (unicode_status == UNICODE_ON) { STADJUST(-len, expdest); discard = 0; len = unicode_strlen(p); } } #endif This limits the effect of the change to the BusyBox-specific Unicode section and maintains greater compatibility with dash (which is where I got the ideas for my patch series).
pushed here: https://git.busybox.net/busybox/commit/?id=3e3bfb896e0dd8a54caad9c6264e2452566b4012