Bug 8506 - ash: corrupted expansion of ${#var} if $var contains UTF-8 characters
Summary: ash: corrupted expansion of ${#var} if $var contains UTF-8 characters
Status: RESOLVED FIXED
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other (show other bugs)
Version: 1.24.x
Hardware: All All
: P5 major
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-03 19:55 UTC by Martijn Dekker
Modified: 2016-03-22 22:20 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:


Attachments
Examples of corrupted output of ${#var} (273 bytes, text/plain)
2015-12-03 19:55 UTC, Martijn Dekker
Details
Shell script to show bug (384 bytes, text/plain)
2016-01-03 08:20 UTC, Hans PUFAL
Details
Fix using var_start variable proposed by Hans PUFAL (606 bytes, patch)
2016-01-03 13:42 UTC, Martijn Dekker
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martijn Dekker 2015-12-03 19:55:37 UTC
Created attachment 6246 [details]
Examples of corrupted output of ${#var}

Given a variable $var, the expression ${#var} is supposed to expand to the length in characters of that variable's contents. As of busybox 1.24.x ash, the expansion of ${#var} is corrupted if $var contains any UTF-8 characters. For each UTF-8 character, one byte of the variable's content is prepended to the number.

Bug confirmed in busybox-1.24.1 ash and busybox-current ash. The bug was *not* present in busybox-1.23.2 ash and prior.
Comment 1 Hans PUFAL 2016-01-03 08:20:52 UTC
Created attachment 6261 [details]
Shell script to show bug
Comment 2 Hans PUFAL 2016-01-03 08:23:17 UTC
Problem exists in sh (with or without bash compatibility) but not in ash
Comment 3 Hans PUFAL 2016-01-03 08:30:07 UTC
Ignore previous comment.

Problem exists in ash (with or without bash compatibility) but not in hush
Comment 4 Hans PUFAL 2016-01-03 10:04:24 UTC
I have a possible fix but need someone who knows the code to check for any side effects.

In file ash.c, line 6703

	if (discard)
		STADJUST(-len, expdest);

At this point len is the unicode string length and STADJUST is removing the 
$var string from the destination. The length of the removal should be the number of bytes in the string not the unicode string length.

Changing line 6704 to 

		STADJUST(-strlen (p), expdest);

Makes the tests run correctly.

I am fairly sure of my analisys but am unsure if this could cause problems
in other situations.
Comment 5 Hans PUFAL 2016-01-03 10:57:01 UTC
An alternative solution, which I think is more robust is as follows:

Define a new char * variable named var_start in function varvalue initialising it to the value of expdest :

At line 6583

    char *var_start = expdest;


Then at the end of varvalue, reset expdest to var_start if discard is true :

At line 6705 :

	if (discard)
		expdest = var_start;
Comment 6 Martijn Dekker 2016-01-03 13:42:15 UTC
Created attachment 6266 [details]
Fix using var_start variable proposed by Hans PUFAL
Comment 7 Ron Yorston 2016-03-16 17:43:03 UTC
This regression was introduced by commit d68d1fb, part of a series I submitted in May last year to fix another issue.

The proposed patch using the var_start variable doesn't work.  Between the time the value of expdest is recorded at the start of the function and used at end, the stack block to which it refers may have been reallocated.  Restoring the incorrect value causes the shell to crash.  I was able to make this happen with a variable containing ~600 characters.

Like Hans I'm also unsure about the other proposed solution.  The code in question is rather convoluted:  as well as ${#var} it also affects ${var+XXX} and it's not clear to me that it's safe.

My suggestion is to discard the unwanted string in the conditional Unicode section while the original value of len is still available:

#if ENABLE_UNICODE_SUPPORT
        if (subtype == VSLENGTH && len > 0) {
            reinit_unicode_for_ash();
            if (unicode_status == UNICODE_ON) {
                STADJUST(-len, expdest);
                discard = 0;
                len = unicode_strlen(p);
            }
        }
#endif

This limits the effect of the change to the BusyBox-specific Unicode section and maintains greater compatibility with dash (which is where I got the ideas for my patch series).