Bug 585 - redirect fails to handle certain UTF-8 characters
Status: RESOLVED FIXED
Alias: None
Product: Busybox
Classification: Unclassified
Component: Standard Compliance
Version: 1.14.x
Hardware: Other Linux
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
URL: http://code.google.com/p/wl500g/
Keywords:
Depends on:
Blocks:
 
Reported: 2009-08-29 07:58 UTC by ecaddict
Modified: 2009-10-08 00:11 UTC

See Also:
Host:
Target: Oleg custom firmware for Asus WL-500 router
Build: busybox 1.14.3


Attachments
testcase (65 bytes, text/plain)
2009-08-29 08:35 UTC, Leonid
busybox build config (1.54 KB, text/plain)
2009-08-29 08:38 UTC, Leonid
busybox build config (24.26 KB, application/x-config)
2009-08-29 14:29 UTC, Leonid
Fix (499 bytes, patch)
2009-08-29 18:23 UTC, Denys Vlasenko
Fix on top on 1.15.1 (1.05 KB, patch)
2009-09-16 14:27 UTC, Denys Vlasenko

Description ecaddict 2009-08-29 07:58:55 UTC
When the following command is run over SSH:
echo "test" > /mnt/testÁ.txt && /opt/bin/find -L /mnt -maxdepth 1 -ls | grep /test

It produces the following output:
54    4 -rw-r--r--   1 admin    root            5 Aug 28 18:26 /mnt/test\303.txt

That is, the file name should contain, after "test", the character Á (code point 193, i.e. U+00C1), which UTF-8 encodes in 2 bytes as 0xC3 0x81. However, the name of the created file contains only the first byte of that encoding (0xC3, shown as \303).
When checked via an NFS share, the file name is also shown incorrectly. Only some UTF-8 characters fail; most of them work fine (so this could be a small bug rather than a complete gap in functionality).
As far as I can judge, characters that are encoded in 3 bytes are affected as well (I did not try characters outside the BMP).
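
To make the byte values concrete, here is a small sketch that prints the UTF-8 encoding of Á (U+00C1) in hex. It is only an illustration: the octal escapes passed to printf stand in for typing the character, and od and tr are assumed to be available on the system.

```sh
# U+00C1 (decimal 193) does not fit in one UTF-8 byte, so it is encoded as
# two bytes: 0xC3 0x81, which find -ls prints as the octal escapes \303\201.
bytes=$(printf '\303\201' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # c381
```

A file name that shows only \303 has therefore lost the second byte, 0x81.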

Files containing the same character can be correctly created with touch, so continuing in the command line:

touch /mnt/test2Á.txt && /opt/bin/find -L /mnt -maxdepth 1 -ls | grep /test

gives:
    54    4 -rw-r--r--   1 admin    root            5 Aug 28 18:26 /mnt/test\303.txt
    55    0 -rw-r--r--   1 admin    root            0 Aug 28 18:29 /mnt/test2\303\201.txt

In this case the test2 file shows the correct octal escape sequence.

What makes this bug particularly nasty is that it creates files that are hard to delete (at least from the command line).
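
One possible way to remove such a file is to rebuild the damaged name from octal escapes with printf. The sketch below assumes the name is exactly "test", the single byte 0xC3, and ".txt", as in the listing above. The scratch directory from mktemp is hypothetical and stands in for /mnt, and the file system is assumed to accept arbitrary bytes in file names.

```sh
# Scratch directory standing in for /mnt (assumption for illustration only).
dir=$(mktemp -d)

# Rebuild the damaged name byte by byte: "test" + 0xC3 + ".txt".
bad=$(printf 'test\303.txt')

# Create a file with that name (touch is reported to work correctly),
# then delete it by passing the same bytes to rm.
touch -- "$dir/$bad"
rm -- "$dir/$bad"

# Count what is left in the scratch directory.
left=$(ls -A "$dir" | wc -l | tr -d ' ')
echo "$left"   # 0
rmdir "$dir"
```

Deleting by inode number with find is another option when the exact bytes of the name are not known.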

I used the following reference for UTF-8 values:
http://doc.infosnel.nl/extreme_utf-8.html

and this for general reference:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Comment 1 Leonid 2009-08-29 08:35:30 UTC
Created attachment 615
testcase
Comment 2 Leonid 2009-08-29 08:38:00 UTC
Created attachment 617
busybox build config

I can add a simple testcase: see the attached test-UTF8.sh. The first file is created with a wrong name, but the name of the second file is correct.

LANG is unset; the busybox build config is also attached.
Comment 3 Leonid 2009-08-29 09:10:09 UTC
I just recompiled busybox with the attached config for x86 and GNU libc 2.9, and the problem does not appear. It seems the real problem is in uClibc 0.9.29, not in busybox itself.

I will try to perform additional tests, but perhaps you can point me in the right direction.
Comment 4 ecaddict 2009-08-29 09:42:25 UTC
(In reply to comment #3)
> I just recompiled busybox with the attached config for x86 and GNU libc 2.9,
> and the problem does not appear. It seems the real problem is in uClibc
> 0.9.29, not in busybox itself.
> 
> I will try to perform additional tests, but perhaps you can point me in the
> right direction.
> 

Yes, it may be (actually, it is likely to be) uClibc, as it uses wchar in several places, and the change history of the wl500g custom firmware indeed points to the uClibc version you refer to.
http://code.google.com/p/wl500g/wiki/NEWS

Sorry, but I cannot be more specific, as I'm not familiar with the source code of the libraries (if that's what you mean).
Comment 5 Denys Vlasenko 2009-08-29 14:02:05 UTC
(In reply to comment #2)
> Created an attachment (id=617)
> busybox build config
> 
> I can add a simple testcase: see the attached test-UTF8.sh. The first file is
> created with a wrong name, but the name of the second file is correct.
> 
> LANG is unset; the busybox build config is also attached.

This is not a busybox .config:

CONFIG_RC=y
CONFIG_NVRAM=y
CONFIG_SHARED=y
# CONFIG_LIBBCM is not set
CONFIG_BUSYBOX=y
CONFIG_BUSYBOX_CONFIG=defconfig
CONFIG_WLCONF=y
CONFIG_BRIDGE=y
# CONFIG_VLAN is not set
CONFIG_HTTPD=y
CONFIG_WWW=y
CONFIG_NAT=y
CONFIG_NETCONF=y
CONFIG_IPTABLES=y
...

It seems to be a config of some bigger system which includes busybox as a component.
Comment 6 Denys Vlasenko 2009-08-29 14:11:24 UTC
Reproduced on 1.15.0. hush is not affected, only ash.
Comment 7 Leonid 2009-08-29 14:29:43 UTC
Created attachment 619
busybox build config

Sorry, the wrong config was attached.
Comment 8 Denys Vlasenko 2009-08-29 18:23:33 UTC
Created attachment 621
Fix

Please try the attached patch.
Comment 9 Leonid 2009-08-29 19:57:59 UTC
The patch fixes the problem for me. I hope the bug reporter will also confirm success.
Comment 10 ecaddict 2009-08-30 13:23:05 UTC
(In reply to comment #9)
> The patch fixes the problem for me. I hope the bug reporter will also confirm success.
> 

Yes, fix works.

echo "test" > /usr/local/testÁ.txt && ls -la /usr/local/test* 
-rw-r--r--    1 admin    root            5 Aug 30 14:43 /usr/local/testÁ.txt

Unfortunately, other issues prevented me from extending my test coverage, so I ran only the above test.
Comment 11 Denys Vlasenko 2009-09-12 14:00:22 UTC
Fixed in git, also will be in 1.15.1
Comment 12 ecaddict 2009-09-14 09:33:05 UTC
(In reply to comment #11)
> Fixed in git, also will be in 1.15.1
> 

Sorry for the slow reaction. I could now continue with my tests, and it seems that the fix works only partially: it does not work if the file name is taken from a variable:

BN="testÁ.txt" && cd /mnt/cgi-bin && echo "test" > "$BN" && /opt/bin/find -L /mnt/cgi-bin/test* -maxdepth 1 -ls | grep /test

gives the following output:

9764904    4 -rw-r--r--   1 admin    root            5 Sep 14 11:29 /mnt/cgi-bin/test\303.txt
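
The variable case can be checked with a short script along the following lines. This is only a sketch, not the testcase that was added to the busybox testsuite: the scratch directory from mktemp and the variable names dir and hex are assumptions, and the file name is spelled with octal escapes so the script does not depend on the encoding of the file it is saved in.

```sh
# Scratch directory (assumption; stands in for /mnt/cgi-bin).
dir=$(mktemp -d)

# "testÁ.txt" as raw bytes: Á is 0xC3 0x81 in UTF-8.
BN=$(printf 'test\303\201.txt')

# Redirect to a file whose name comes from a variable.
echo "test" > "$dir/$BN"

# Dump the name of the created file in hex, using a shell glob to find it.
for f in "$dir"/*; do
    hex=$(printf '%s' "${f##*/}" | od -An -tx1 | tr -d ' \n')
done
echo "$hex"

rm -rf "$dir"
```

A shell that handles the redirect correctly should print 74657374c3812e747874; a name that lost the second byte would appear as 74657374c32e747874.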
Comment 13 Leonid 2009-09-16 07:28:04 UTC
As in the previous case, only ash is affected; hush works well.
Comment 14 Denys Vlasenko 2009-09-16 12:13:03 UTC
Reproduced and added to testsuite. Not fixed yet.
Comment 15 Denys Vlasenko 2009-09-16 14:27:47 UTC
Created attachment 653
Fix on top on 1.15.1

Please try the attached patch.
Comment 16 ecaddict 2009-09-17 18:33:01 UTC
(In reply to comment #15)
> Created an attachment (id=653)
> Fix on top on 1.15.1
> 
> Please try the attached patch.
> 

A quick test indicates that the fix works:

BN=testÁ.txt&&cd /usr/local/sbin && echo "test" > "$BN" && ls -la /usr/local/sbin/test*
-rw-r--r--    1 admin    root            5 Jan  1 01:03 /usr/local/sbin/testÁ.txt

The timestamp is expected: the test was done on a router disconnected from the network, so NTP could not set the correct time.

Thank you for the quick correction.