| Summary: | redirect fails to handle certain UTF-8 characters | ||
|---|---|---|---|
| Product: | Busybox | Reporter: | ecaddict <ecaddict> |
| Component: | Standard Compliance | Assignee: | unassigned |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | busybox-cvs |
| Priority: | P5 | ||
| Version: | 1.14.x | ||
| Target Milestone: | --- | ||
| Hardware: | Other | ||
| OS: | Linux | ||
| URL: | http://code.google.com/p/wl500g/ | ||
| Host: | Target: | Oleg custom firmware for Asus WL-500 router | |
| Build: | busybox 1.14.3 | ||
| Attachments: |
testcase
busybox build config
busybox build config
Fix
Fix on top on 1.15.1 |
||
Created attachment 615 [details]
testcase
Created attachment 617 [details]
busybox build config
I can add a simple testcase: see the attached test-UTF8.sh. The first file is created with a wrong name, but the name of the second file is correct.
LANG is unset; the busybox build config is also attached.
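The attached test-UTF8.sh is not reproduced in this report. As a rough idea of what such a check could look like, here is a minimal sketch that creates one file through a shell redirect and one through touch, then prints the bytes of each resulting name. The function names `name_bytes` and `check_redirect` are hypothetical, and the script assumes it is saved as UTF-8 and that `od`, `tr` and `sed` are available.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; this is NOT the attached test-UTF8.sh.

# name_bytes NAME: print the bytes of NAME as space-separated hex.
name_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# check_redirect DIR: create one file via redirection and one via touch,
# then print the byte sequence of every test* name found in DIR.
check_redirect() {
    dir=$1
    echo "test" > "$dir/testÁ.txt"   # redirect: the path reported as broken
    touch "$dir/test2Á.txt"          # touch: reported to work correctly
    for f in "$dir"/test*; do
        printf '%s\n' "$(name_bytes "${f##*/}")"
    done
}
```

Called as `check_redirect /mnt`, a correct shell should print `c3 81` in both names. Based on the symptoms described in this report, an affected ash build would be expected to show `c3` without the following `81` in the first name.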
I just recompiled busybox with the attached config for x86 and GNU libc 2.9, and the problem doesn't appear. It seems the real problem is in uClibc 0.9.29, not in busybox itself.

I will try to perform additional tests, but perhaps you can point me in the right direction.

(In reply to comment #3)
> Just recompiled busybox with attached config for x86 & GNU libc 2.9, problem
> don't appear. Seems to be the real problem is in uClibc 0.9.29, not in busybox
> itself.
>
> Will try to perform additional test, but perhaps you can point me to the right
> way.
>

Yes, it may be (actually it is likely to be) uClibc, as it has wchar in several parts, and the wl500g custom firmware change history indeed points to the uClibc version that you refer to. http://code.google.com/p/wl500g/wiki/NEWS

Sorry, but I cannot be more specific as I'm not familiar with the source code of the libraries (if that's what you mean).

(In reply to comment #2)
> Created an attachment (id=617) [details]
> busybox build config
>
> I can add simple testcase - see test-UTF8.sh attached. First file created with
> error, but name of second file is correct.
>
> LANG is unset, busybox build config also attached.

This is not a busybox .config:

CONFIG_RC=y
CONFIG_NVRAM=y
CONFIG_SHARED=y
# CONFIG_LIBBCM is not set
CONFIG_BUSYBOX=y
CONFIG_BUSYBOX_CONFIG=defconfig
CONFIG_WLCONF=y
CONFIG_BRIDGE=y
# CONFIG_VLAN is not set
CONFIG_HTTPD=y
CONFIG_WWW=y
CONFIG_NAT=y
CONFIG_NETCONF=y
CONFIG_IPTABLES=y
...

It seems to be the config of some bigger system which includes busybox as a component.

Reproduced on 1.15.0. hush is not affected, only ash.

Created attachment 619 [details]
busybox build config
Sorry, wrong config was attached
Created attachment 621 [details]
Fix
Please try attached patch
The patch fixes the problem for me. I hope the bug reporter will also confirm success.

(In reply to comment #9)
> Patch fixes problem for me. Hope bug creator also will confirm success.
>

Yes, the fix works.

echo "test" > /usr/local/testÁ.txt && ls -la /usr/local/test*
-rw-r--r-- 1 admin root 5 Aug 30 14:43 /usr/local/testÁ.txt

Unfortunately, other issues prevented me from extending my test coverage, so I ran only the test above.

Fixed in git; the fix will also be in 1.15.1.

(In reply to comment #11)
> Fixed in git, also will be in 1.15.1
>

Sorry for the slow reaction. I have now been able to continue my tests, and it seems that the fix only partially works: it does not work if the file name is taken from a variable:

BN="testÁ.txt" && cd /mnt/cgi-bin && echo "test" > "$BN" && /opt/bin/find -L /mnt/cgi-bin/test* -maxdepth 1 -ls | grep /test

gives the following output:

9764904 4 -rw-r--r-- 1 admin root 5 Sep 14 11:29 /mnt/cgi-bin/test\303.txt

As in the previous case, only ash is affected; hush works well.

Reproduced and added to the testsuite. Not fixed yet.

Created attachment 653 [details]
Fix on top on 1.15.1
Please try attached patch.
(In reply to comment #15)
> Created an attachment (id=653) [details]
> Fix on top on 1.15.1
>
> Please try attached patch.
>

A quick test indicates that the fix works:

BN=testÁ.txt&&cd /usr/local/sbin && echo "test" > "$BN" && ls -la /usr/local/sbin/test*
-rw-r--r-- 1 admin root 5 Jan 1 01:03 /usr/local/sbin/testÁ.txt

The timestamp is expected: the test was run on a router disconnected from the network, so NTP could not set the correct time.

Thank you for the quick correction. |
When the following command is given via SSH:

echo "test" > /mnt/testÁ.txt && /opt/bin/find -L /mnt -maxdepth 1 -ls | grep /test

it produces the following output:

54 4 -rw-r--r-- 1 admin root 5 Aug 28 18:26 /mnt/test\303.txt

That is, the file name should contain, after "test", the UTF-8 character with code 193 (0xC1, which is encoded in 2 bytes as 0xC3 0x81). However, the created file name contains only the first byte of the UTF-8 encoding. When checked via an NFS share, the file name is also shown incorrectly.

Only some UTF-8 characters fail; most of them work OK (so this could be a small bug rather than a complete gap in functionality). As far as I can judge, characters that are encoded in 3 bytes are affected as well (I did not try anything outside the BMP).

Files with the same character in their name can be created correctly with touch, so continuing on the command line:

touch /mnt/test2Á.txt && /opt/bin/find -L /mnt -maxdepth 1 -ls | grep /test

gives:

54 4 -rw-r--r-- 1 admin root 5 Aug 28 18:26 /mnt/test\303.txt
55 0 -rw-r--r-- 1 admin root 0 Aug 28 18:29 /mnt/test2\303\201.txt

In this case the test2 file shows the correct octal escape sequence.

What makes this bug particularly nasty is that it creates files that are hard to delete (at least from the command line).

I've used the following reference for UTF-8 values: http://doc.infosnel.nl/extreme_utf-8.html and this one for general reference: http://www.cl.cam.ac.uk/~mgk25/unicode.html
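For reference, code point 193 (0xC1) is binary 11000001. The two-byte UTF-8 pattern 110xxxxx 10xxxxxx carries it as 00011 and 000001, giving 11000011 10000001, that is 0xC3 0x81, or `\303\201` in octal. The sketch below prints that escape form for a string, and shows one possible way to remove a file whose name is hard to type by selecting it through its inode number (the first column of the `find -ls` output). Both function names are hypothetical, and `-maxdepth` and `-inum` are assumed to be supported by the installed find, as they are in GNU findutils.

```shell
#!/bin/sh
# utf8_octal STRING: print every byte of STRING as a backslash-octal escape,
# the notation find -ls uses above for bytes it does not print literally.
utf8_octal() {
    printf '%s' "$1" | od -An -to1 | tr -s ' \n' ' ' \
        | sed 's/^ //; s/ $//; s/ /\\/g; s/^/\\/'
}

# remove_by_inode DIR INODE: delete the entry directly inside DIR whose inode
# number is INODE, so the damaged name never has to be typed.
# Assumption: find supports -maxdepth and -inum (GNU findutils does).
remove_by_inode() {
    find "$1" -maxdepth 1 -inum "$2" -exec rm -- {} \;
}
```

With the listing above, `remove_by_inode /mnt 54` would target the truncated `test\303.txt` entry. Inode numbers are only unique within one filesystem, which is why the search is limited to a single directory.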