Bug 11471

Summary: printf %q format not supported (yet)
Product: Busybox Reporter: bbb30 <bbb30>
Component: Other Assignee: unassigned
Status: NEW ---    
Severity: normal CC: busybox-cvs
Priority: P5 Keywords: FIXME
Version: 1.27.x   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Host: Target:
Build:
Attachments: Text file with sample of typographic open and close quotes around text.

Description bbb30 2018-11-01 22:32:53 UTC
Created attachment 7851
Text file with sample of typographic open and close quotes around text.

I'm running busybox on Android KitKat with a terminal emulator. Sometimes files I try to parse contain characters that cause grief. For example, a typographical close quote (think 66-99 style quotes), when echo'd, printf'd %s or a line with it displayed via set -x in sh or bash, causes multiple duplicate lines of the same output, followed by a terminal hang. I'll attach a file with a sample of that character. Other characters, like unescaped single quotes and variants of that, also mess up scripts. I don't have an SDK so I can't compile iconv, and I don't have Perl etc. I saw no mention of this particular problem in this bug database, although there are some other Unicode problem reports which seem unrelated. Busybox printf reports %q as an invalid format. Without any decent way to clean text strings, it can be extremely hard to write and test scripts that encounter problematic characters. If I knew where to look for the config file I would have attached it: I installed busybox with the F-Droid app. But given the release notes, I expect this issue is cross-platform.
Comment 1 Denys Vlasenko 2018-11-02 13:32:25 UTC
(In reply to bbb30 from comment #0)
> I'm running busybox on Android KitKat with a terminal emulator. Sometimes files I try to parse contain characters that cause grief. For example, a typographical close quote (think 66-99 style quotes), when echo'd, printf'd %s or a line with it displayed via set -x in sh or bash, causes multiple duplicate lines of the same output, followed by a terminal hang.

This sounds like a bug in the terminal emulator - it cannot handle Unicode character U+201D (byte sequence 0xe2,0x80,0x9d in UTF-8 encoding). File a bug with them.
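
One way to check the byte sequence mentioned above without sending the raw character to the terminal is to emit it with octal escapes and dump it as hex. This is only a sketch: the function name close_quote_hex is made up for illustration, and it assumes the od applet on the system accepts the POSIX -An and -tx1 options (a minimal busybox build may not).

```sh
# Hypothetical helper: print the UTF-8 bytes of the close quote as hex digits.
# Octal escapes in the printf format string are POSIX, so no \x or \u
# support is required. Octal 342 200 235 is hex e2 80 9d.
close_quote_hex() {
    # od -An drops the offset column, -tx1 prints one hex byte at a time;
    # tr removes the spaces and the newline that od adds.
    printf '\342\200\235' | od -An -tx1 | tr -d ' \n'
}

close_quote_hex; echo    # prints e2809d
```

The same pipeline applied to a file (od -An -tx1 < file) shows which bytes a problem line really contains, and only ASCII reaches the terminal.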

print %q

$ printf '%q\n' '“teeth bone”'
“teeth\ bone”
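
Where printf has no %q, a rough substitute for making a string safe to reuse as shell input is to wrap it in single quotes and escape any embedded single quotes. The following is a minimal sketch, not what %q does: the name shell_quote is hypothetical, the output uses single-quote style rather than the backslash style shown above, and it assumes a POSIX sed is available.

```sh
# Hypothetical helper: quote "$1" so a POSIX shell reads it back as one word.
shell_quote() {
    # 1) replace every ' with '\''  (close quote, escaped quote, reopen quote)
    # 2) add an opening ' at the start of the first line
    # 3) add a closing ' at the end of the last line
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

shell_quote "it's"    # prints 'it'\''s'
```

Non-ASCII bytes such as the typographic quotes pass through unchanged, so this protects the shell parser but does nothing for a terminal that cannot display them. Trailing newlines in the argument are lost if the result is captured with $(...).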