Bug 11471 - printf %q format not supported (yet)
Status: NEW
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other
Version: 1.27.x
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords: FIXME
Depends on:
Blocks:
 
Reported: 2018-11-01 22:32 UTC by bbb30
Modified: 2018-11-02 13:32 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:


Attachments
Text file with sample of typographic open and close quotes around text. (83 bytes, text/plain)
2018-11-01 22:32 UTC, bbb30

Description bbb30 2018-11-01 22:32:53 UTC
Created attachment 7851
Text file with sample of typographic open and close quotes around text.

I'm running busybox on Android KitKat with a terminal emulator. Sometimes the files I try to parse contain characters that cause grief. For example, a typographical close quote (think 66-99 style quotes), when echo'd, printf'd with %s, or shown in a line displayed via set -x in sh or bash, causes multiple duplicate lines of the same output, followed by a terminal hang. I'll attach a file with a sample of that character. Other characters, like unescaped single quotes and variants of those, also mess up scripts. I don't have an SDK, so I can't compile iconv, and I don't have Perl etc. I saw no mention of this particular problem in this bug database, although there are some other Unicode problem reports which seem unrelated. Busybox printf reports %q as an invalid format. Without any decent way to clean text strings, it can be extremely hard to write and test scripts that encounter problematic characters. If I knew where to look for the config file I would have attached it: I installed busybox with the F-Droid app. But given the release notes, I expect this issue is cross-platform.
Comment 1 Denys Vlasenko 2018-11-02 13:32:25 UTC
(In reply to bbb30 from comment #0)
> I'm running busybox on Android KitKat with a terminal emulator. Sometimes the files I try to parse contain characters that cause grief. For example, a typographical close quote (think 66-99 style quotes), when echo'd, printf'd with %s, or shown in a line displayed via set -x in sh or bash, causes multiple duplicate lines of the same output, followed by a terminal hang.

This sounds like a bug in the terminal emulator: it cannot handle the Unicode character U+201D (right double quotation mark), which is the byte sequence 0xe2 0x80 0x9d in UTF-8 encoding. File a bug with them.
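
To check which bytes actually reach the terminal, something like the following sketch may help. It writes U+201D as three octal escapes and dumps them in hex. The function name close_quote_bytes is made up for this illustration, and it assumes an od applet is available in the busybox build.

```shell
# Hypothetical helper: emit U+201D (right double quotation mark) as its
# three UTF-8 bytes, written as octal escapes (0xe2 0x80 0x9d).
close_quote_bytes() {
    printf '\342\200\235'
}

# Dump the bytes in hex, one byte per column, without the offset column.
close_quote_bytes | od -An -tx1
```

Running the same od pipeline on the attached sample file should show whether it contains this sequence.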

printf %q

$ printf '%q\n' '“teeth bone”'
“teeth\ bone”
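
Where printf %q is unavailable, a rough stand-in can be sketched in plain POSIX sh with sed. This is only an assumption about what a workaround might look like, not busybox functionality: shell_quote is a hypothetical helper that wraps its argument in single quotes instead of backslash-escaping it, so its output differs from the %q output above but can likewise be passed back to the shell.

```shell
# Hypothetical helper, not part of busybox: wrap "$1" in single quotes so
# the result can be safely re-read by the shell (for example with eval).
# Every embedded single quote is rewritten as '\'' (close the quoted
# string, add an escaped quote, reopen the quoted string).
shell_quote() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# Example: quote a string containing an apostrophe.
shell_quote "it's"
# prints: 'it'\''s'
```

Non-ASCII bytes such as the typographic quotes are passed through unchanged; only the single quote character is treated specially.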