Bug 15748

Summary: with leading quote, printf prints value of first byte of a character instead of its numeric value in the codeset
Product: Busybox Reporter: Chris Slycord <cslycord>
Component: Standard Compliance Assignee: unassigned
Status: NEW ---    
Severity: normal CC: busybox-cvs
Priority: P5    
Version: 1.35.x   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Host: Target:
Build:

Description Chris Slycord 2023-08-31 07:15:14 UTC
According to the POSIX standard for printf:
If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.

This implies that it should be the character's codepoint, which is what is used in coreutils and bash.

In busybox, it instead returns the value of the first byte of the character.

Examples:
* 바
   HEX codepoint: BC14
   DEC codepoint: 48148
   Hex UTF-8 bytes: EB B0 94
   (UTF-8 bytes converted to DEC): 235 176 148
https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EB%B0%94&mode=char
* 학
   HEX codepoint: D559
   DEC codepoint: 54617
   Hex UTF-8 bytes: ED 95 99
   (UTF-8 bytes converted to DEC): 237 149 153
https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%ED%95%99&mode=char
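
The codepoint and byte values listed above can be cross-checked with a short script. This is only a sketch for verifying the numbers; the helper name `describe` is made up for illustration and has nothing to do with busybox or coreutils.

```python
def describe(ch):
    """Return (codepoint, list of UTF-8 byte values) for a single character.

    Hypothetical helper, used only to double-check the figures in this report.
    """
    return ord(ch), list(ch.encode("utf-8"))


for ch in ("바", "학"):
    codepoint, raw = describe(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    dec_bytes = " ".join(str(b) for b in raw)
    print(f"{ch}: HEX codepoint {codepoint:X}, DEC codepoint {codepoint}, "
          f"UTF-8 bytes {hex_bytes} ({dec_bytes})")
```

The first entry of each byte list is the value busybox prints, while the codepoint is the value coreutils and bash print.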


# busybox printf '%X' "'바"
EB
# busybox printf '%X' "'학"
ED
# busybox printf '%d' "'바"
235
# busybox printf '%d' "'학"
237
(these are the HEX and DEC values of the first byte of the character)


Compare with the printf from coreutils/bash:
# printf '%X' "'바"
BC14
# printf '%X' "'학"
D559
# printf '%d' "'바"
48148
# printf '%d' "'학"
54617
(which are the HEX and DEC values of the character's codepoint)

Same happens with multibyte Chinese characters.
# (coreutils) printf '%X' "'传"
4F20
# busybox printf '%X' "'传"
E4

传 has HEX codepoint 4F20 and Hex UTF-8 bytes: E4 BC A0
Comment 1 Chris Slycord 2023-08-31 07:24:53 UTC
Seems to happen for any multibyte character.

# (coreutils) printf '%X' "'ݔ"
754
# busybox printf '%X' "'ݔ"
DD

ݔ has Hex code point: 0754 and Hex UTF-8 bytes: DD 94
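
The difference between the two outputs can be modelled as two ways of reading the argument after the leading quote. The sketch below is not busybox or coreutils source; both function names are hypothetical, and it assumes the argument arrives as UTF-8 bytes. It only reproduces the values observed in this report.

```python
def quoted_value_first_byte(arg: bytes) -> int:
    """Value of the single byte right after the leading quote.

    Hypothetical model of the output reported for busybox.
    """
    return arg[1]


def quoted_value_codepoint(arg: bytes, encoding: str = "utf-8") -> int:
    """Value of the first whole character after the leading quote.

    Hypothetical model of the output reported for coreutils/bash;
    the UTF-8 default is an assumption about the active codeset.
    """
    return ord(arg[1:].decode(encoding)[0])


for text in ("'ݔ", "'传", "'A"):
    arg = text.encode("utf-8")
    print(f"{text}: first byte {quoted_value_first_byte(arg):X}, "
          f"codepoint {quoted_value_codepoint(arg):X}")
```

For single-byte characters such as `A` the two readings agree, which fits the observation that only multibyte characters are affected.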