According to the POSIX standard for printf:

    If the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.

This implies that the value should be the character's codepoint, which is what coreutils and bash print. BusyBox can instead return the value of the first byte of the character.

Examples:

* 바
  HEX codepoint: BC14
  DEC codepoint: 48148
  Hex UTF-8 bytes: EB B0 94
  (UTF-8 bytes converted to DEC): 235 176 148
  https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EB%B0%94&mode=char

* 학
  HEX codepoint: D559
  DEC codepoint: 54617
  Hex UTF-8 bytes: ED 95 99
  (UTF-8 bytes converted to DEC): 237 149 153
  https://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%ED%95%99&mode=char

The printf from busybox:

    # busybox printf '%X' "'바"
    EB
    # busybox printf '%X' "'학"
    ED
    # busybox printf '%d' "'바"
    235
    # busybox printf '%d' "'학"
    237

(these are the HEX and DEC values of the first byte of the character)

The printf from coreutils/bash:

    # printf '%X' "'바"
    BC14
    # printf '%X' "'학"
    D559
    # printf '%d' "'바"
    48148
    # printf '%d' "'학"
    54617

(these are the HEX and DEC values of the character's codepoint)

The same happens with multibyte Chinese characters:

    # (coreutils) printf '%X' "'传"
    4F20
    # busybox printf '%X' "'传"
    E4

传 has HEX codepoint 4F20 and Hex UTF-8 bytes: E4 BC A0

It seems to happen for any multibyte character:

    # (coreutils) printf '%X' "'ݔ"
    754
    # busybox printf '%X' "'ݔ"
    DD

ݔ has HEX codepoint 0754 and Hex UTF-8 bytes: DD 94
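
To cross-check the expected values without relying on either printf's handling of the quote prefix, the codepoint can be derived from the raw UTF-8 bytes. The following is only a sketch: `utf8_codepoint` is a hypothetical helper name, not part of busybox or coreutils, and it assumes a non-empty argument containing valid UTF-8 (it does no error checking).

```shell
# Hypothetical helper: print the hex codepoint of the first character of $1,
# decoded by hand from its UTF-8 bytes. Assumes non-empty, valid UTF-8 input.
utf8_codepoint() {
    # od dumps the bytes as unsigned decimals, e.g. "235 176 148" for 바;
    # word splitting turns them into $1, $2, $3, ...
    set -- $(printf '%s' "$1" | od -An -tu1)
    b0=$1
    if [ "$b0" -lt 128 ]; then
        # 1-byte sequence (ASCII): the byte is the codepoint
        cp=$b0
    elif [ "$b0" -lt 224 ]; then
        # 2-byte sequence: 110xxxxx 10xxxxxx
        cp=$(( ((b0 & 31) << 6) | ($2 & 63) ))
    elif [ "$b0" -lt 240 ]; then
        # 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
        cp=$(( ((b0 & 15) << 12) | (($2 & 63) << 6) | ($3 & 63) ))
    else
        # 4-byte sequence: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        cp=$(( ((b0 & 7) << 18) | (($2 & 63) << 12) | (($3 & 63) << 6) | ($4 & 63) ))
    fi
    printf '%X\n' "$cp"
}

utf8_codepoint '바'   # BC14 when this script is saved as UTF-8
```

Because the helper only passes a plain decimal number to printf, it should give the same answer under busybox and coreutils, which makes it usable as a reference when comparing the two.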