PXE boot server is sending pxelinux.0 file name to PXE client with additional "garbage" characters at the end of the name.

When some ROM-based PXE clients get the pxelinux.0 filename from PXE/tftp server, the name has additional characters appended to the end of the file name. PXE boot fails with a File not found error message. Other PXE clients work fine when booting from the same server. gPXE and iPXE will work for some clients whose boot ROM fails, but won't work for others.

All clients PXE boot fine from a Debian Squeeze server using dnsmasq. No special setup was required to make it work.

I am using Tiny Core Linux 4.7.5. Clean boot from live CD. Run tc-terminal-server script from Control Panel (or from CLI - same behavior).

Standard settings used in response to TC-Terminal-Server script:
Boot device: /dev/sda1
Shared IP range start: 192.168.0.11
Shared IP range end: 192.168.0.22
Netcard to use: eth0
DNS server: 192.168.0.1
Subnet: 255.255.255.0
Gateway: 192.168.0.1

The PXE/DHCP/tftp server starts and some clients PXE boot properly. Others will not PXE boot from ROM but will boot with gPXE (on some clients) or iPXE (different clients) or either gPXE or iPXE (yet other clients). Some clients will not PXE boot with ROM or gPXE or iPXE.

Sample client that will not PXE (ROM) boot:
PXE Boot ROM: Intel UNDI, PXE-2.0 Build 082
Realtek RTL8139 (A/B/C) RTL8030 Fast Ethernet controller V2.12 (010425)

Fragment of tcp trace of PXE boot:
ASCII:
../pxelinux.0.Q3*G.octet.tsize.0
Hex:
00 01 2F 70 78 65 6C 69 6E 75 78 2E 30 FF D1 33
2A 47 00 6F 63 74 65 74 00 74 73 69 7A 65 00 30

Notice the Q3*G after the name pxelinux.0. This does not show up on PXE clients when PXE booting from the Debian server running dnsmasq. And all clients can boot properly from the dnsmasq server.

From information I have garnered on the Web this is a known problem. One work-around proposed is to use a symlink to the pxelinux.0 file, a method used in the Tiny Core Linux tc-terminal-setup script. Unfortunately, that does not always solve the problem.

Ideally, the program should send the pxelinux.0 file name properly so that all PXE clients can find the file and download it from the tftp server. In addition, it would be very good if there were some way to send the complete file name to the client to make it that much easier for end users to define the problem.
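For anyone reading the trace fragment by hand, here is a minimal Python sketch that splits it into the TFTP request fields (opcode, then NUL-separated strings) and isolates the bytes that follow the expected name. The helper name parse_rrq and the assumption that the client meant to request "/pxelinux.0" are mine, not part of the report. The first extra byte is 0xFF, which is also the DHCP end-of-options code, so the dump is at least consistent with the client reading past the end of the boot file option; the fragment alone does not prove that.

```python
# Bytes copied verbatim from the trace fragment in the report.
RRQ_HEX = (
    "00 01 2F 70 78 65 6C 69 6E 75 78 2E 30 FF D1 33 "
    "2A 47 00 6F 63 74 65 74 00 74 73 69 7A 65 00 30"
)


def parse_rrq(packet: bytes):
    """Split a TFTP request into its opcode and NUL-separated fields.

    Hypothetical helper for illustration only.
    """
    opcode = int.from_bytes(packet[:2], "big")
    fields = packet[2:].split(b"\x00")
    return opcode, fields


packet = bytes.fromhex(RRQ_HEX)
opcode, fields = parse_rrq(packet)

filename = fields[0]
expected = b"/pxelinux.0"          # assumed intended file name
junk = filename[len(expected):]    # whatever follows the expected name

print(opcode)         # 1 is the TFTP read request opcode
print(filename)
print(junk.hex(" "))  # the unexpected bytes, as hex
```

The ASCII view in the report shows these bytes as ".Q3*G" because the dump tool prints non-printable bytes as dots or with the high bit stripped.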
(In reply to comment #0)
> The PXE/DHCP/tftp server starts and some clients PXE boot properly.

What is "PXE/DHCP/tftp server"? busybox has DHCP server (udhcpd) and tftp server (tftpd).

In order to debug your problem, I need to know the version of busybox, its build .config file, and the runtime configuration:
* command line options of tftpd and udhcpd invocations
* conf file for udhcpd

> Fragment of tcp trace of PXE boot:
> ASCII:
> ../pxelinux.0.Q3*G.octet.tsize.0
> Hex:
> 00 01 2F 70 78 65 6C 69 6E 75 78 2E 30 FF D1 33
> 2A 47 00 6F 63 74 65 74 00 74 73 69 7A 65 00 30

Can you attach the *entire* network trace?
Created attachment 7906
tcpdump snippets for udhcpd vs dnsmasq
I have encountered this problem as well. I can send more information if needed, but it looks like the difference is that dnsmasq adds a terminating NUL byte to the end of the filename, but udhcpd does not.

If I understand the RFC correctly, busybox is actually doing the right thing, but some PXE clients are not. According to RFC 2132:

"Options containing NVT ASCII data SHOULD NOT include a trailing NULL; however, the receiver of such options MUST be prepared to delete trailing nulls if they exist. The receiver MUST NOT require that a trailing null be included in the data."

From a practical standpoint, including the NUL byte goes against the RFC's SHOULD NOT but satisfies some broken clients. On the other hand, if a client is in compliance with the RFC, the extra NUL terminator should not be a problem.

This happens with the pre-packaged Arch Linux build (busybox v1.29.3) as well as my own (busybox v1.31.0.git-2019-01-02) build.

Attached is a file containing the relevant snippets from the tcpdump output for both udhcpd and dnsmasq.
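To make the RFC wording concrete, here is a small Python sketch contrasting a receiver that honours the option length (and tolerates a trailing NUL) with a hypothetical broken receiver that ignores the length and scans for a NUL. All function names and the "leftover" buffer bytes are invented for illustration; this is not how any particular boot ROM is known to be written.

```python
BOOTFILE = 67   # DHCP option code for the boot file name (0x43)
END = 255       # DHCP end-of-options marker (0xFF)


def build_options(value: bytes, leftover: bytes) -> bytes:
    """Encode one boot file option, the end marker, then arbitrary buffer contents."""
    return bytes([BOOTFILE, len(value)]) + value + bytes([END]) + leftover


def read_with_length(buf: bytes) -> bytes:
    """RFC-style receiver: trust the length byte, drop trailing NULs if present."""
    length = buf[1]
    return buf[2:2 + length].rstrip(b"\x00")


def read_until_nul(buf: bytes) -> bytes:
    """Hypothetical broken receiver: ignore the length, stop at the first NUL."""
    end = buf.index(b"\x00", 2)
    return buf[2:end]


# Made-up bytes standing in for whatever follows the options in the buffer.
leftover = b"\x12\x34\x00"

plain = build_options(b"pxelinux.0", leftover)        # no terminator
padded = build_options(b"pxelinux.0\x00", leftover)   # with NUL terminator

print(read_with_length(plain), read_with_length(padded))
print(read_until_nul(plain), read_until_nul(padded))
```

In this sketch the length-aware reader returns the same name for both encodings, while the NUL-scanning reader only gets a clean name when the terminator is sent.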
Basically, it's a client bug.

Ideally, to solve cases like this we need a syntax to include nonprintable chars, including NUL, for string options. Just adding NUL to all strings is bound to bite us when in some cases people would want that to NOT be done.

Today, this can be done clumsily in udhcpd.conf like this:

# For broken clients, add terminating NUL.
#option bootfile "pxelinux.0"NUL - udhcpd doesn't support this syntax
option 0x43 7078656c696e75782e3000

Can someone confirm this works?

What syntax do you propose for strings to support it more nicely? "\0"?
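Writing the hex payload by hand is error-prone, so here is a small Python sketch that produces a line in the same shape as the one above from a file name. The function name udhcpd_hex_option is hypothetical, and the sketch assumes only the "option 0xNN <hex>" form shown in this comment.

```python
def udhcpd_hex_option(code: int, text: str) -> str:
    """Format an option line with the ASCII text plus a terminating NUL, as hex.

    Hypothetical helper; mirrors the "option 0xNN <hex>" form from the comment.
    """
    payload = text.encode("ascii") + b"\x00"
    return f"option 0x{code:02x} {payload.hex()}"


line = udhcpd_hex_option(0x43, "pxelinux.0")
print(line)  # option 0x43 7078656c696e75782e3000
```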
I can confirm that

# option bootfile pxelinux.0
option 0x43 7078656c696e75782e3000
# option pxeconffile pxelinux.cfg/default
option 0xd1 7078656c696e75782e6366672f64656661756c7400

works for me on at least one of these ancient clients. I have some others to test but I'm confident this will do the trick.

So I am happy with udhcpd as-is, now that I know about this workaround. Thank you!

As far as adding some way to specify this in a more human-friendly way, how about a "-z" command line option to turn on zero-terminators for all strings? Or maybe just the PXE-related strings? I could see some clients breaking with a terminator and other clients breaking without it, but a client that demands a terminator on some strings but not on other strings would really have to be seriously broken.
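As a sanity check on hand-written hex values like these, a short Python sketch can decode each line back to its option code and text and report whether it ends in a NUL. The function name decode_hex_option is made up for illustration, and it assumes the three-token "option 0xNN <hex>" form used in this comment.

```python
def decode_hex_option(line: str):
    """Return (option code, decoded text, ends_with_nul) for one hex option line.

    Hypothetical helper; assumes exactly three whitespace-separated tokens.
    """
    _keyword, code, payload = line.split()
    raw = bytes.fromhex(payload)
    nul_terminated = raw.endswith(b"\x00")
    text = raw.rstrip(b"\x00").decode("ascii")
    return int(code, 16), text, nul_terminated


lines = [
    "option 0x43 7078656c696e75782e3000",
    "option 0xd1 7078656c696e75782e6366672f64656661756c7400",
]
decoded = [decode_hex_option(entry) for entry in lines]
for code, text, nul in decoded:
    print(code, text, nul)
```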