Created attachment 7136: Patch to work around the issue. I periodically see failures when generating squashfs images with the BR2_ROOTFS_DEVICE_TABLE_SUPPORTS_EXTENDED_ATTRIBUTES makedevs option enabled. I am setting specific capabilities on files, which they require to run properly, but these settings sometimes get dropped. I only see this when doing builds on servers with a large number of cores; running on small machines seems to hide the bug. I believe the issue is related to fakeroot but have been unable to verify that. I have checked the Debian fakeroot package (there is a newer version), but no commits related to this issue have been made. My current workaround is to make squashfs image generation exit with an error and to attempt the generation up to 10 times before failing. Not the best solution, but it's better than silently failing. Attached is the patch with the hack.
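For illustration, the retry workaround described above could be sketched in shell roughly as follows. This is not the attached patch: the `retry` helper name is made up for this sketch, and the command it wraps stands for whatever squashfs generation step has been set up to exit with an error when xattrs cannot be read.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]: run CMD up to MAX times, stopping at the first success.
# Hypothetical helper, only meant to illustrate the "try 10 times" workaround.
retry() {
    max="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Stop as soon as the wrapped command succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max failed: $*" >&2
        attempt=$((attempt + 1))
    done
    # Every attempt failed: report failure instead of silently continuing.
    return 1
}

# Hypothetical usage (the wrapped command is a placeholder):
#   retry 10 <squashfs generation command that errors out on xattr failures>
```

Retrying only hides the race; it does not make any single run more reliable, which is why it is described above as a hack.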
Forgot to mention what is printed when the error occurs. I see the following printed from mksquashfs. The error code changes often and so does the file it failed on. It only happens about 10% of the time on the servers I am running on.

llistxattr for [file] failed in read_attrs, because Unknown error 810343192
Clayton, Are you able to reproduce if you add "-processors 1" to the mksquashfs command? Regards, Yann E. MORIN.
Clayton, As a stop-gap measure, I've sent a patch that makes mksquashfs honour the configured number of parallel jobs: https://patchwork.ozlabs.org/patch/796879/ Of course, if you use auto-calculation, that won't help. But if you already use a lower limit on your build jobs, that may help. Regards, Yann E. MORIN.
(In reply to Yann E. MORIN from comment #2) Yann, I ran a test overnight on 4 different builds with the "-processors 1" argument added to the squashfs creation and saw no errors whatsoever. This, however, adds quite a bit of time to the squashfs creation, so I would like to avoid it if possible to keep the build time short. Thanks, Clayton
Clayton, All, So I've made a little Makefile to help investigate the issue. Using a Makefile is definitely not required, but it helps with the parallel stuff, just to create the files (note that it takes quite a lot of space on the disk, more than 4GiB):

MKSQUASHFS ?= mksquashfs
SQSH_JOBS ?= 512

FILES := $(patsubst %,toto/%,$(shell seq 1 1024))
DIRS := $(patsubst %,titi/%,$(shell seq 1 1024))

all: titi.sqsh

rules: $(FILES)

titi.sqsh: $(DIRS)
	$(MKSQUASHFS) titi $(@) -noappend -processors $(SQSH_JOBS) >/dev/null

toto/%:
	@mkdir -p toto
	@dd if=/dev/urandom of=$(@) bs=4096 count=1 2>/dev/null

$(DIRS): $(FILES)
titi/%:
	@mkdir -p titi
	@cp -a toto $(@)
	@chattr -R +A $(@)

And I call it like that (where 'O' is my out-of-tree build directory):

~/dev/O/host/bin/fakeroot make -j 1000 MKSQUASHFS=~/dev/O/host/bin/mksquashfs

It is using the fakeroot and the mksquashfs as built by Buildroot, and I'm able to reproduce the issue; the failing files change with each run, due to scheduling ordering, but the number of failures is roughly always the same, around 10 failures for each run, sometimes one or two less, sometimes a few more, but rarely above 12 or under 9.

llistxattr for titi/214/419 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/321/41 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/381/586 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/461/579 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/491/992 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/558/989 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/608/1022 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/769/843 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/816/899 failed in read_attrs, because Unknown error -193619184. Ignoring
llistxattr for titi/832/714 failed in read_attrs, because Unknown error -193619184. Ignoring

                     |      fakeroot      |
                     | Buildroot | Ubuntu | no fakeroot
---------------------+-----------+--------+------------
mksquashfs Buildroot |    KO     |   KO   |     OK
mksquashfs Ubuntu    |    KO     |   KO   |     OK

So it really is fakeroot that is causing the issues... :-/ No idea how to investigate further for now...

Note: I was never able to run with more than about -processors 1019, or mksquashfs would fail at startup. 512 is anyway way above the 8 CPUs I have... Yet, it's enough to cause the failures, so job done.
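Since these failures are reported with "Ignoring" and do not stop mksquashfs, one way to notice them in a run like the one above is to capture the messages and count the matching lines. A minimal sketch, assuming the messages go to stderr and keep the wording shown above; the `count_xattr_failures` name and the log file name are hypothetical:

```shell
#!/bin/sh
# count_xattr_failures LOG: print how many llistxattr failures LOG contains.
# Hypothetical helper; the pattern is taken from the messages quoted above.
count_xattr_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches, so mask the status.
    grep -c 'llistxattr for .* failed in read_attrs' "$1" || true
}

# Hypothetical usage with the Makefile above:
#   fakeroot make -j 1000 2>mksquashfs.log
#   [ "$(count_xattr_failures mksquashfs.log)" -eq 0 ] || echo "xattrs were dropped" >&2
```

Counting lines also makes it easy to compare runs, for instance with different values of SQSH_JOBS.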
Created attachment 7191: Tentative fix. Clayton, Could you test the attached patch and see if it fixes your issue? Regards, Yann E. MORIN.
Clayton, I've sent the patch to the list, so if you could rather reply there, that'd be great: https://patchwork.ozlabs.org/patch/801046/ Regards, Yann E. MORIN.
Clayton, All, The patch has been applied: https://git.buildroot.org/buildroot/commit/?id=eff989bab851ab01d190f3771558eb6ac30af255 Obviously, we're still interested in feedback, so feel free to re-open if you're still affected by the issue. Most notably, can you check that the xattrs you get in your generated filesystem are correct? Thanks! Regards, Yann E. MORIN.
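One possible way to do that check, sketched here under assumptions: keep a list of the capabilities the image is expected to carry, dump the ones actually found in the extracted image in the same format, and report what is missing. How the two lists are produced is left open (running `getcap -r` on the extracted tree is one option); `check_xattrs` and both list files are hypothetical.

```shell
#!/bin/sh
# check_xattrs EXPECTED ACTUAL: fail if any line of EXPECTED is absent from ACTUAL.
# Hypothetical helper; both files hold one "path capabilities" entry per line,
# in whatever format the tool used to dump them prints.
check_xattrs() {
    # -F: fixed strings, -x: whole-line match, -v: keep the lines of EXPECTED
    # that match no line of ACTUAL. grep exits non-zero when nothing is
    # missing, so mask the status.
    missing=$(grep -Fxv -f "$2" "$1" || true)
    if [ -n "$missing" ]; then
        printf 'missing from the image:\n%s\n' "$missing" >&2
        return 1
    fi
    return 0
}

# Hypothetical usage:
#   check_xattrs expected-caps.txt actual-caps.txt || exit 1
```

Comparing against an explicit expected list catches the silent case, where the image builds fine but some xattrs were dropped.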
This issue still affects me. I'm using fakeroot and mksquashfs to build multiple squashfs images from different mounts. Some context... I'm building 5 fairly large squashfs images in parallel. The xattrs intermittently fail to get copied over. The more xattrs I have and the more squashfs images I'm building in parallel, the more often the xattrs fail to be copied over. Each squashfs image is about 500 MB in size.
(In reply to Bob from comment #9) What version of Buildroot are you using? This is supposedly fixed with commit eff989bab (package/fakeroot: fix highly parallel uses), which first shipped in Buildroot 2017.08 and was backported to 2017.02.6 in commit ac847623f5. Can you check that you have either one of those changes in your tree?
(In reply to Yann E. MORIN from comment #10) I'm using fakeroot version 1.20.2 on Ubuntu 16.04. I'm not sure if this helps, but this is what the code looks like:

squashfuse first.sqsh first_mount
squashfuse second.sqsh second_mount

# overlay multiple mounts
unionfs-fuse -o cow,hide_meta_files first_mount=RW:second_mount=RO final_mount

# set capabilities
fakeroot setcap cap_linux_immutable final_mount/usr/blah.txt

# create final squash
fakeroot mksquashfs final_mount final.sqsh -no-progress -noappend -comp xz
(In reply to Bob from comment #11) So you are using the fakeroot and the mksquashfs from your distribution? We do not support those. Buildroot builds and uses its own fakeroot and its own mksquashfs. Since you are using the ones from your distribution, and it is a pretty "old" distribution, it is not surprising that it does not have the fixes we do have... Try using the fakeroot and mksquashfs provided by Buildroot, and check if you still have the issue. In the meantime, I don't think this is a Buildroot issue, so I'm closing.