Bug 9971

Summary: shuf implementation does not correctly implement algorithm chosen
Product: Busybox Reporter: Rob <rob.bast>
Component: OtherAssignee: unassigned
Status: RESOLVED FIXED    
Severity: normal CC: admwiggin+busyboxbugs, afenkart, busybox-cvs, ralphs
Priority: P5    
Version: unspecified   
Target Milestone: ---   
Hardware: All   
OS: Linux   
Host: Target:
Build:

Description Rob 2017-06-20 11:45:05 UTC
The shuf implementation refers to the https://en.wikipedia.org/wiki/Fisher–Yates_shuffle algorithm, however it does not seem to actually follow it. This leads to peculiar behaviour. For example, given input file:

    foo
    bar
    baz

after shuffling the input file, foo will never end up back on the first line. This came to light when I ran into a use-case where someone was selecting a random line from a file using shuf | head -n 1, and the results on busybox were showing a statistical anomaly (as in, the first line would never ever be picked) vs the same process running on environments that had gnu coreutils installed.

On line https://git.busybox.net/busybox/tree/coreutils/shuf.c#n56 it uses r %= i, which will result in 0 <= r < i, while the algorithm specifies 0 <= r <= i.
Comment 1 Ralph Siemsen 2017-06-20 11:51:43 UTC
I did a quick test with shuf.c line 56 changed to:

    r %= i+1;

and it seems to solve the problem.
Comment 2 Denys Vlasenko 2017-07-08 22:53:28 UTC
Fixed in git, thanks!