Bug 13161

Summary: Services started with start-stop-daemon applet not properly destroyed when stopped
Product: Busybox
Reporter: Ivan Castell Rovira <al004140>
Component: Other
Assignee: unassigned
Status: RESOLVED INVALID
Severity: normal
CC: busybox-cvs
Priority: P5
Version: 1.32.x
Target Milestone: ---
Hardware: Other
OS: Linux
Host:
Target:
Build:
Attachments: busybox config file used

Description Ivan Castell Rovira 2020-08-17 10:26:31 UTC
Created attachment 8561
busybox config file used

Tested on an ARM platform, but I suppose this happens on all platforms.

First, create this simple loop.sh script for the test:

    # cat /root/loop.sh
    while [ 1 ]
    do
        sleep 1000
    done


A service is started with start-stop-daemon as shown:

    # start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh


When checking for the service running in background, pid 14710 is found:

    # ps fax | grep loop.sh
    14710 root      0:00 {loop.sh} /bin/sh /root/loop.sh
    14793 root      0:00 grep loop.sh


Then the service is stopped with start-stop-daemon:

    # start-stop-daemon -K -p /var/run/my.pid


The PID 14710 entry is still listed, now as a process named [loop.sh]:

    # ps fax | grep loop.sh
    14710 root      0:00 [loop.sh]
    15563 root      0:00 grep loop.sh


If I repeat this start/stop cycle several times, I get this nasty output:

    # ps fax | grep loop.sh
    14710 root      0:00 [loop.sh]
    20555 root      0:00 [loop.sh]
    20633 root      0:00 [loop.sh]
    20670 root      0:00 [loop.sh]
    20707 root      0:00 [loop.sh]
    20775 root      0:00 grep loop.sh


Sending SIGTERM or SIGKILL to any of these PIDs doesn't help:

    # kill -TERM 14710
    # kill -KILL 14710
    # ps fax|grep loop.sh
    14710 root      0:00 [loop.sh]
    20555 root      0:00 [loop.sh]
    20633 root      0:00 [loop.sh]
    20670 root      0:00 [loop.sh]
    20707 root      0:00 [loop.sh]
    27604 root      0:00 grep loop.sh


This has consequences for System V init services, as some of them wait until the PID is fully removed.
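
To illustrate that last point: a stop script that waits for the PID to disappear typically polls with `kill -0`, and `kill -0` still succeeds for a zombie, so the wait only ends at the timeout. The following is a minimal sketch, not taken from any real init script; the helper name `wait_for_exit` and the whole-second timeout are assumptions made for illustration.

```shell
# Hypothetical helper: poll until PID no longer exists or TIMEOUT seconds pass.
# Returns 0 if the process is gone, 1 if it is still present at the timeout.
# kill -0 also succeeds for a zombie, so an unreaped process counts as "present".
wait_for_exit() {
    wfe_pid=$1
    wfe_timeout=$2
    wfe_elapsed=0
    while kill -0 "$wfe_pid" 2>/dev/null; do
        if [ "$wfe_elapsed" -ge "$wfe_timeout" ]; then
            return 1
        fi
        sleep 1
        wfe_elapsed=$((wfe_elapsed + 1))
    done
    return 0
}
```

Called as `wait_for_exit "$(cat /var/run/my.pid)" 10` after the stop command, it would return 1 for as long as the stopped process remains unreaped.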
Comment 1 Ivan Castell Rovira 2020-10-21 10:12:33 UTC
I discovered that this bug only happens when booting the rootfs with root=ram. When booting from the eMMC device (root=/dev/mmcblk1p3), it doesn't happen. It is the same rootfs.ext4 filesystem, generated by buildroot.
Comment 2 Denys Vlasenko 2020-11-27 17:06:39 UTC
(In reply to Ivan Castell Rovira from comment #0)
> Sending SIGTERM or SIGKILL to any of these PIDs doesn't help:
>     # kill -TERM 14710
>     # kill -KILL 14710
>     # ps fax|grep loop.sh
>     14710 root      0:00 [loop.sh]
>     20555 root      0:00 [loop.sh]
>     20633 root      0:00 [loop.sh]
>     20670 root      0:00 [loop.sh]
>     20707 root      0:00 [loop.sh]
>     27604 root      0:00 grep loop.sh

This behavior (seemingly not dying from SIGKILL) is a typical symptom of a dead process not being reaped by its parent: zombie processes can still be signaled (kill does not return any error).

Please show the output of pstree -p, so we can see which process is the parent of these dead processes.
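
The state and parent of a single suspect PID can also be read directly from /proc. This is a minimal sketch, assuming a Linux /proc with the usual `State:` and `PPid:` lines in `/proc/PID/status`; the helper name `proc_state_and_parent` and its optional second argument are hypothetical.

```shell
# Hypothetical helper: print "<state> <ppid>" for PID, read from PROC_ROOT/PID/status.
# PROC_ROOT defaults to /proc; the second argument exists only so the function
# can be run against a saved copy of the files.
proc_state_and_parent() {
    status_file="${2:-/proc}/$1/status"
    [ -r "$status_file" ] || return 1
    # The State line looks like "State:<TAB>Z (zombie)", the PPid line like "PPid:<TAB>1".
    awk '$1 == "State:" { state = $2 }
         $1 == "PPid:"  { ppid = $2 }
         END { print state, ppid }' "$status_file"
}
```

A state of `Z` means the process has exited but has not yet been waited for by the process whose PID is printed second.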
Comment 3 Ivan Castell Rovira 2020-11-30 09:43:04 UTC
(In reply to Denys Vlasenko from comment #2)

Hello, and thanks for your answer. Sorry for the delay, I had to prepare the setup again. Below are more details and the pstree output you requested. Hope it helps!

# busybox
BusyBox v1.27.2 (2020-11-09 12:48:50 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.

Usage: busybox [function [arguments]...]
   or: busybox --list[-full]
   or: busybox --install [-s] [DIR]
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, addgroup, adduser, ar, arp, arping, ash, awk, basename, blkid,
        bunzip2, bzcat, cat, chattr, chgrp, chmod, chown, chroot, chrt, chvt,
        cksum, clear, cmp, cp, cpio, crond, crontab, cut, date, dc, dd,
        deallocvt, delgroup, deluser, devmem, df, diff, dirname, dmesg, dnsd,
        dnsdomainname, dos2unix, du, dumpkmap, echo, egrep, eject, env,
        ether-wake, expr, factor, fallocate, false, fbset, fdflush, fdformat,
        fdisk, fgrep, find, flock, fold, free, freeramdisk, fsck, fsfreeze,
        fstrim, fuser, getopt, getty, grep, gunzip, gzip, halt, hdparm, head,
        hexdump, hostid, hostname, hwclock, i2cdetect, i2cdump, i2cget, i2cset,
        id, ifconfig, ifdown, ifup, inetd, init, insmod, install, ip, ipaddr,
        ipcrm, ipcs, iplink, ipneigh, iproute, iprule, iptunnel, kill, killall,
        killall5, klogd, last, less, link, linux32, linux64, linuxrc, ln,
        loadfont, loadkmap, logger, login, logname, losetup, ls, lsattr, lsmod,
        lsof, lspci, lsscsi, lsusb, lzcat, lzma, lzopcat, makedevs, md5sum,
        mdev, mesg, microcom, mkdir, mkdosfs, mke2fs, mkfifo, mknod, mkpasswd,
        mkswap, mktemp, modprobe, more, mount, mountpoint, mt, mv, nameif,
        netstat, nice, nl, nohup, nproc, nslookup, od, openvt, partprobe,
        passwd, paste, patch, pidof, ping, pipe_progress, pivot_root, poweroff,
        printenv, printf, ps, pwd, rdate, readlink, readprofile, realpath,
        reboot, renice, reset, resize, rm, rmdir, rmmod, route, run-parts,
        runlevel, sed, seq, setarch, setconsole, setkeycodes, setlogcons,
        setpriv, setserial, setsid, sh, sha1sum, sha256sum, sha3sum, sha512sum,
        shred, sleep, sort, start-stop-daemon, stat, strings, stty, su,
        sulogin, svc, swapoff, swapon, switch_root, sync, sysctl, syslogd,
        tail, tar, tee, telnet, test, tftp, time, top, touch, tr, traceroute,
        true, truncate, tty, ubirename, udhcpc, uevent, umount, uname, uniq,
        unix2dos, unlink, unlzma, unlzop, unxz, unzip, uptime, usleep,
        uudecode, uuencode, vconfig, vi, vlock, w, watch, watchdog, wc, wget,
        which, who, whoami, xargs, xxd, xz, xzcat, yes, zcat


# start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh
# start-stop-daemon -K -p /var/run/my.pid
stopped process in pidfile '/var/run/my.pid' (pid 397)

# start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh
# start-stop-daemon -K -p /var/run/my.pid
stopped process in pidfile '/var/run/my.pid' (pid 402)

# start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh
# start-stop-daemon -K -p /var/run/my.pid
stopped process in pidfile '/var/run/my.pid' (pid 407)

# start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh
# start-stop-daemon -K -p /var/run/my.pid
stopped process in pidfile '/var/run/my.pid' (pid 412)

# start-stop-daemon -S -b -q -m -p /var/run/my.pid --exec /root/loop.sh
# start-stop-daemon -K -p /var/run/my.pid
stopped process in pidfile '/var/run/my.pid' (pid 417)


# ps fax|grep loop.sh
  397 root     [loop.sh]
  402 root     [loop.sh]
  407 root     [loop.sh]
  412 root     [loop.sh]
  417 root     [loop.sh]
  421 root     grep loop.sh



# pstree -p
swapper/0(1)-+-dbus-daemon(183)
             |-klogd(148)
             |-loop.sh(397)
             |-loop.sh(402)
             |-loop.sh(407)
             |-loop.sh(412)
             |-loop.sh(417)
             |-macipsrv(198)
             |-monitor_eth_con(203)
             |-sleep(374)
             |-sleep(384)
             |-sleep(398)
             |-sleep(403)
             |-sleep(408)
             |-sleep(413)
             |-sleep(418)
             |-sshd(268)
             |-start-stop-daem(143)
             |-start-stop-daem(146)
             |-start-stop-daem(193)
             |-start-stop-daem(196)
             |-start-stop-daem(200)
             |-start-stop-daem(396)
             |-start-stop-daem(401)
             |-start-stop-daem(406)
             |-start-stop-daem(411)
             |-start-stop-daem(416)
             |-syslogd(145)
             |-udevd(151)
             |-udhcpc(195)
             `-udhcpc(248)


# ps fax
PID   USER     COMMAND
    1 root     [swapper/0]
    2 root     [kthreadd]
    3 root     [ksoftirqd/0]
    4 root     [kworker/0:0]
    5 root     [kworker/0:0H]
    6 root     [kworker/u2:0]
    7 root     [rcu_preempt]
    8 root     [rcu_sched]
    9 root     [rcu_bh]
   10 root     [migration/0]
   11 root     [lru-add-drain]
   12 root     [cpuhp/0]
   13 root     [kdevtmpfs]
   14 root     [netns]
   15 root     [oom_reaper]
   16 root     [writeback]
   17 root     [kcompactd0]
   18 root     [crypto]
   19 root     [bioset]
   20 root     [kblockd]
   21 root     [kworker/0:1]
   22 root     [cfg80211]
   23 root     [watchdogd]
   24 root     [kswapd0]
   25 root     [vmstat]
   70 root     [bioset]
   71 root     [bioset]
   72 root     [bioset]
   73 root     [bioset]
   74 root     [bioset]
   75 root     [bioset]
   76 root     [bioset]
   77 root     [bioset]
   78 root     [bioset]
   79 root     [bioset]
   80 root     [bioset]
   81 root     [bioset]
   82 root     [bioset]
   83 root     [bioset]
   84 root     [bioset]
   85 root     [bioset]
   86 root     [bioset]
   87 root     [bioset]
   88 root     [bioset]
   89 root     [bioset]
   90 root     [bioset]
   91 root     [bioset]
   92 root     [bioset]
   93 root     [bioset]
   94 root     [kworker/u2:1]
   96 root     [cfinteractive]
   97 root     [irq/53-mmc0]
   98 root     [irq/54-mmc1]
   99 root     [kworker/0:2]
  100 root     [kworker/0:3]
  101 root     [ipv6_addrconf]
  102 root     [bioset]
  103 root     [krfcommd]
  104 root     [mmcqd/1]
  105 root     [bioset]
  106 root     [mmcqd/1boot0]
  107 root     [bioset]
  108 root     [mmcqd/1boot1]
  109 root     [bioset]
  110 root     [mmcqd/1rpmb]
  111 root     [irq/36-imx_ther]
  112 root     [jbd2/ram0-8]
  113 root     [ext4-rsv-conver]
  114 root     {linuxrc} init
  122 root     [kworker/0:1H]
  123 root     [jbd2/mmcblk1p5-]
  124 root     [ext4-rsv-conver]
  125 root     [jbd2/mmcblk1p6-]
  126 root     [ext4-rsv-conver]
  143 root     [start-stop-daem]
  146 root     [start-stop-daem]
  148 root     /sbin/klogd -n
  151 root     /sbin/udevd -d
  183 dbus     dbus-daemon --system
  193 root     [start-stop-daem]
  195 root     [udhcpc]
  196 root     [start-stop-daem]
  198 root     [macipsrv]
  200 root     [start-stop-daem]
  203 root     [monitor_eth_con]
  217 root     [kworker/u2:2]
  221 root     -sh
  374 root     [sleep]
  384 root     [sleep]
  396 root     [start-stop-daem]
  397 root     [loop.sh]
  398 root     sleep 1000
  401 root     [start-stop-daem]
  402 root     [loop.sh]
  403 root     sleep 1000
  406 root     [start-stop-daem]
  407 root     [loop.sh]
  408 root     sleep 1000
  411 root     [start-stop-daem]
  412 root     [loop.sh]
  413 root     sleep 1000
  416 root     [start-stop-daem]
  417 root     [loop.sh]
  418 root     sleep 1000
  423 root     ps fax
Comment 4 Denys Vlasenko 2020-11-30 17:01:24 UTC
> # ps fax
> PID   USER     COMMAND
>     1 root     [swapper/0]

Your init process is the kernel's idle thread.
How did that happen??

Normally, the kernel's idle thread has pid 0 (not 1, as seen on your system) and is not visible to userspace (/proc/0 does not exist).
Of course, the kernel's idle thread does not do anything useful; in particular, it does not reap dead children. That is why you see the zombies.
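
A quick way to check what pid 1 is on a given boot is to look at its /proc entries: a kernel thread has an empty `cmdline` file, while a userspace init normally does not. This is a minimal sketch, assuming a Linux /proc; the helper name `describe_pid1` and its optional argument are hypothetical.

```shell
# Hypothetical helper: report whether pid 1 looks like a userspace init.
# PROC_ROOT defaults to /proc; the argument exists only so the function
# can be run against a saved copy of the files.
describe_pid1() {
    proc_root=${1:-/proc}
    [ -r "$proc_root/1/comm" ] && [ -r "$proc_root/1/cmdline" ] || return 1
    name=$(cat "$proc_root/1/comm")
    # cmdline is NUL-separated; turn the NULs into spaces before testing it.
    args=$(tr '\000' ' ' < "$proc_root/1/cmdline")
    args=${args% }
    if [ -n "$args" ]; then
        echo "pid 1 is userspace: $name ($args)"
    else
        echo "pid 1 has no command line (kernel thread?): $name"
    fi
}
```

Running `describe_pid1` once under each boot mode gives a direct comparison of the two cases.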
Comment 5 Ivan Castell Rovira 2020-12-01 08:01:54 UTC
As explained in a previous comment, this issue only happens when the board is booted with root=ram. The same rootfs booted with root=/dev/mmcblk1p3 works fine:

# cat /proc/cmdline
console=ttymxc0,115200 root=/dev/mmcblk1p3 rootwait ro

# ps fax
PID   USER     COMMAND
    1 root     init
    2 root     [kthreadd]
    3 root     [ksoftirqd/0]
    4 root     [kworker/0:0]
    5 root     [kworker/0:0H]
    6 root     [kworker/u2:0]
    7 root     [rcu_preempt]
    8 root     [rcu_sched]
    9 root     [rcu_bh]
   10 root     [migration/0]
   11 root     [lru-add-drain]
   12 root     [cpuhp/0]
   13 root     [kdevtmpfs]
   14 root     [netns]


In case it helps with debugging, here are all the relevant files involved in the initial steps of mounting the rootfs:

# ls -l /sbin/init
lrwxrwxrwx    1 root     root     14 Nov  9 12:49 /sbin/init -> ../bin/busybox


# cat /proc/cmdline 
console=ttymxc0,115200 root=ram initrd=0x82000000,50M rootfstype=ext4 rw


# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system> <mount pt>   <type>  <options>                <dump>  <pass>
/dev/root       /            ext4    auto,ro                    0       0
proc            /proc        proc    defaults                   0       0
devpts          /dev/pts     devpts  defaults,gid=5,mode=620    0       0
tmpfs           /dev/shm     tmpfs   mode=0777                  0       0
tmpfs           /tmp         tmpfs   defaults                   0       0
sysfs           /sys         sysfs   defaults                   0       0
debugfs   /sys/kernel/debug/ debugfs defaults                   0       0




# cat /etc/inittab
# /etc/inittab
# Startup the system
::sysinit:/bin/mount -o remount,rw /
::sysinit:/etc/init.d/rcS
::sysinit:/bin/mount -o remount,ro /

# Put a getty on the serial port
ttymxc0::respawn:/sbin/getty -L  ttymxc0 0 vt100 # GENERIC_SERIAL

# Stuff to do before rebooting
::shutdown:/etc/init.d/rcK
::shutdown:/bin/umount -a -r



# cat /etc/init.d/rcS
#!/bin/sh

mount_overlay() {
        mkdir -p /tmp/overlays$1 /tmp/overlays-work$1
        mount -t overlay overlay -o lowerdir=$1,upperdir=/tmp/overlays$1,workdir=/tmp/overlays-work$1 $1
}

. /etc/profile
# Flasher manually mounts devtmpfs
mount -t devtmpfs none /dev 2> /dev/null || true
mkdir -p /dev/pts /dev/shm
mount -a
echo "[$IMAGE_SIDE] Mounting manually devtmpfs ... $(get_status)"

# Mount some overlays
mount_overlay /etc
mount_overlay /root
mount_overlay /usr
mount_overlay /var
mount_overlay /run

I hope all this helps to discover what is wrong. Please remember that the rootfs is built using the buildroot tool. Thanks.
Comment 6 Denys Vlasenko 2020-12-02 15:42:42 UTC
(In reply to Ivan Castell Rovira from comment #5)
> As explained in a previous comment, this issue only happens when the board is booted with root=ram. The same rootfs booted with root=/dev/mmcblk1p3 works fine

Well, when you run with root=ram, where is your init process (pid 1)? It's not in the pstree output. Instead, you see "swapper/0", the kernel's idle thread.

This does not look right. You need to investigate why this happens.