Bug 12236 - ash hangs after wait returns EINTR
Summary: ash hangs after wait returns EINTR
Status: RESOLVED INVALID
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other (show other bugs)
Version: 1.31.x
Hardware: Other Linux
: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-19 12:51 UTC by martin.brueckner
Modified: 2021-06-06 10:38 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description martin.brueckner 2019-09-19 12:51:19 UTC
I am running busybox on a NIOS2 Softcore CPU. As shell I am using ash.
With process IDs below 512 everything runs smoothly.
But when it comes to PID 512 ash hangs in an endless loop.
Starting a new ash shell afterwards (PID > 512) works. One has to kill the first shell with -9. ash in the endless loop keeps the CPU at 100%.

To debug this issue I let ash run with strace.
Additionally I wrote a small program which prints its own and the parent's PID.

# ./pid
My process ID : 495
My parent's ID: 137
# 

Having ash running with strace and starting ./pid gives me this output:

ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
rt_sigaction(SIGWINCH, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=NULL) = 518
setpgid(518, 518)                       = 0
wait4(-1, My process ID : 518
My parent's ID: 517
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED, NULL) = 518
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=518, si_uid=0, si_status=0, si_utime=1, si_stime=6} ---
rt_sigreturn({mask=[]})                 = 518
ioctl(10, TIOCSPGRP, [517])             = 0
wait4(-1, 0x7fdadc28, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost -isig -icanon -echo ...}) = 0


Running ./pid with PID 512 wait4 returns with EINTR and then wait is called again and again returning ECHILD. ash does not recover from this.

Other shells like bash or zsh seems to handle this differently. They don't hang at PID 512.


ioctl(0, SNDCTL_TMR_START or TCSETS, {B134 -opost isig icanon echo ...}) = 0
rt_sigaction(SIGWINCH, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=NULL) = 512
setpgid(512, 512)                       = 0
wait4(-1, My process ID : 512
My parent's ID: 143
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED, NULL) = 512
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=512, si_uid=0, si_status=0, si_utime=0, si_stime=7} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
...
(endless loop)
Comment 1 Denys Vlasenko 2019-10-08 13:56:42 UTC
Running ./pid with PID 512 wait4 returns with EINTR and then wait is called again and again returning ECHILD. ash does not recover from this.

Other shells like bash or zsh seems to handle this differently. They don't hang at PID 512.

ioctl(0, SNDCTL_TMR_START or TCSETS, {B134 -opost isig icanon echo ...}) = 0
rt_sigaction(SIGWINCH, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 8) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=NULL) = 512
setpgid(512, 512)                       = 0
wait4(-1, My process ID : 512
My parent's ID: 143
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED, NULL) = 512
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=512, si_uid=0, si_status=0, si_utime=0, si_stime=7} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)
wait4(-1, 0x7fc08b98, WSTOPPED, NULL)   = -1 ECHILD (No child processes)

The above looks like the following. wait4 syscall returns 512:
the "WSTOPPED, NULL) = 512" part clearly shows that syscall does NOT return EINTR, ERESTARTSYS or the like. It returns 512.

However, SIGCHLD is simultaneously delivered, therefore on return from wait4  kernel constructs the signal frame on stack and runs the signal handler
(in case of ash, it's signal_handler() function).

When it returns, (signal frame on stack is constructed so that) it returns to a stub in C library which executes rt_sigreturn syscall, which *should* restore registers and stack to the state it was before signal was delivered. However, we see "= -1 EINTR" there. If signal handler was invoked immediately on exit from wait4 syscall, this means we see a bug: rt_sigreturn did not correctly restore the syscall return register.

(This may not be so as there is no way to see when signal was delivered, it could be delivered a microsecond later later - however, as your other experiment shows, _usually_ SIGCHLD is delivered immediately at wait4.)

The bug may be caused by the fact that ERESTARTSYS constant *is* equal to 512. So rt_sigreturn code may be buggy and might be misinterpreting the value of 512 somewhere as ERESTARTSYS.

Actually, this seems to be what happens in linux/arch/nios2/kernel/signal.c

static int do_signal(struct pt_regs *regs)
{
        unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
        int restart = 0;
        struct ksignal ksig;

        current->thread.kregs = regs;

        /*
         * If we were from a system call, check for system call restarting...
         */
        if (regs->orig_r2 >= 0) {
                continue_addr = regs->ea;
                restart_addr = continue_addr - 4;
                retval = regs->r2;
                
                /*
                 * Prepare for system call restart. We do this here so that a
                 * debugger will see the already changed PC.
                 */
                switch (retval) {
                case ERESTART_RESTARTBLOCK:
                        restart = -2;
                case ERESTARTNOHAND:
                case ERESTARTSYS:
                case ERESTARTNOINTR:
                        restart++;
                        regs->r2 = regs->orig_r2;
                        regs->r7 = regs->orig_r7;
                        regs->ea = restart_addr;
                        break;
                }
        }
        
        if (get_signal(&ksig)) {
                /* handler */
                if (unlikely(restart && regs->ea == restart_addr)) {
                        if (retval == ERESTARTNOHAND ||
                            retval == ERESTART_RESTARTBLOCK ||
                             (retval == ERESTARTSYS
                                && !(ksig.ka.sa.sa_flags & SA_RESTART))) {
                                regs->r2 = EINTR;
                                regs->r7 = 1;
                                regs->ea = continue_addr;
                        }
                }
                handle_signal(&ksig, regs);
                return 0;
        }
Comment 2 Denys Vlasenko 2019-10-08 14:06:35 UTC
Yes, looks like the bug is here:
        /*
         * If we were from a system call, check for system call restarting...
         */
        if (regs->orig_r2 >= 0) {
                continue_addr = regs->ea;
                restart_addr = continue_addr - 4;
                retval = regs->r2;
                
                /*
                 * Prepare for system call restart. We do this here so that a
                 * debugger will see the already changed PC.
                 */
                switch (retval) {
                case ERESTART_RESTARTBLOCK:
                        restart = -2;
                case ERESTARTNOHAND:
                case ERESTARTSYS:
                case ERESTARTNOINTR:
                        restart++;
                        regs->r2 = regs->orig_r2;
                        regs->r7 = regs->orig_r7;
                        regs->ea = restart_addr;
                        break;
                }
        }

The above code examines r2, the "errno" register on the exit from syscall, but it DOES NOT check that it is indeed an error! 

linux/arch/nios2/kernel/entry.S says:

        /* Execute the system call */
        callr   r1

        /* If the syscall returns a negative result:
         *   Set r7 to 1 to indicate error,
         *   Negate r2 to get a positive error code
         * If the syscall returns zero or a positive value:
         *   Set r7 to 0.
         * The sigreturn system calls will skip the code below by
         * adding to register ra. To avoid destroying registers
         */
translate_rc_and_ret:
        movi    r1, 0
        bge     r2, zero, 3f
        sub     r2, zero, r2
        movi    r1, 1
3:

IOW: the code in do_signal() should have checked that r7 == 1 before assuming that r2 == 512 means "ERESTARTSYS".
Comment 3 martin.brueckner 2019-10-21 21:43:21 UTC
(In reply to Denys Vlasenko from comment #2)

Thanks Denys for your very helpful analysis.
I am on vacation and couldn't reply earlier.

When I am back at work I can try to patch the kernel to check r7==1.
I am unsure where to put the check for r7==1. Is it after "case ERESTARTNOINTR" like this?

        if (regs->orig_r2 >= 0) {
                continue_addr = regs->ea;
                restart_addr = continue_addr - 4;
                retval = regs->r2;
                
                /*
                 * Prepare for system call restart. We do this here so that a
                 * debugger will see the already changed PC.
                 */
                switch (retval) {
                case ERESTART_RESTARTBLOCK:
                        restart = -2;
                case ERESTARTNOHAND:
                case ERESTARTSYS:
                case ERESTARTNOINTR:
                        if (regs->r7 == 0) break; // or is it regs->orig_r7 ?
                        restart++;
                        regs->r2 = regs->orig_r2;
                        regs->r7 = regs->orig_r7;
                        regs->ea = restart_addr;
                        break;
                }