Bug 3493

Summary: Busybox randomly crashes on ARM920T platform
Product: Busybox Reporter: Ivan Castell Rovira <al004140>
Component: OtherAssignee: unassigned
Status: RESOLVED INVALID    
Severity: major CC: al004140, busybox-cvs
Priority: P5    
Version: 1.17.x   
Target Milestone: ---   
Hardware: Other   
OS: Linux   
Host: i686-pc-linux-gnu Target: arm-unknown-linux-gnueabi-
Build:

Description Ivan Castell Rovira 2011-03-17 10:59:56 UTC
Hi everybody!

I don't know where is exactly the problem, so after searching on google for a whole week without any success, I thought publishing that issue in busybox bugtrack. This issue is related with that pieces of software:

	- busybox-1.17.3
	- dynamic linker ld-2.6.1.so
	- ARM platform ARM920T
	- cross-compiler arm-linux-gcc version 4.2.2
	- linux kernel 2.6.30.4

This is the test it is crashing. It crashes on the line executing piped ls, grep and cut busybox applets.

	$ cat test_crash.sh 
	#! /bin/sh

	ulimit -c unlimited

	export LD_DEBUG=reloc
	export LD_DEBUG_OUTPUT=/tmp/ld_logs.out
	
	while [ 1 ]; do
		ls -l /bin/ 2>> /tmp/err | grep pa 2>> /tmp/err | cut -f 1 -d " " 2> /tmp/err # LINE OF CRASH
			
		if [ -f /home/root/core ]; then
			echo core generated!
			break;
		fi
	
		rm /tmp/ld_logs.out*
	done

The test executes fine for hundreds of times (sometimes more than half an hour). But randomly it crashes, generating a coredump file. 

Analisis with gdb of the core dumped (I have cross-compiled gdb to run it on the target platform)

	$ gdb /bin/busybox --core ./core --quiet

	warning: core file may not match specified executable file.
	Core was generated by `grep pa USER=root HOME=/home/root OLDPWD=/home/root PS1=\u@\h:\w\$  ENV=/home/r'.
	Program terminated with signal 11, Segmentation fault.
	[New process 21169]
	#0  0x4015d63c in _start () from /lib/ld-linux.so.3

	(gdb) bt
	#0  0x4015d63c in _start () from /lib/ld-linux.so.3
	#1  0x000838cc in tryexec (cmd=Cannot access memory at address 0xbefe7f14) at shell/ash.c:7279
	Cannot access memory at address 0xbefe7f2c

	(gdb) info registers
	r0             0xfffffff2	4294967282
	r1             0x1733cc	1520588
	r2             0x1733f4	1520628
	r3             0x1	1
	r4             0x172008	1515528
	r5             0x0	0
	r6             0x0	0
	r7             0xb	11
	r8             0x0	0
	r9             0x0	0
	r10            0x40024000	1073889280
	r11            0x0	0
	r12            0x4015d630	1075172912
	sp             0xbefe7f08	0xbefe7f08
	lr             0x838cc	538828
	pc             0x4015d63c	0x4015d63c <_start>
	fps            0x0	0
	cpsr           0x60000010	1610612752

Dynamic linker is pointing to 2.6.1

	$ ls -l /lib/ld-linux.so.3
	lrwxrwxrwx    1 root     root            11 Jan 17  2011 /lib/ld-linux.so.3 -> ld-2.6.1.so

I modified busybox source including some simple printf lines to see variable contents at the time of the crash.

 7244 static void
 7245 tryexec(IF_FEATURE_SH_STANDALONE(int applet_no,) char *cmd, char **argv, char **envp)
 7246 {
 7247     int repeated = 0;
 7248 
 7249 #if ENABLE_FEATURE_SH_STANDALONE
 7250     if (applet_no >= 0) {
 7251         if (APPLET_IS_NOEXEC(applet_no)) {
 7252             clearenv();
 7253             while (*envp)
 7254                 putenv(*envp++);
 7255             run_applet_no_and_exit(applet_no, argv);
 7256         }
 7257         /* re-exec ourselves with the new arguments */
 7258         execve(bb_busybox_exec_path, argv, envp);
 7259         /* If they called chroot or otherwise made the binary no longer
 7260          * executable, fall through */
 7261     }
 7262 #endif
 7263 
 7264  repeat:
 7265 #ifdef SYSV
 7266     do {
 7267 // if (cmd != NULL) printf("cmd = <>%s<>\n", cmd); else printf("cmd is NULL!");
 7268 
 7269         execve(cmd, argv, envp);
 7270     } while (errno == EINTR);
 7271 #else
 7272 if (cmd != NULL) printf("cmd = <>%s<>\n", cmd); else printf("cmd is NULL!");
 7273 printf("cmd=%p\n", cmd);
 7274 printf("argv=%p\n", argv);
 7275 printf("envp=%p\n", envp);
 7276 printf("tryexec=%p\n", tryexec);
 7277 printf("execve=%p\n", execve);
 7278 
 7279     execve(cmd, argv, envp);
 7280 #endif


Just before crashing, my test script printed this addresses on stdout:

	cmdname = <>/usr/bin/cut<>
	cmd = <>/usr/bin/cut<>
	cmd=0x17357c
	argv=0x1733fc
	envp=0x17342c
	tryexec=0x83840
	execve=0xcca8
	Segmentation fault (core dumped)
	core generated!

As you can see, cmd is pointing to "/usr/bin/cut", so I don't understand why is gdb showing me this error:

"cmd=Cannot access memory at address 0xbefe7f14) at shell/ash.c:7279"

The dissasembled busybox binary referencing the address of crash (0x000838cc)

	$ arm-linux-objdump -d busybox

	00083840 <tryexec>:
	   83840:   e52de004    push    {lr}        ; (str lr, [sp, #-4]!) 
	   83844:   e24dd024    sub sp, sp, #36 ; 0x24 
	   83848:   e58d000c    str r0, [sp, #12]
	   8384c:   e58d1008    str r1, [sp, #8]
	   83850:   e58d2004    str r2, [sp, #4]
	   83854:   e3a03000    mov r3, #0  ; 0x0  
	   83858:   e58d3014    str r3, [sp, #20]
	   8385c:   e59d300c    ldr r3, [sp, #12]
	   83860:   e3530000    cmp r3, #0  ; 0x0  
	   83864:   0a000003    beq 83878 <tryexec+0x38>
	   83868:   e59f0188    ldr r0, [pc, #392]  ; 839f8 <tryexec+0x1b8>
	   8386c:   e59d100c    ldr r1, [sp, #12]
	   83870:   ebfe2485    bl  ca8c <_init+0x9b0>
	   83874:   ea000001    b   83880 <tryexec+0x40>
	   83878:   e59f017c    ldr r0, [pc, #380]  ; 839fc <tryexec+0x1bc>
	   8387c:   ebfe2482    bl  ca8c <_init+0x9b0>
	   83880:   e59f0178    ldr r0, [pc, #376]  ; 83a00 <tryexec+0x1c0>
	   83884:   e59d100c    ldr r1, [sp, #12]
	   83888:   ebfe247f    bl  ca8c <_init+0x9b0>
	   8388c:   e59f0170    ldr r0, [pc, #368]  ; 83a04 <tryexec+0x1c4>
	   83890:   e59d1008    ldr r1, [sp, #8]
	   83894:   ebfe247c    bl  ca8c <_init+0x9b0>
	   83898:   e59f0168    ldr r0, [pc, #360]  ; 83a08 <tryexec+0x1c8>
	   8389c:   e59d1004    ldr r1, [sp, #4]
	   838a0:   ebfe2479    bl  ca8c <_init+0x9b0>
	   838a4:   e59f0160    ldr r0, [pc, #352]  ; 83a0c <tryexec+0x1cc>
	   838a8:   e59f1160    ldr r1, [pc, #352]  ; 83a10 <tryexec+0x1d0>
	   838ac:   ebfe2476    bl  ca8c <_init+0x9b0>
	   838b0:   e59f015c    ldr r0, [pc, #348]  ; 83a14 <tryexec+0x1d4>
	   838b4:   e59f115c    ldr r1, [pc, #348]  ; 83a18 <tryexec+0x1d8>
	   838b8:   ebfe2473    bl  ca8c <_init+0x9b0>
	   838bc:   e59d000c    ldr r0, [sp, #12]
	   838c0:   e59d1008    ldr r1, [sp, #8]
	   838c4:   e59d2004    ldr r2, [sp, #4]
	   838c8:   ebfe24f6    bl  cca8 <_init+0xbcc>
	   838cc:   e59d3014    ldr r3, [sp, #20]
	   838d0:   e3530000    cmp r3, #0  ; 0x0  
	   838d4:   0a000002    beq 838e4 <tryexec+0xa4>

Also, the test gathers some information related with the dynamic linker:

	$ ls -l /tmp/
	-rw-r--r--    1 root     root             0 Aug 20 01:49 err
	-rw-r--r--    1 root     root           724 Aug 20 01:49 ld_logs.out.21167
	-rw-r--r--    1 root     root           523 Aug 20 01:49 ld_logs.out.21168
	-rw-r--r--    1 root     root           728 Aug 20 01:49 ld_logs.out.21170

And these are the contents of that files:

	$ cat /tmp/ld_logs.out.211*
	     21167:	
	     21167:	relocation processing: /lib/libc.so.6 (lazy)
	     21167:	
	     21167:	relocation processing: /lib/libm.so.6 (lazy)
	     21167:	
	     21167:	relocation processing: rm (lazy)
	     21167:	
	     21167:	relocation processing: /lib/ld-linux.so.3
	     21167:	
	     21167:	calling init: /lib/libc.so.6
	     21167:	
	     21167:	
	     21167:	calling init: /lib/libm.so.6
	     21167:	
	     21167:	
	     21167:	initialize program: rm
	     21167:	
	     21167:	
	     21167:	transferring control: rm
	     21167:	
	     21167:	
	     21167:	calling fini: rm [0]
	     21167:	
	     21167:	
	     21167:	calling fini: /lib/libm.so.6 [0]
	     21167:	
	     21167:	
	     21167:	calling fini: /lib/libc.so.6 [0]
	     21167:	
	     21168:	
	     21168:	relocation processing: /lib/libc.so.6 (lazy)
	     21168:	
	     21168:	relocation processing: /lib/libm.so.6 (lazy)
	     21168:	
	     21168:	relocation processing: ls (lazy)
	     21168:	
	     21168:	relocation processing: /lib/ld-linux.so.3
	     21168:	
	     21168:	calling init: /lib/libc.so.6
	     21168:	
	     21168:	
	     21168:	calling init: /lib/libm.so.6
	     21168:	
	     21168:	
	     21168:	initialize program: ls
	     21168:	
	     21168:	
	     21168:	transferring control: ls
	     21168:	
	     21170:	
	     21170:	relocation processing: /lib/libc.so.6 (lazy)
	     21170:	
	     21170:	relocation processing: /lib/libm.so.6 (lazy)
	     21170:	
	     21170:	relocation processing: cut (lazy)
	     21170:	
	     21170:	relocation processing: /lib/ld-linux.so.3
	     21170:	
	     21170:	calling init: /lib/libc.so.6
	     21170:	
	     21170:	
	     21170:	calling init: /lib/libm.so.6
	     21170:	
	     21170:	
	     21170:	initialize program: cut
	     21170:	
	     21170:	
	     21170:	transferring control: cut
	     21170:	
	     21170:	
	     21170:	calling fini: cut [0]
	     21170:	
	     21170:	
	     21170:	calling fini: /lib/libm.so.6 [0]
	     21170:	
	     21170:	
	     21170:	calling fini: /lib/libc.so.6 [0]
	     21170:	

I would really appreciate whatever advice to continue tracing that issue by myself. I really need to know why is this problem happening, but have no idea how can I continue tracing that issue. If you can help me, that would be very nice. Thanks a lot in advance!! 

Best regards,
  -- Ivan
Comment 1 Denys Vlasenko 2011-09-11 19:14:12 UTC
Not enough information to work on this report.