Bug 1075 - ARM: Programs linked to shared library segfault
Status: NEW
Alias: None
Product: uClibc
Classification: Unclassified
Component: Shared Library Support
Version: 0.9.30.2
Hardware: PC Linux
Importance: P5 major
Target Milestone: ---
Assignee: Bernhard Reutner-Fischer
Reported: 2010-02-11 11:34 UTC by Fabrizio Gennari
Modified: 2015-12-02 19:38 UTC
Host: ARM embedded device
Target:
Build: i386 PC with Ubuntu 9.10


Attachments
.config file used (5.87 KB, patch)
2010-03-13 07:50 UTC, Fabrizio Gennari

Description Fabrizio Gennari 2010-02-11 11:34:57 UTC
On the host, I untar, configure, make and install binutils 2.20 for an ARM target.

On the host, I untar Linux kernel 2.6.32 tarball and install the sanitised headers.

On the host, I untar the uClibc 0.9.30.2 tarball and type make.
The configuration is:
target architecture: arm
target architecture features: OABI, generic ARM, little endian; the Linux kernel header location is set to where the sanitised headers were installed in the previous step
uClibc development/debugging options: Cross-compiling toolchain prefix arm-linux-

Then I type
sudo make install_headers
(headers are needed because otherwise gcc wouldn't compile without --disable-threads)

On the host, I untar and configure gcc 4.4.3 for an ARM target. Configure options are
--target=arm-linux --enable-languages=c,c++ --nfp --enable-__cxa_atexit --disable-shared
(--disable-shared is needed because otherwise gcc wouldn't compile without crti.o)

Then I type
make all-target-libgcc
(to compile gcc itself and libgcc.a)
sudo make install-gcc
sudo make install-target-libgcc

Now I go back to the uClibc folder and type
make
sudo PATH=$PATH make install

I compile a helloworld test program, test.c

#include <stdio.h>

int main()
{
 printf("Whither Canada?\n");
 return 0;
}

twice, first with
arm-linux-gcc test.c -o testarm
then with
arm-linux-gcc test.c -static -o testarmstatic

I copy testarm and testarmstatic to the device.
I copy libc.so.0 and ld-uClibc.so.0 from uClibc-0.9.30.2/lib to the device (making sure the symbolic links are dereferenced, so that the actual libraries, not the links, are copied).

Then I boot the device (which already has its own Linux kernel, a BusyBox shell accessible over the serial port, and glibc installed as libc.so.6).

From the device's shell:
# ./testarmstatic
Whither Canada?
# ./testarm
Segmentation fault

The crash happens even before main() is executed. The device has gdb running on it. Here is a gdb session.
# ./gdb testarm
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/sdcard/testarm...(no debugging symbols found)...done.
(gdb) brea __uClibc_main
Breakpoint 1 at 0x8338
(gdb) brea __uClibc_init
Function "__uClibc_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 5 (__uClibc_init) pending.
(gdb) brea _dl_get_ready_to_run
Function "_dl_get_ready_to_run" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 6 (_dl_get_ready_to_run) pending.
(gdb) brea _dl_app_init_array
Function "_dl_app_init_array" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 7 (_dl_app_init_array) pending.
(gdb) brea _dl_run_init_array
Function "_dl_run_init_array" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y

Breakpoint 8 (_dl_run_init_array) pending.
(gdb) r
Starting program: /mnt/sdcard/testarm

Breakpoint 5, 0x4004c708 in __uClibc_init () from /lib/libc.so.0
(gdb) c
Continuing.

Breakpoint 1, 0x4004c77c in __uClibc_main () from /lib/libc.so.0
(gdb)
Continuing.

Breakpoint 5, 0x4004c708 in __uClibc_init () from /lib/libc.so.0
(gdb)
Continuing.

Breakpoint 7, 0x4005ea44 in _dl_app_init_array () from /lib/ld-uClibc.so.0
(gdb)
Continuing.

Breakpoint 8, 0x4005ea24 in _dl_run_init_array () from /lib/ld-uClibc.so.0
(gdb) disass
Dump of assembler code for function _dl_run_init_array:
0x4005ea24 <_dl_run_init_array+0>:      push    {r11, lr}
0x4005ea28 <_dl_run_init_array+4>:      mov     r3, r0
0x4005ea2c <_dl_run_init_array+8>:      ldr     r2, [r0]
0x4005ea30 <_dl_run_init_array+12>:     ldr     r1, [r3, #172]  ; 0xac
0x4005ea34 <_dl_run_init_array+16>:     ldr     r0, [r0, #164]  ; 0xa4
0x4005ea38 <_dl_run_init_array+20>:     add     r11, sp, #4
0x4005ea3c <_dl_run_init_array+24>:     pop     {r11, lr}
0x4005ea40 <_dl_run_init_array+28>:     b       0x4005e9ec
End of assembler dump.
(gdb) print $r0
$1 = 0
(gdb) print $r3
$3 = 1074180008
(gdb) nexti
0x4005ea28 in _dl_run_init_array () from /lib/ld-uClibc.so.0
(gdb) 

0x4005ea2c in _dl_run_init_array () from /lib/ld-uClibc.so.0
(gdb) print $r3
$4 = 0
(gdb) 

(gdb) nexti

Program received signal SIGSEGV, Segmentation fault.
0x4005ea2c in _dl_run_init_array () from /lib/ld-uClibc.so.0

(gdb) inf br
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0x4004c77c <__uClibc_main>
        breakpoint already hit 1 time
5       breakpoint     keep y   0x4004c708 <__uClibc_init>
        breakpoint already hit 2 times
6       breakpoint     keep y   0x40060fb0 <_dl_get_ready_to_run>
7       breakpoint     keep y   0x4005ea44 <_dl_app_init_array>
        breakpoint already hit 1 time
8       breakpoint     keep y   0x4005ea24 <_dl_run_init_array>
        breakpoint already hit 1 time

This looks like a null pointer dereference in _dl_run_init_array: r0 is 0 on entry, and the load from [r0] at 0x4005ea2c faults. _dl_get_ready_to_run is apparently never called (breakpoint 6 is never hit).
Comment 1 Khem Raj 2010-02-11 21:03:24 UTC
To clarify: when the device boots, it uses the glibc dynamic linker and boots into BusyBox with it. You copy the uClibc binary and the required shared libraries to the device, then you execute the binary and it segfaults.

If that's the case, is it possible for you to boot into a uClibc-based root file system and see if this happens there too?
Comment 2 Bernhard Reutner-Fischer 2010-03-08 18:36:23 UTC
Sounds much like a duplicate of bug #1033. Do you have DOPIC set in your .config? Please attach it here.

thanks,
Comment 3 Fabrizio Gennari 2010-03-13 07:50:14 UTC
Created attachment 1219
.config file used

Bernhard: the .config file contains DOPIC=y. Please find it attached.

Raj: I created a new ramdisk with no glibc at all, only uClibc. The dynamically linked programs fail to start. If init is dynamically linked, the kernel panics. If init is statically linked and calls execve on a dynamically linked program, the call fails with "No such file or directory".
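
A "No such file or directory" from execve on a file that does exist usually points at the ELF interpreter: the kernel reports ENOENT when the path stored in the program's PT_INTERP segment cannot be opened. Below is a minimal sketch for checking which path a binary such as testarm asks for. It assumes the file has already been read into memory and has the same byte order as the machine running the check (true for a little-endian ARM binary inspected on the i386 build host). elf32_interp is a hypothetical helper name, not part of uClibc.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the PT_INTERP path of a 32-bit ELF image held in memory into out.
 * Returns 0 on success, or -1 if the image is not ELF32, is truncated,
 * or has no usable PT_INTERP segment (e.g. it is statically linked).
 * Assumption: the image byte order matches the host byte order.
 */
int elf32_interp(const unsigned char *img, size_t len, char *out, size_t outlen)
{
    Elf32_Ehdr eh;
    Elf32_Phdr ph;
    size_t i;

    assert(img != NULL && out != NULL);
    if (len < sizeof eh || outlen == 0)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32 ||
        eh.e_phentsize < sizeof ph)
        return -1;

    /* Walk the program header table looking for PT_INTERP. */
    for (i = 0; i < eh.e_phnum; i++) {
        size_t off = (size_t)eh.e_phoff + i * eh.e_phentsize;

        if (off < eh.e_phoff || off > len || len - off < sizeof ph)
            return -1;              /* table runs past the buffer */
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_filesz > outlen ||
            ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
            return -1;              /* path does not fit or is truncated */
        memcpy(out, img + ph.p_offset, ph.p_filesz);
        out[ph.p_filesz - 1] = '\0'; /* force termination */
        return 0;
    }
    return -1;
}
```

If the reported path is, say, /lib/ld-uClibc.so.0, then that exact path has to exist on the ramdisk for execve to succeed; having the loader in another directory is not enough.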
Comment 4 Khem Raj 2010-03-16 20:24:28 UTC
You are compiling uClibc with OABI. Do you have OABI support in the kernel?
Comment 5 Fabrizio Gennari 2010-04-12 19:45:59 UTC
I compiled uClibc with EABI and OABI, and the program linked to it crashes at the same point in both cases
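
One way to confirm which ABI a given build really produced is to read the EABI version from the e_flags field of the ELF header: the top byte is 0 for OABI objects and nonzero for EABI ones. The following is only a sketch under the same assumptions as any host-side check (header already in memory, byte order matching the host). arm_eabi_version is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the ARM EABI version stored in the top byte of e_flags
 * (0 means old ABI / OABI, nonzero means an EABI version), or -1 if
 * the buffer does not start with a 32-bit ARM ELF header.
 * Assumption: the header byte order matches the host byte order.
 */
int arm_eabi_version(const unsigned char *img, size_t len)
{
    Elf32_Ehdr eh;

    assert(img != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32 ||
        eh.e_machine != EM_ARM)
        return -1;
    /* 0xFF000000 is the EABI version mask within e_flags. */
    return (int)((eh.e_flags & 0xFF000000u) >> 24);
}
```

Running this over testarm, libc.so.0 and ld-uClibc.so.0 should give the same value for all three; a mix of 0 and nonzero would mean OABI and EABI objects ended up together.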
Comment 6 Bernhard Reutner-Fischer 2010-06-22 20:25:46 UTC
(In reply to comment #5)
> I compiled uClibc with EABI and OABI, and the program linked to it crashes at
> the same point in both cases

Please try current master; we can try to debug it there.
Comment 7 Fabrizio Gennari 2010-12-09 09:48:12 UTC
Pulled from git master today, same problem
Comment 8 Fabrizio Gennari 2011-02-03 09:55:49 UTC
Apparently not many people are experiencing this, so, in perfect open-source fashion, I'm on my own.

Yet, my knowledge of uClibc is not deep enough to thoroughly analyse the problem. I did some investigation, though.

The crash occurs when _dl_app_init_array() calls _dl_run_init_array(), passing _dl_loaded_modules as the argument. _dl_run_init_array() is in ldso/ldso/dl-array.c; it just dereferences the tpnt pointer passed as its argument and calls _dl_run_array_forward(). Most probably the compiler inlines _dl_run_array_forward().

The crash seems to be caused by _dl_loaded_modules being NULL when _dl_run_init_array() tries to dereference it.

From what I could see, _dl_get_ready_to_run must be called in order for _dl_loaded_modules to be initialised. And _dl_get_ready_to_run is never called.
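
The call chain described above can be modelled with a few lines of C. This is a deliberately simplified sketch, not the code from dl-array.c: struct module, loaded_modules, run_init_array, app_init_array and demo_ctor are hypothetical stand-ins, and the NULL check is added here only so that the model returns -1 at the point where the real loader would fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the loader's per-module record. */
struct module {
    void (**init_array)(void);  /* constructors to run, may be NULL */
    size_t init_count;          /* number of entries in init_array */
};

/* Stand-in for _dl_loaded_modules: NULL until the loader has set it up. */
struct module *loaded_modules = NULL;

/* Example constructor used to observe that the array was run. */
int demo_ctor_calls = 0;
void demo_ctor(void) { demo_ctor_calls++; }

/*
 * Run every constructor of m in order and return how many were run.
 * Returns -1 for a NULL module; without this guard the first field
 * access would be a load from address 0, as in the gdb session.
 */
int run_init_array(const struct module *m)
{
    size_t i;

    if (m == NULL)
        return -1;
    if (m->init_array == NULL)
        return 0;
    for (i = 0; i < m->init_count; i++)
        m->init_array[i]();
    return (int)m->init_count;
}

/* Mirrors the described caller: pass the global module list unchanged. */
int app_init_array(void)
{
    return run_init_array(loaded_modules);
}
```

In this model app_init_array() returns -1 until something assigns loaded_modules, which is the role I believe _dl_get_ready_to_run plays in the real loader.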

However, my knowledge of uClibc is too limited to go further, so I'd like help from experts with these two questions:
1. is the above correct?
2. when is _dl_get_ready_to_run supposed to be called?

Thank you in advance
Comment 9 Fabrizio Gennari 2011-03-07 17:07:07 UTC
A little more investigation.

fge:~/crosscompiling/uClibc/lib$ arm-linux-objdump -f libuClibc-0.9.32-rc2-git.so 

libuClibc-0.9.32-rc2-git.so:     file format elf32-littlearm
architecture: arm, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x000078e0

fge@WX800170:~/crosscompiling/uClibc/lib$ arm-linux-objdump -h libuClibc-0.9.32-rc2-git.so 

libuClibc-0.9.32-rc2-git.so:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .hash         000016ec  00000114  00000114  00000114  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .dynsym       00003b00  00001800  00001800  00001800  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynstr       00002096  00005300  00005300  00005300  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .rel.dyn      00000300  00007398  00007398  00007398  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rel.plt      000000e0  00007698  00007698  00007698  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .plt          00000164  00007778  00007778  00007778  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .text         000256a0  000078e0  000078e0  000078e0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  7 .rodata       000029fd  0002cf80  0002cf80  0002cf80  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .interp       00000030  0002f980  0002f980  0002f980  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .data.rel.ro  000000ec  00037e6c  00037e6c  0002fe6c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 10 .dynamic      000000a8  00037f58  00037f58  0002ff58  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .got          000000f4  00038000  00038000  00030000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 12 .data         00000198  000380f4  000380f4  000300f4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00003d90  0003828c  0003828c  0003028c  2**2
                  ALLOC
 14 .ARM.attributes 00000010  00000000  00000000  0003028c  2**0
                  CONTENTS, READONLY
 15 .gnu.warning.gets 0000003c  00000000  00000000  0003029c  2**2
                  CONTENTS, READONLY
 16 .gnu.warning.tmpnam 00000038  00000000  00000000  000302d8  2**2
                  CONTENTS, READONLY
 17 .gnu.warning.gethostbyaddr_r 0000003c  00000000  00000000  00030310  2**2
                  CONTENTS, READONLY
 18 .gnu.warning.gethostbyname_r 0000003c  00000000  00000000  0003034c  2**2
                  CONTENTS, READONLY
 19 .gnu.warning.gethostbyaddr 0000003c  00000000  00000000  00030388  2**2
                  CONTENTS, READONLY
 20 .gnu.warning.gethostbyname 0000003c  00000000  00000000  000303c4  2**2
                  CONTENTS, READONLY
 21 .gnu.warning.siggetmask 0000003c  00000000  00000000  00030400  2**2
                  CONTENTS, READONLY
 22 .comment      00000011  00000000  00000000  0003043c  2**0
                  CONTENTS, READONLY

Summary: the start address is just the start of the .text section. This may or may not be right, but it feels wrong...

fge:~/crosscompiling/uClibc/lib$ arm-linux-objdump -T libuClibc-0.9.32-rc2-git.so |grep 78e0
000078e0 l    d  .text	00000000 .text
000078e0 g    DF .text	00000048 brk

It is as if the startup function is brk(). Maybe DL_START() should be called instead?
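
For reference, the "start address" printed by objdump -f is the e_entry field of the ELF header. As far as I know, GNU ld falls back to the first byte of .text when no entry symbol is defined, which would explain why it coincides with brk here. For a dynamically linked program the kernel hands control to the interpreter named in PT_INTERP, not to the entry address of libc. So the value may be harmless, but treat that as something to verify rather than as a diagnosis. A small sketch for reading the field directly, assuming the header is already in memory with host byte order (elf32_entry is a hypothetical helper name):

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Read e_type and e_entry from a 32-bit ELF header held in memory.
 * Returns 0 on success, -1 if the buffer is too short or not ELF32.
 * Assumption: the header byte order matches the host byte order.
 */
int elf32_entry(const unsigned char *img, size_t len,
                unsigned *type, unsigned long *entry)
{
    Elf32_Ehdr eh;

    assert(img != NULL && type != NULL && entry != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    *type = eh.e_type;    /* ET_DYN for shared objects, ET_EXEC for testarm */
    *entry = eh.e_entry;  /* the "start address" shown by objdump -f */
    return 0;
}
```

Comparing the result for libuClibc-0.9.32-rc2-git.so with the one for ld-uClibc.so.0 would show whether the loader itself has a sensible entry point, which is the one that matters at program start.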