Bug 2089 - errno not threadsafe
Summary: errno not threadsafe
Status: RESOLVED WONTFIX
Alias: None
Product: uClibc
Classification: Unclassified
Component: Threads (show other bugs)
Version: 0.9.31
Hardware: PC Linux
: P2 critical
Target Milestone: 0.9.32
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-21 08:46 UTC by Peter Korsgaard
Modified: 2011-05-10 19:15 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:


Attachments
test program triggering the bug (1.10 KB, text/plain)
2010-06-21 08:46 UTC, Peter Korsgaard
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Korsgaard 2010-06-21 08:46:55 UTC
Created attachment 2071 [details]
test program triggering the bug

errno access doesn't seem to be threadsafe using linuxthreads / linuxthreads.old. There has been several reports of it on the list, E.G.:

http://lists.uclibc.org/pipermail/uclibc/2010-May/044049.html

It's easy to reproduce, simply compile and run the attached test program. I'm expecting to see no output (which is what I get if I run it under glibc), but get lots of errno=0 lines instead of the expected EBADF / ENOSPC. If I apply the patch in the mail above, I no longer see any errno=0 lines, but the threads sometimes gets mixed up (E.G. errno=EBADF instead of ENOSPC and the reverse).

Tested on ARM9 (EABI) and x86-64 using linuxthreads and linuxthreads.old with the default buildroot config (http://git.buildroot.net/buildroot/tree/toolchain/uClibc/uClibc-0.9.31.config).

I also see this on 0.9.30.3, so it isn't a new issue.
Comment 1 Khem Raj 2010-06-21 17:48:47 UTC
yes there seems to be generic problem with errno handing. Lets see what we can do.
Comment 2 Khem Raj 2010-06-24 03:51:58 UTC
I testing it on nptl. So far works fine on arm. 
However on mips it gives similar errors.

1035654432: (25/25) got 1035654564 instead of 9
1572525344: (20/20) got 1572525476 instead of 28

I will see whats wrong with mips/nptl
Comment 3 Carmelo Amoroso 2010-06-27 21:20:57 UTC
(In reply to comment #2)
> 
> I will see whats wrong with mips/nptl

and I will see how it goes on sh/nptl
Comment 4 Khem Raj 2010-06-30 11:16:58 UTC
I have fixed it for mips/nptl with commit 8b48f745be1e086d6e486bbb8167e770f3d1fbc5
Comment 5 Ronald Wahl 2010-07-02 09:52:13 UTC
For this problem I found a fix here:

http://github.com/mat-c/uClibc/commits/master_fixes

Look for "don't hide __errno_location and __h_errno_location".

I'm not sure if the fix is correct but actually it works for 0.9.31 on ARM.
Comment 6 Peter Korsgaard 2010-07-05 12:01:21 UTC
(In reply to comment #5)
> For this problem I found a fix here:
> 
> http://github.com/mat-c/uClibc/commits/master_fixes
> 
> Look for "don't hide __errno_location and __h_errno_location".
> 
> I'm not sure if the fix is correct but actually it works for 0.9.31 on ARM.

It works as well here, thanks.

It does seem correct to me, as we WANT __errno_location / __h_errno_location to be weak and overridable, so the linuxthread version is used instead when using -pthread.
Comment 7 Khem Raj 2010-07-05 21:02:36 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > For this problem I found a fix here:
> > 
> > http://github.com/mat-c/uClibc/commits/master_fixes
> > 
> > Look for "don't hide __errno_location and __h_errno_location".
> > 
> > I'm not sure if the fix is correct but actually it works for 0.9.31 on ARM.
> 
> It works as well here, thanks.
> 
> It does seem correct to me, as we WANT __errno_location / __h_errno_location to
> be weak and overridable, so the linuxthread version is used instead when using
> -pthread.

the problem is that we should not genetate __GI__* symbols when generating non PIC
.a file currently there are refrences to these __GI symbols in libc so when you link
statically then the references in libc get resolved to libc's __GI_ errno symbols

unhiding these symbols as patch does may not be the right solution. think of shared
objects it will have undefined behavior when both libc and libpthread are loaded.
it will depend on load order which one is chosen

I think there is need to clean up the build for pic and non pic build of libc.a
look into this area
Comment 8 Peter Korsgaard 2010-07-05 21:57:44 UTC
(In reply to comment #7)

> the problem is that we should not genetate __GI__* symbols when generating non
> PIC.a file currently there are refrences to these __GI symbols in libc so when
> you link statically then the references in libc get resolved to libc's __GI_ 
> errno symbols

Hmm, isn't this exactly like how we do it for malloc/free? I don't see any references to __GI_malloc or __GI__errno_location when I like statically?

> unhiding these symbols as patch does may not be the right solution. think of
> shared objects it will have undefined behavior when both libc and libpthread
> are loaded.
> it will depend on load order which one is chosen

Also when the libc version is marked as weak? I haven't seen that issue. Libc gets added very early to the linker, so the libpthread version seems to always override it.

> I think there is need to clean up the build for pic and non pic build of libc.a
> look into this area

So what exactly do you suggest?
Comment 9 Khem Raj 2010-07-05 23:38:44 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> > the problem is that we should not genetate __GI__* symbols when generating non
> > PIC.a file currently there are refrences to these __GI symbols in libc so when
> > you link statically then the references in libc get resolved to libc's __GI_ 
> > errno symbols
> 
> Hmm, isn't this exactly like how we do it for malloc/free? 

yes but lpthread and lc are also controlled by gcc compiler driver and
gcc -pthread will break. Essentially it should not depend on the link order
when doing shared linking thats why weak symbols.

I don't see any
> references to __GI_malloc or __GI__errno_location when I like statically?
> 

yeah because they are not declared with hidden_proto and hidden_def

> > unhiding these symbols as patch does may not be the right solution. think of
> > shared objects it will have undefined behavior when both libc and libpthread
> > are loaded.
> > it will depend on load order which one is chosen
> 
> Also when the libc version is marked as weak? I haven't seen that issue. Libc
> gets added very early to the linker, so the libpthread version seems to always
> override it.
> 
> > I think there is need to clean up the build for pic and non pic build of libc.a
> > look into this area
> 
> So what exactly do you suggest?

We need to say that when compiling static libc.a then dont use this hidden_proto and hidden_def because it does not matter in that case.

but when compiling PIC version then we use it.
Comment 10 Peter Korsgaard 2010-07-06 05:37:35 UTC
(In reply to comment #9)
> > So what exactly do you suggest?
> 
> We need to say that when compiling static libc.a then dont use this
> hidden_proto and hidden_def because it does not matter in that case.
> 
> but when compiling PIC version then we use it.

But that means we're back to square one, right? Today, with a dynamic libc and -pthread we end up using the __errno_location in libc, not in libpthread.
Comment 11 Khem Raj 2010-07-06 17:21:45 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > > So what exactly do you suggest?
> > 
> > We need to say that when compiling static libc.a then dont use this
> > hidden_proto and hidden_def because it does not matter in that case.
> > 
> > but when compiling PIC version then we use it.
> 
> But that means we're back to square one, right? Today, with a dynamic libc and
> -pthread we end up using the __errno_location in libc, not in libpthread.

the libc one's will be weak symbols and libpthread ones will be strong so when
linking libpthread.so will be preferred.
Comment 12 Peter Korsgaard 2010-07-06 17:27:52 UTC
(In reply to comment #11)

> > But that means we're back to square one, right? Today, with a dynamic libc and
> > -pthread we end up using the __errno_location in libc, not in libpthread.
> 
> the libc one's will be weak symbols and libpthread ones will be strong so when
> linking libpthread.so will be preferred.

Yes, but that's not what happens as it is (libc version weak + hidden) - The libc version is used within libc because of the hidden.

With my patch to remove the hidden it works.
Comment 13 Daniel Ng 2011-04-12 06:02:36 UTC
Without this patch, my system would be broken. I am using uClibc-0.9.31 with a PowerPC-based arch (Linux 2.6.30).

Without this patch, I would get errno==0 when read() returned -1. The code doesn't handle such a case (and it shouldn't need to!) and so results in the system locking up.

I've upped the Priority to P2 Critical.

The patch makes the system behave correctly.
Comment 14 Vasily Khoruzhick 2011-04-16 21:19:37 UTC
This bug affects mpd (it can't populate database), and I suspect that it affects libao+alsalib (it just segfaults when using libao with alsa as output). Tested with uclibc-0.9.31 and uclibc-0.9.32-rc2 via buildroot-2011.02.

So, mpd and libao are unusable in mentioned buildroot version.
Comment 15 Vasily Khoruzhick 2011-04-17 06:34:29 UTC
(In reply to comment #14)
> This bug affects mpd (it can't populate database), and I suspect that it
> affects libao+alsalib (it just segfaults when using libao with alsa as output).
> Tested with uclibc-0.9.31 and uclibc-0.9.32-rc2 via buildroot-2011.02.
> 
> So, mpd and libao are unusable in mentioned buildroot version.

Just tested, with mentioned patch, mpd and libao+alsa problems just disappear.
Comment 16 Steve Stetak 2011-04-25 13:15:10 UTC
This bug was also causing stunnel to randomly drop SSL connections in the middle of the SSL handshake with the message "SSL_accept: Peer suddenly disconnected"

The patch above seems to have fixed the issue.
Comment 17 Bernhard Reutner-Fischer 2011-05-10 19:15:45 UTC
This works with NPTL.