Created attachment 313 [details] stress.c Asterisk segfaults fairly often, and the crashes very often seem to be related to pthread_mutex_lock(). I found a stress test in the http://posixtest.sourceforge.net/ test suite that seems able to trigger some bugs in linuxthreads.old. It returns: "Test stress.c FAILED: Got a non-zero value - two threads owns the mutex"
Created attachment 315 [details] testfrmw.c
Created attachment 317 [details] testfrmw.h
Created attachment 319 [details] posixtest.h
Compile with 'gcc -o stress stress.c -lpthread'. Run the test in the background with './stress &', then send SIGUSR1 to the processes with 'killall -USR1 stress'.
One problem (for the PTHREAD_MUTEX_ERRORCHECK_NP and PTHREAD_MUTEX_TIMED_NP types) is that __pthread_alt_lock() puts its wait_node on the stack but does not protect against signals. The wait_node can be left on the wake-up queue after __pthread_alt_lock() has been unwound from the stack, so the queue ends up pointing into a dead stack frame. I suspect the same can happen with __pthread_lock(). However, __pthread_lock() does not link stack-allocated items onto the wait list, so it is harder to debug, but I suspect a similar race condition exists there. The man page says pthread_mutex_lock() is not a cancellation point. Shouldn't we just block signals there?
With NPTL this stress test passes. I think we can close this bug. Thanks!
Works with NPTL