[Bug nptl/2644] New: Race condition during unwind code after thread cancellation
I think there is a race condition in the code in
nptl/sysdeps/pthread/unwind-forcedunwind.c which can lead to a segfault. I
found this in a Red Hat build, but it exists in CVS glibc too (as of May 6 2006).
Consider a call to _Unwind_ForcedUnwind when libgcc_s_forcedunwind has not yet
been loaded. _Unwind_ForcedUnwind checks the pointer against null and jumps to
pthread_cancel_init. Meanwhile another thread comes in and initialises all
these pointers, so the first check in pthread_cancel_init shows that
libgcc_s_getcfa is non-null, and we return to _Unwind_ForcedUnwind and execute
libgcc_s_forcedunwind. As the function pointer libgcc_s_forcedunwind has not
been marked volatile, the compiler does not need to reload its value, and the
generated code attempts to call the address it previously loaded, i.e. 0.
I have a test case which shows the problem and a patch which I believe fixes it.
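The sequence described above can be modelled without threads. The following is only a hypothetical sketch, not the attached test case and not the glibc source: fn_getcfa and fn_forcedunwind stand in for libgcc_s_getcfa and libgcc_s_forcedunwind, the local variable f plays the register the compiler keeps the pointer in, and other_thread_init is an assumed helper that plays the second thread.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unwind_fn)(int);

/* Hypothetical stand-ins for libgcc_s_getcfa and libgcc_s_forcedunwind. */
static unwind_fn fn_getcfa;
static unwind_fn fn_forcedunwind;

static int real_getcfa(int x) { return x + 1; }
static int real_forcedunwind(int x) { return x * 2; }

/* Return both pointers to the "not yet loaded" state. */
static void reset_pointers(void)
{
    fn_getcfa = NULL;
    fn_forcedunwind = NULL;
}

/* Plays the second thread: completes initialisation inside the race window. */
static void other_thread_init(void)
{
    fn_forcedunwind = real_forcedunwind;
    fn_getcfa = real_getcfa;
}

/* Models the problematic code: init inlined, pointer held in a "register". */
static int forced_unwind_stale(int arg, int racing)
{
    unwind_fn f = fn_forcedunwind;          /* loaded once */
    if (f == NULL) {
        if (racing)
            other_thread_init();            /* race window */
        if (fn_getcfa == NULL) {            /* inlined early-return check */
            fn_forcedunwind = real_forcedunwind;
            fn_getcfa = real_getcfa;
            f = real_forcedunwind;          /* compiler knows what it stored */
        }
        /* else: early return taken, f still holds the stale NULL */
    }
    if (f == NULL)
        return -1;                          /* stands in for the call through 0 */
    return f(arg);
}

/* Same flow, but the pointer is re-read from memory before the call. */
static int forced_unwind_reloaded(int arg, int racing)
{
    if (fn_forcedunwind == NULL) {
        if (racing)
            other_thread_init();
        if (fn_getcfa == NULL) {
            fn_forcedunwind = real_forcedunwind;
            fn_getcfa = real_getcfa;
        }
    }
    unwind_fn f = *(unwind_fn volatile *)&fn_forcedunwind;  /* forced reload */
    assert(f != NULL);
    return f(arg);
}
```

In this model the stale variant only fails when the second thread wins the race; without the race it behaves correctly, which is consistent with the problem being hard to hit.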
Summary: Race condition during unwind code after thread
AssignedTo: drepper at redhat dot com
ReportedBy: batneil at thebatcave dot org dot uk
CC: glibc-bugs at sources dot redhat dot com
GCC build triplet: i686-redhat-linux
GCC host triplet: i686-redhat-linux
GCC target triplet: i686-redhat-linux
------- Additional Comments From batneil at thebatcave dot org dot uk 2006-05-07 14:14 -------
To clarify how the problem manifests, here's a snippet of the compiled
_Unwind_ForcedUnwind code from the unpatched libc:
If the test at 941380 shows that edx (which is libgcc_s_forcedunwind) contains
0, and the test at 9413ad shows that eax (libgcc_s_getcfa) is non-zero, we jump
back to 941384 and try to call edx without having changed it from 0.
After patching and rebuilding on my system the following code results:
------- Additional Comments From batneil at thebatcave dot org dot uk 2006-05-07 18:37 -------
Thanks for the quick response. Your solution does sound better; I'm just
rebuilding to check that it works as expected.
The compiler I'm using is GCC 3.4.3; to be precise, it's version 3.4.3 20041212
(Red Hat 3.4.3-9.EL4). If it's of any use, the first piece of code I quoted
matches that from the libpthread.so.0 that comes from the glibc-2.3.4-2 package
on Red Hat EL 4.
I'm building with rpmbuild at the moment, but from what I can tell the CFLAGS
are set to '-march=i686 -DNDEBUG=1 -g -O3'. Please let me know if you need more
details on the configuration.
It seems to me that although not all compilers will necessarily generate unsafe
code, the current source at least permits them to do so.
As discussed, this patch forces the function pointers to be reloaded when
required, without needing them all to be marked as volatile. I've used the '+'
modifier in the asm; when I used '=', gcc decided to dead-code one of the stores
and everything broke.
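For readers unfamiliar with the idiom, here is a minimal sketch of an empty asm statement with a '+' operand, assuming GCC-style extended asm. It is not the attached patch: the "m" constraint, the FORCE_RELOAD macro and the names fn_forcedunwind and init_pointers are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unwind_fn)(int);

/* Hypothetical stand-in for libgcc_s_forcedunwind. */
static unwind_fn fn_forcedunwind;

static int real_forcedunwind(int x) { return x * 2; }

/* Empty asm with a "+m" operand: the compiler must assume the statement
   both reads and rewrites obj, so any pending store is flushed before it
   and the value is reloaded from memory after it. */
#define FORCE_RELOAD(obj) __asm__ __volatile__("" : "+m"(obj))

/* Simplified initialiser with the same early-return shape as in the report. */
static void init_pointers(void)
{
    if (fn_forcedunwind != NULL)
        return;
    fn_forcedunwind = real_forcedunwind;
}

static int call_with_reload(int arg)
{
    if (fn_forcedunwind == NULL)
        init_pointers();
    FORCE_RELOAD(fn_forcedunwind);      /* discard any cached copy */
    assert(fn_forcedunwind != NULL);
    return fn_forcedunwind(arg);
}
```

A plausible reading of the '=' failure: an output-only operand tells the compiler that the asm overwrites the object without reading it, so a store just before the asm looks dead and may be removed, while the empty asm itself writes nothing.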
------- Additional Comments From jakub at redhat dot com 2006-05-08 11:27 -------
I think instead of the patch you checked in we should just mark
pthread_cancel_init with __attribute__((noinline)). That will do everything
that's needed and is desirable anyway.
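A minimal sketch of what the noinline approach might look like, assuming GCC's attribute syntax. The names are hypothetical stand-ins rather than the glibc code, and whether this alone is sufficient depends on the compiler not analysing the callee across the call.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unwind_fn)(int);

/* Hypothetical stand-ins for libgcc_s_getcfa and libgcc_s_forcedunwind. */
static unwind_fn fn_getcfa;
static unwind_fn fn_forcedunwind;

static int real_getcfa(int x) { return x + 1; }
static int real_forcedunwind(int x) { return x * 2; }

/* Kept out of line: the caller sees a call that may store to the pointers
   and cannot specialise on the early-return path. */
__attribute__((noinline))
static void cancel_init_noinline(void)
{
    if (fn_getcfa != NULL)
        return;                         /* already initialised elsewhere */
    fn_forcedunwind = real_forcedunwind;
    fn_getcfa = real_getcfa;
}

static int forced_unwind(int arg)
{
    if (fn_forcedunwind == NULL)
        cancel_init_noinline();
    /* The pointer is read again here because the call above may have
       changed it. */
    assert(fn_forcedunwind != NULL);
    return fn_forcedunwind(arg);
}
```

Keeping the initialiser out of line also keeps the rarely taken slow path out of the caller's body.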