[Bug dynamic-link/19329] New: dl-tls.c assert failure at concurrent pthread_create and dlopen

[Bug dynamic-link/19329] New: dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

            Bug ID: 19329
           Summary: dl-tls.c assert failure at concurrent pthread_create
                    and dlopen
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: nszabolcs at gmail dot com
  Target Milestone: ---

(this is a continuation of bug 17918, but it turns out to be a different
issue from the one originally reported there.)

failure:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing

  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
  //...
  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))

while pthread_create (in _dl_allocate_tls_init) is concurrently doing

  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));

so

T1:
  y = x + 1;
  ++x;

T2:
  assert(y <= x);

this is hard to trigger because the race window is short compared to the
time dlopen and pthread_create take. however, if i add a usleep(1000)
between the two operations in T1, it is triggered all the time.
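
The T1/T2 interleaving above can be sketched as a small standalone C program. This is only a model, not glibc code: `x` and `y` are hypothetical stand-ins for GL(dl_tls_generation) and listp->slotinfo[idx].gen, C11 atomics are used so the sketch itself has no undefined behaviour, and a handshake flag takes the place of the added usleep(1000) so that the bad interleaving happens on every run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: x models GL(dl_tls_generation),
   y models listp->slotinfo[idx].gen.  */
static atomic_size_t x, y;

/* Handshake flags that force the bad interleaving every time;
   the wait on `checked` plays the role of the added usleep(1000).  */
static atomic_int y_written, checked;

/* T1: the dlopen-like side.  */
static void *t1_writer(void *arg)
{
  (void) arg;
  atomic_store(&y, atomic_load(&x) + 1);   /* y = x + 1; */
  atomic_store(&y_written, 1);
  while (!atomic_load(&checked))
    ;                                       /* hold the window open */
  atomic_fetch_add(&x, 1);                  /* ++x; */
  return NULL;
}

/* T2: the pthread_create-like side.  Returns 1 if it saw y > x
   (the point where assert (y <= x) would fire), 0 if it did not,
   and -1 if the helper thread could not be started.  */
static int t2_sees_violation(void)
{
  pthread_t t1;
  int violated;

  atomic_store(&x, 0);
  atomic_store(&y, 0);
  atomic_store(&y_written, 0);
  atomic_store(&checked, 0);

  if (pthread_create(&t1, NULL, t1_writer, NULL) != 0)
    return -1;

  while (!atomic_load(&y_written))
    ;                                       /* wait until y = x + 1 is done */
  violated = atomic_load(&y) > atomic_load(&x);
  atomic_store(&checked, 1);                /* let T1 finish with ++x */
  pthread_join(t1, NULL);
  return violated;
}
```

Calling t2_sees_violation() from main reports the violation inside the window. Once T1 has finished, y <= x holds again, which is why the failure is so rare without the artificial delay.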

the reads of slotinfo and the tls generation in _dl_allocate_tls_init lack
any sort of synchronization or atomics (dlopen holds GL(dl_load_lock) while
it updates them).

on x86_64 with added usleep:

(gdb) p _rtld_local._dl_tls_dtv_slotinfo_list->slotinfo[0]@64
$11 = {{gen = 0, map = 0x7ffff7ff94e8}, {gen = 1, map = 0x7ffff7ff94e8},
  {gen = 2, map = 0x7ffff0000910}, {gen = 0, map = 0x0} <repeats 61 times>}
(gdb) p _rtld_local._dl_tls_generation
$12 = 1
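
To make the gdb state above concrete, here is a minimal sketch that replays the check on the printed values. The `struct slot` type and the function name are hypothetical reductions, not glibc definitions, and skipping entries with a zero map is an assumption about how unused slots are treated.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical version of a slotinfo entry: only the two
   fields printed by gdb above.  */
struct slot
{
  size_t gen;
  unsigned long long map;   /* address of the link map, 0 if unused */
};

/* Values copied from the gdb session in the report.  */
static const struct slot reported_slots[] = {
  { 0, 0x7ffff7ff94e8ULL },
  { 1, 0x7ffff7ff94e8ULL },
  { 2, 0x7ffff0000910ULL },
};
static const size_t reported_generation = 1;

/* Returns the index of the first used slot whose gen is ahead of
   `generation`, or -1 if every used slot satisfies gen <= generation.  */
static long first_slot_ahead(const struct slot *slots, size_t n,
                             size_t generation)
{
  for (size_t i = 0; i < n; ++i)
    if (slots[i].map != 0 && slots[i].gen > generation)
      return (long) i;
  return -1;
}
```

With the reported values, first_slot_ahead(reported_slots, 3, reported_generation) returns 2: the slot with gen = 2 has already been written while the generation counter is still 1.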

T1:
#0  0x00007ffff7df2097 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff7df1f74 in usleep (useconds=<optimised out>) at ../sysdeps/posix/usleep.c:32
#2  0x00007ffff7decc6b in dl_open_worker (a=a@entry=0x7ffff7611c80) at dl-open.c:527
#3  0x00007ffff7de8314 in _dl_catch_error (objname=objname@entry=0x7ffff7611c70, errstring=errstring@entry=0x7ffff7611c78, mallocedp=mallocedp@entry=0x7ffff7611c6f, operate=operate@entry=0x7ffff7dec720 <dl_open_worker>, args=args@entry=0x7ffff7611c80) at dl-error.c:187
#4  0x00007ffff7dec2a9 in _dl_open (file=0x7ffff7611ee0 "mod-0.so", mode=-2147483646, caller_dlopen=0x4007e2 <start+34>, nsid=-2, argc=<optimised out>, argv=<optimised out>, env=0x7fffffffe378) at dl-open.c:652
#5  0x00007ffff7bd5ee9 in dlopen_doit (a=a@entry=0x7ffff7611eb0) at dlopen.c:66
#6  0x00007ffff7de8314 in _dl_catch_error (objname=0x7ffff00008d0, errstring=0x7ffff00008d8, mallocedp=0x7ffff00008c8, operate=0x7ffff7bd5e90 <dlopen_doit>, args=0x7ffff7611eb0) at dl-error.c:187
#7  0x00007ffff7bd6521 in _dlerror_run (operate=operate@entry=0x7ffff7bd5e90 <dlopen_doit>, args=args@entry=0x7ffff7611eb0) at dlerror.c:163
#8  0x00007ffff7bd5f82 in __dlopen (file=file@entry=0x7ffff7611ee0 "mod-0.so", mode=mode@entry=2) at dlopen.c:87
#9  0x00000000004007e2 in start (a=<optimised out>) at a.c:19
#10 0x00007ffff79bf3d4 in start_thread (arg=0x7ffff7612700) at pthread_create.c:333
#11 0x00007ffff76feedd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

T2:
#0  __GI___assert_fail (assertion=0x7ffff7df8840 "listp->slotinfo[cnt].gen <= GL(dl_tls_generation)", file=0x7ffff7df68e6 "dl-tls.c", line=493, function=0x7ffff7df9020 <__PRETTY_FUNCTION__.9528> "_dl_allocate_tls_init") at dl-minimal.c:220
#1  0x00007ffff7deb492 in __GI__dl_allocate_tls_init (result=0x7fffb7fff700) at dl-tls.c:493
#2  0x00007ffff79bff67 in allocate_stack (stack=<synthetic pointer>, pdp=<synthetic pointer>, attr=0x7fffffffdf90) at allocatestack.c:579
#3  __pthread_create_2_1 (newthread=newthread@entry=0x7fffffffe078, attr=attr@entry=0x0, start_routine=start_routine@entry=0x4007c0 <start>, arg=arg@entry=0xd) at pthread_create.c:526
#4  0x000000000040062a in main () at a.c:34


i think
  GL(dl_tls_generation)
  GL(dl_tls_dtv_slotinfo_list)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next

may all be accessed concurrently by pthread_create and dlopen without
any synchronization.

this can also cause a wrong maxgen to be computed and stored in dtv[0].counter.
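
For contrast with the unsynchronized accesses listed above, here is a sketch of the same two-variable model with both sides taking one lock. It is only an illustration of what "synchronized" would mean for the T1/T2 model, under the assumption that reader and writer share a single mutex. It is not a description of any patch posted for this bug, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One lock guards both variables in this model.  */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static size_t x;   /* models the generation counter */
static size_t y;   /* models a slot's gen field */

/* Writer: performs y = x + 1; ++x; as one critical section.  */
static void *locked_writer(void *arg)
{
  size_t rounds = *(size_t *) arg;
  for (size_t i = 0; i < rounds; ++i)
    {
      pthread_mutex_lock(&lock);
      y = x + 1;
      ++x;
      pthread_mutex_unlock(&lock);
    }
  return NULL;
}

/* Reader: checks y <= x under the same lock while the writer runs.
   Returns how many times it saw y > x, or (size_t) -1 if the writer
   thread could not be started.  */
static size_t count_violations_with_lock(size_t rounds)
{
  pthread_t writer;
  size_t violations = 0;

  if (pthread_create(&writer, NULL, locked_writer, &rounds) != 0)
    return (size_t) -1;

  for (size_t i = 0; i < rounds; ++i)
    {
      pthread_mutex_lock(&lock);
      if (y > x)
        ++violations;
      pthread_mutex_unlock(&lock);
    }

  pthread_join(writer, NULL);
  return violations;
}
```

Because the reader can never run between the two writes, count_violations_with_lock never observes y > x, whatever the scheduling. Whether taking a lock in pthread_create is acceptable in the real code is a separate question that this sketch does not address.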

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Chan Lee <chan45.lee at samsung dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chan45.lee at samsung dot com


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Ilya Palachev <i.palachev at samsung dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i.palachev at samsung dot com

--- Comment #1 from Ilya Palachev <i.palachev at samsung dot com> ---
Hi, I've suggested a patch for this bug:
https://sourceware.org/ml/libc-alpha/2015-12/msg00570.html


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

sh0924.hwang at samsung dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sh0924.hwang at samsung dot com


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

dongkyun.s at samsung dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dongkyun.s at samsung dot com


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

--- Comment #2 from Szabolcs Nagy <nszabolcs at gmail dot com> ---
Created attachment 8893
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8893&action=edit
test case (main module)


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

--- Comment #3 from Szabolcs Nagy <nszabolcs at gmail dot com> ---
Created attachment 8894
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8894&action=edit
test case (build script)


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

hk0110.choi at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hk0110.choi at gmail dot com


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Szabolcs Nagy <nszabolcs at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |nszabolcs at gmail dot com

--- Comment #4 from Szabolcs Nagy <nszabolcs at gmail dot com> ---
assigned this to myself, will work on it for 2.24, the current latest patch is
https://sourceware.org/ml/libc-alpha/2016-01/msg00480.html


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Matt Avant <mavant at palantir dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mavant at palantir dot com

--- Comment #5 from Matt Avant <mavant at palantir dot com> ---
Is this patch still being reviewed? The last update I see is
https://sourceware.org/ml/libc-alpha/2016-01/msg00620.html, but I'm not
familiar with how issue tracking works for this project so I could easily have
missed something...


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Markus Trippelsdorf <markus at trippelsdorf dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |markus at trippelsdorf dot de

--- Comment #6 from Markus Trippelsdorf <markus at trippelsdorf dot de> ---
I sometimes see the same failure during make check:

env GCONV_PATH=/var/tmp/glibc-build/iconvdata
LOCPATH=/var/tmp/glibc-build/localedata LC_ALL=C  
/var/tmp/glibc-build/elf/ld-linux-x86-64.so.2 --library-path
/var/tmp/glibc-build:/var/tmp/glibc-build/math:/var/tmp/glibc-build/elf:/var/tmp/glibc-build/dlfcn:/var/tmp/glibc-build/nss:/var/tmp/glibc-build/nis:/var/tmp/glibc-build/rt:/var/tmp/glibc-build/resolv:/var/tmp/glibc-build/crypt:/var/tmp/glibc-build/mathvec:/var/tmp/glibc-build/support:/var/tmp/glibc-build/nptl
/var/tmp/glibc-build/nptl/tst-stack4  >
/var/tmp/glibc-build/nptl/tst-stack4.out; \                                    
../scripts/evaluate-test.sh nptl/tst-stack4 $? false false >
/var/tmp/glibc-build/nptl/tst-stack4.test-result                                
Inconsistency detected by ld.so: dl-tls.c: 488: _dl_allocate_tls_init:
Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

This is unfortunately not reproducible.


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Pádraig Brady <P at draigBrady dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |P at draigBrady dot com

--- Comment #7 from Pádraig Brady <P at draigBrady dot com> ---
We were often hitting this issue with some multithreaded binaries with many
shared libs. The patches referenced here address the issue. Specifically:
  https://patches.linaro.org/patch/85007/
  https://patches.linaro.org/patch/85008/

These have been _extensively_ tested here with glibc-2.23 with many binaries.


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Pádraig Brady from comment #7)
> We were often hitting this issue with some multithreaded binaries with many
> shared libs. These patches referenced here, address the issue. Specifically:
>   https://patches.linaro.org/patch/85007/
>   https://patches.linaro.org/patch/85008/
>
> These have been _extensively_ tested here with glibc-2.23 with many binaries

Please repost those to libc-alpha so we can review.


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

--- Comment #9 from Pádraig Brady <P at draigBrady dot com> ---
We found an off-by-one issue with this (with ASAN + a certain number of shared
libs). When the last vector in the _dl_allocate_tls_init list of vectors was of
size one, it was skipped. The fix is:

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
        }

       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
        break;

       /* Synchronize with dl_add_to_slotinfo.  */
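
The effect of changing `>=` to `>` can be checked on a simplified model of the walk. This is a sketch under assumptions, not the glibc loop: slots are taken to be numbered 0..max_idx with slot 0 unused, every chunk is given the same length `chunk_len` (at least 1), and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of walking a chunked slot list.  Returns how many
   of the slots 1..max_idx the walk visits.  `strict` selects the
   termination test: 0 for `total >= max_idx`, 1 for `total > max_idx`.
   chunk_len must be at least 1.  */
static size_t visited_slots(size_t chunk_len, size_t max_idx, int strict)
{
  size_t total = 0;     /* slots covered by the chunks walked so far */
  size_t visited = 0;

  for (;;)
    {
      size_t cnt;

      /* Slot 0 of the first chunk is never used.  */
      for (cnt = total == 0 ? 1 : 0; cnt < chunk_len; ++cnt)
        {
          if (total + cnt > max_idx)
            break;
          ++visited;
        }

      total += cnt;
      if (strict ? total > max_idx : total >= max_idx)
        break;
    }
  return visited;
}
```

With chunk_len = 4 and max_idx = 4, the last chunk holds exactly one used slot. In that case visited_slots(4, 4, 0) returns 3 (the last slot is skipped), while visited_slots(4, 4, 1) returns 4. For max_idx = 5 both variants visit all 5 slots, which fits the observation that the problem only showed up with a certain number of shared libs.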


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Mike Mezeul <mmezeul at advaoptical dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmezeul at advaoptical dot com

--- Comment #10 from Mike Mezeul <mmezeul at advaoptical dot com> ---
Has there been any activity on this one lately? Does anyone know if a fix will
be coming anytime soon?


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

--- Comment #11 from Pádraig Brady <P at draigBrady dot com> ---
This has been _very_ well tested at Facebook.
Note the additional fix in comment #9.
It would be great to merge this. Thanks!


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

--- Comment #12 from Szabolcs Nagy <nszabolcs at gmail dot com> ---
(In reply to Pádraig Brady from comment #11)
> This has been _very_ well tested at facebook
> Note the additional fix in comment #9
> It would be great to merge this. thanks!

sorry, i didn't have time to work on this in this release cycle. i'll try to
look at it in the next one if others don't beat me to it. (the comments can be
improved, dtv_slots should be fixed so it has a consistent meaning, and one
should reason about the consequences of removing the asserts: they might catch
real corruption that is still possible via dlclose races that are not fixed.)


[Bug dynamic-link/19329] dl-tls.c assert failure at concurrent pthread_create and dlopen

cvs-commit at gcc dot gnu.org
In reply to this post by cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19329

Lukas <lukasz.koniecki at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lukasz.koniecki at gmail dot com

--- Comment #13 from Lukas <lukasz.koniecki at gmail dot com> ---
(In reply to Szabolcs Nagy from comment #12)

> sorry i didnt have time to work on this in this release cycle, i'll try to
> look at it in the next one if others don't beat me to it (the comments can
> be improved, dtv_slots should be fixed so it has consistent meaning and one
> should reason about the consequences of removing the asserts, they might
> catch valid corruption that is still present via dlclose races that are not
> fixed).

Any update on this? It has been over a year since the last comment.
