[Bug dynamic-link/18457] New: pthread_join deadlock in library destructor

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] New: pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

            Bug ID: 18457
           Summary: pthread_join deadlock in library destructor
           Product: glibc
           Version: 2.21
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: [hidden email]
                CC: aoliva at redhat dot com
            Blocks: 17090, 17620, 17621, 17628
  Target Milestone: ---

A dlopen'd library that is unloaded and calls pthread_join in the destructor
causes a deadlock while waiting for the child to finish.

$ cat pthread-join.c
#include <dlfcn.h>

int
main (void)
{
  void *f = dlopen ("pthread-join-test.so", RTLD_NOW | RTLD_GLOBAL);
  if (f) dlclose (f);
}
$ cat pthread-join-test.c
#include <stdio.h>
#include <pthread.h>

static pthread_t th;
static int running = 1;

static void *
test_run (void *p)
{
  while (running)
    fprintf (stderr, "XXX test_run\n");
  fprintf (stderr, "XXX test_run FINISHED\n");
  return NULL;
}

static void __attribute__ ((constructor))
do_init (void)
{
  pthread_create (&th, NULL, test_run, NULL);
}

static void __attribute__ ((destructor))
do_end (void)
{
  running = 0;
  fprintf (stderr, "thread_join...\n");
  pthread_join (th, NULL);
  fprintf (stderr, "thread_join DONE\n");
}
$ gcc -fPIC -shared pthread-join-test.c -lpthread -o pthread-join-test.so
$ gcc pthread-join.c -lpthread -ldl -Wl,-path=. -o pthread-join
$ ./pthread-join
thread_join...
XXX test_run FINISHED


This was broken by f8aeae3 ("Fix DTV race, assert, DTV_SURPLUS Static TLS
limit, and nptl_db garbage").


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=17090
[Bug 17090] Assertion failure (map->l_tls_modid == cnt) when dlopen()ing
initial-exec TLS shared objects under certain circumstances
https://sourceware.org/bugzilla/show_bug.cgi?id=17620
[Bug 17620] DTV_SURPLUS limits loading of Static TLS-requiring modules in
multi-threaded programs
https://sourceware.org/bugzilla/show_bug.cgi?id=17621
[Bug 17621] DTV update for Static TLS dlopened modules is racy
https://sourceware.org/bugzilla/show_bug.cgi?id=17628
[Bug 17628] nptl_db's td_thr_tls* don't check DTV generation
--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

David Abdurachmanov <david.abdurachmanov at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.abdurachmanov at gmail dot c
                   |                            |om

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Alexandre Oliva <aoliva at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |aoliva at sourceware dot org
           Assignee|unassigned at sourceware dot org   |aoliva at sourceware dot org

--- Comment #1 from Alexandre Oliva <aoliva at sourceware dot org> ---
Created attachment 8340
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8340&action=edit
Patch that fixes the regression

Thanks for the report . Here's the patch I'm testing to fix the regression.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com,
                   |                            |siddhesh at redhat dot com
   Target Milestone|---                         |2.22

--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Alexandre Oliva from comment #1)
> Created attachment 8340 [details]
> Patch that fixes the regression
>
> Thanks for the report . Here's the patch I'm testing to fix the regression.

We are still looking to fix this in 2.22, but Alex's patch may not be the
solution that is decided upon. Initially as a conservative fix we may just not
uload the DSO that have destructors used by thread_local. Then in 2.23 we will
put in a broader fix that resolves and documents the concurrency requirements
for the internal glibc interfaces.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Alexandre Oliva <aoliva at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|aoliva at sourceware dot org       |unassigned at sourceware dot org

--- Comment #3 from Alexandre Oliva <aoliva at sourceware dot org> ---
Don't even consider the patch, it's wrong.  We can't assume that, just because
l_tls_offset is set, the TLS block has been set up for all threads; a
concurrent dlopen may have already set l_tls_offset for a previously-loaded but
TLS-undecided module, but not yet completed initializing the TLS blocks for all
threads.  Taking the lock guarantees we wait for that initialization to
complete.  In order to make this work, we'd have to have dlopen to set
l_tls_offset to some value indicating initialization is underway and must be
waited for, overwriting it with the final value that obviates the lock only
after the initialization is done.

I think we're better off leaving this as is, and focusing on ways to run module
init and fini without holding the rtld lock.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Andreas Schwab <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8340|0                           |1
        is obsolete|                            |

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  e400f3ccd36fe91d432cc7d45b4ccc799dece763 (commit)
      from  48f5f7a63c5971a698ac2b97e4c383e04cbbe110 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e400f3ccd36fe91d432cc7d45b4ccc799dece763

commit e400f3ccd36fe91d432cc7d45b4ccc799dece763
Author: Siddhesh Poyarekar <[hidden email]>
Date:   Fri Jul 24 19:13:38 2015 +0530

    Use IE model for static variables in libc.so, libpthread.so and rtld

    The recently introduced TLS variables in the thread-local destructor
    implementation (__cxa_thread_atexit_impl) used the default GD access
    model, resulting in a call to __tls_get_addr.  This causes a deadlock
    with recent changes to the way TLS is initialized because DTV
    allocations are delayed and hence despite knowing the offset to the
    variable inside its TLS block, the thread has to take the global rtld
    lock to safely update the TLS offset.

    This causes deadlocks when a thread is instantiated and joined inside
    a destructor of a dlopen'd DSO.  The correct long term fix is to
    somehow not take the lock, but that will need a lot deeper change set
    to alter the way in which the big rtld lock is used.

    Instead, this patch just eliminates the call to __tls_get_addr for the
    thread-local variables inside libc.so, libpthread.so and rtld by
    building all of their units with -mtls-model=initial-exec.

    There were concerns that the static storage for TLS is limited and
    hence we should not be using it.  Additionally, dynamically loaded
    modules may result in libc.so looking for this static storage pretty
    late in static binaries.  Both concerns are valid when using TLSDESC
    since that is where one may attempt to allocate a TLS block from
    static storage for even those variables that are not IE.  They're not
    very strong arguments for the traditional TLS model though, since it
    assumes that the static storage would be used sparingly and definitely
    not by default.  Hence, for now this would only theoretically affect
    ARM architectures.

    The impact is hence limited to statically linked binaries that dlopen
    modules that in turn load libc.so, all that on arm hardware.  It seems
    like a small enough impact to justify fixing the larger problem that
    currently affects everything everywhere.

    This still does not solve the original problem completely.  That is,
    it is still possible to deadlock on the big rtld lock with a small
    tweak to the test case attached to this patch.  That problem is
    however not a regression in 2.22 and hence could be tackled as a
    separate project.  The test case is picked up as is from Alex's patch.

    This change has been tested to verify that it does not cause any
    issues on x86_64.

    ChangeLog:

        [BZ #18457]
        * nptl/Makefile (tests): New test case tst-join7.
        (modules-names): New test case module tst-join7mod.
        * nptl/tst-join7.c: New file.
        * nptl/tst-join7mod.c: New file.
        * Makeconfig (tls-model): Pass -ftls-model=initial-exec for
        all translation units in libc.so, libpthread.so and rtld.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                              |   10 +++++++
 Makeconfig                             |    6 +++-
 NEWS                                   |   12 ++++----
 nptl/Makefile                          |   10 +++++-
 nptl/tst-join7.c                       |   46 ++++++++++++++++++++++++++++++++
 nptl/{tst-detach1.c => tst-join7mod.c} |   46 ++++++++++++++++++--------------
 6 files changed, 101 insertions(+), 29 deletions(-)
 create mode 100644 nptl/tst-join7.c
 copy nptl/{tst-detach1.c => tst-join7mod.c} (52%)

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

Siddhesh Poyarekar <siddhesh at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Siddhesh Poyarekar <siddhesh at redhat dot com> ---
Fixed in master.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug dynamic-link/18457] pthread_join deadlock in library destructor

Sourceware - glibc-bugs mailing list
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=18457

jack <meave390 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |meave390 at gmail dot com

--- Comment #6 from jack <meave390 at gmail dot com> ---
Have to need you just follow the online this link here
https://fundecade.com/sliding-block-puzzle and looking the online here update
of latest Sliding Block Puzzle free online game most of the players have exited
for this.

--
You are receiving this mail because:
You are on the CC list for the bug.