[RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Ondřej Bílka
Hi,

This bug had a simple patch for five years without reply.
https://sourceware.org/bugzilla/show_bug.cgi?id=4578
Could someone comment this?

It was detected on custom chip, could this be replicated on other
architectures?

An analysis from bugzilla and patch are below

"
Details:

If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
or RT_DELETE at the time another thread calls fork(), then the child exit
code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
If the child subsequently requires ld.so functionality before calling exec(),
then the assertion will fire.

The patch acquires dl_load_lock on entry to fork() and releases it on exit
from the parent path.  The child path is initialized as currently done.
This is essentially pthreads_atfork, but forced to be first because the
acquisition of dl_load_lock must happen before malloc_atfork is active
to avoid a deadlock.
"

--- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
2007-05-29 23:44:33.000000000 -0400
+++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
2007-05-31 15:07:18.712221827 -0400
@@ -27,6 +27,7 @@
 #include "fork.h"
 #include <hp-timing.h>
 #include <ldsodefs.h>
+#include <bits/libc-lock.h>
 #include <bits/stdio-lock.h>
 #include <atomic.h>
 
@@ -59,6 +60,8 @@
     struct used_handler *next;
   } *allp = NULL;
 
+  /* grab ld.so lock BEFORE switching to malloc_atfork */
+   __rtld_lock_lock_recursive (GL(dl_load_lock));
   /* Run all the registered preparation handlers.  In reverse order.
      While doing this we build up a list of all the entries.  */
   struct fork_handler *runp;
@@ -208,6 +211,8 @@
 
   allp = allp->next;
  }
+      /* unlock ld.so last, because we locked it first */
+      __rtld_lock_unlock_recursive (GL(dl_load_lock));
     }
 
   return pid;

Reply | Threaded
Open this post in threaded view
|

Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Ondřej Bílka
On Wed, Oct 09, 2013 at 10:05:34PM +0200, Ondřej Bílka wrote:
> Hi,
>
> This bug had a simple patch for five years without reply.
> https://sourceware.org/bugzilla/show_bug.cgi?id=4578
> Could someone comment this?
>
> It was detected on custom chip, could this be replicated on other
> architectures?
>
Comments?
 

> An analysis from bugzilla and patch are below
>
> "
> Details:
>
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling exec(),
> then the assertion will fire.
>
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
>
> --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-29 23:44:33.000000000 -0400
> +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-31 15:07:18.712221827 -0400
> @@ -27,6 +27,7 @@
>  #include "fork.h"
>  #include <hp-timing.h>
>  #include <ldsodefs.h>
> +#include <bits/libc-lock.h>
>  #include <bits/stdio-lock.h>
>  #include <atomic.h>
>  
> @@ -59,6 +60,8 @@
>      struct used_handler *next;
>    } *allp = NULL;
>  
> +  /* grab ld.so lock BEFORE switching to malloc_atfork */
> +   __rtld_lock_lock_recursive (GL(dl_load_lock));
>    /* Run all the registered preparation handlers.  In reverse order.
>       While doing this we build up a list of all the entries.  */
>    struct fork_handler *runp;
> @@ -208,6 +211,8 @@
>  
>    allp = allp->next;
>   }
> +      /* unlock ld.so last, because we locked it first */
> +      __rtld_lock_unlock_recursive (GL(dl_load_lock));
>      }
>  
>    return pid;
Reply | Threaded
Open this post in threaded view
|

[PING][RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Ondřej Bílka
In reply to this post by Ondřej Bílka
ping

On Wed, Oct 09, 2013 at 10:05:34PM +0200, Ondřej Bílka wrote:

> Hi,
>
> This bug had a simple patch for five years without reply.
> https://sourceware.org/bugzilla/show_bug.cgi?id=4578
> Could someone comment this?
>
> It was detected on custom chip, could this be replicated on other
> architectures?
>
> An analysis from bugzilla and patch are below
>
> "
> Details:
>
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling exec(),
> then the assertion will fire.
>
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
>
> --- glibc-2.5-sources/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-29 23:44:33.000000000 -0400
> +++ glibc-2.5-modified/nptl/sysdeps/unix/sysv/linux/fork.c
> 2007-05-31 15:07:18.712221827 -0400
> @@ -27,6 +27,7 @@
>  #include "fork.h"
>  #include <hp-timing.h>
>  #include <ldsodefs.h>
> +#include <bits/libc-lock.h>
>  #include <bits/stdio-lock.h>
>  #include <atomic.h>
>  
> @@ -59,6 +60,8 @@
>      struct used_handler *next;
>    } *allp = NULL;
>  
> +  /* grab ld.so lock BEFORE switching to malloc_atfork */
> +   __rtld_lock_lock_recursive (GL(dl_load_lock));
>    /* Run all the registered preparation handlers.  In reverse order.
>       While doing this we build up a list of all the entries.  */
>    struct fork_handler *runp;
> @@ -208,6 +211,8 @@
>  
>    allp = allp->next;
>   }
> +      /* unlock ld.so last, because we locked it first */
> +      __rtld_lock_unlock_recursive (GL(dl_load_lock));
>      }
>  
>    return pid;

Reply | Threaded
Open this post in threaded view
|

Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Mike Frysinger
In reply to this post by Ondřej Bílka
On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:

> Details:
>
> If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> or RT_DELETE at the time another thread calls fork(), then the child exit
> code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> If the child subsequently requires ld.so functionality before calling
> exec(), then the assertion will fire.
>
> The patch acquires dl_load_lock on entry to fork() and releases it on exit
> from the parent path.  The child path is initialized as currently done.
> This is essentially pthreads_atfork, but forced to be first because the
> acquisition of dl_load_lock must happen before malloc_atfork is active
> to avoid a deadlock.
> "
doesn't seem right that we grab the lock and then just reset it in the child ?  
seems like you should just unlock it rather than reset it in the child.

i'm also wary of code that already grabs a lot of locks trying to grab even
more.  the code paths that already grab the IO locks ... can they possibly
grab this one too ?  like a custom format handler that triggers loading of
libs ?

> +  /* grab ld.so lock BEFORE switching to malloc_atfork */

comment style is incorrect

> +      /* unlock ld.so last, because we locked it first */

comment style is wrong here too
-mike

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Ondřej Bílka
On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:

> On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:
> > Details:
> >
> > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > or RT_DELETE at the time another thread calls fork(), then the child exit
> > code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our case)
> > re-initializes dl_load_lock but does not restore r_state to RT_CONSISTENT.
> > If the child subsequently requires ld.so functionality before calling
> > exec(), then the assertion will fire.
> >
> > The patch acquires dl_load_lock on entry to fork() and releases it on exit
> > from the parent path.  The child path is initialized as currently done.
> > This is essentially pthreads_atfork, but forced to be first because the
> > acquisition of dl_load_lock must happen before malloc_atfork is active
> > to avoid a deadlock.
> > "
>
> doesn't seem right that we grab the lock and then just reset it in the child ?  
> seems like you should just unlock it rather than reset it in the child.
>
That part looks ok as without locking you could get inconsistent linker structures.

> i'm also wary of code that already grabs a lot of locks trying to grab even
> more.  the code paths that already grab the IO locks ... can they possibly
> grab this one too ?  like a custom format handler that triggers loading of
> libs ?
>
Do you haave a better solution? I send this to decide what to do with
this bug. I would not be surprised if we decided that it is invalid as
threads with fork cause problems in general.
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Mike Frysinger
On Thursday 02 January 2014 18:54:40 Ondřej Bílka wrote:

> On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:
> > On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:
> > > Details:
> > >
> > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > > or RT_DELETE at the time another thread calls fork(), then the child
> > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our
> > > case) re-initializes dl_load_lock but does not restore r_state to
> > > RT_CONSISTENT. If the child subsequently requires ld.so functionality
> > > before calling exec(), then the assertion will fire.
> > >
> > > The patch acquires dl_load_lock on entry to fork() and releases it on
> > > exit from the parent path.  The child path is initialized as currently
> > > done. This is essentially pthreads_atfork, but forced to be first
> > > because the acquisition of dl_load_lock must happen before
> > > malloc_atfork is active to avoid a deadlock.
> > > "
> >
> > doesn't seem right that we grab the lock and then just reset it in the
> > child ? seems like you should just unlock it rather than reset it in the
> > child.
>
> That part looks ok as without locking you could get inconsistent linker
> structures.
that's not what i meant.  rather than make the call to reset in the child as
the code in the tree does now, if you grab the lock early, why not use the
normal unlock call in the child ?  struct state would be fine.

> > i'm also wary of code that already grabs a lot of locks trying to grab
> > even more.  the code paths that already grab the IO locks ... can they
> > possibly grab this one too ?  like a custom format handler that triggers
> > loading of libs ?
>
> Do you haave a better solution? I send this to decide what to do with
> this bug. I would not be surprised if we decided that it is invalid as
> threads with fork cause problems in general.

i think we want to support it without it being completely terrible.

maybe the answer here is to reset all the dl state like we do with the lock ?  
is there some init func we could call ?  i'm really not familiar with glibc
ldso internals.
-mike

signature.asc (853 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][BZ #1874] Fix assertion triggered by thread/fork interaction

Ondřej Bílka
On Thu, Jan 02, 2014 at 09:07:02PM -0500, Mike Frysinger wrote:

> On Thursday 02 January 2014 18:54:40 Ondřej Bílka wrote:
> > On Thu, Jan 02, 2014 at 05:18:22PM -0500, Mike Frysinger wrote:
> > > On Wednesday 09 October 2013 16:05:34 Ondřej Bílka wrote:
> > > > Details:
> > > >
> > > > If a thread happens to hold dl_load_lock and have r_state set to RT_ADD
> > > > or RT_DELETE at the time another thread calls fork(), then the child
> > > > exit code from fork (in nptl/sysdeps/unix/sysv/linux/fork.c in our
> > > > case) re-initializes dl_load_lock but does not restore r_state to
> > > > RT_CONSISTENT. If the child subsequently requires ld.so functionality
> > > > before calling exec(), then the assertion will fire.
> > > >
> > > > The patch acquires dl_load_lock on entry to fork() and releases it on
> > > > exit from the parent path.  The child path is initialized as currently
> > > > done. This is essentially pthreads_atfork, but forced to be first
> > > > because the acquisition of dl_load_lock must happen before
> > > > malloc_atfork is active to avoid a deadlock.
> > > > "
> > >
> > > doesn't seem right that we grab the lock and then just reset it in the
> > > child ? seems like you should just unlock it rather than reset it in the
> > > child.
> >
> > That part looks ok as without locking you could get inconsistent linker
> > structures.
>
> that's not what i meant.  rather than make the call to reset in the child as
> the code in the tree does now, if you grab the lock early, why not use the
> normal unlock call in the child ?  struct state would be fine.
>
possible.
 

> > > i'm also wary of code that already grabs a lot of locks trying to grab
> > > even more.  the code paths that already grab the IO locks ... can they
> > > possibly grab this one too ?  like a custom format handler that triggers
> > > loading of libs ?
> >
> > Do you haave a better solution? I send this to decide what to do with
> > this bug. I would not be surprised if we decided that it is invalid as
> > threads with fork cause problems in general.
>
> i think we want to support it without it being completely terrible.
>
> maybe the answer here is to reset all the dl state like we do with the lock ?  
> is there some init func we could call ?  i'm really not familiar with glibc
> ldso internals.

How would you handle dlclose with handle before fork? I still do not see
a clean solution.