[PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Torvald Riegel-4
I've asked for comments on whether mutexes acquired before fork() remain
to be acquired by just the parent process after fork():
https://sourceware.org/ml/libc-alpha/2016-12/msg00772.html

If we can agree on one of the requirements I'm proposing there, this
patch fixes the core problem of bug 19402.  (The reproducer of this bug
reveals another issue on x86, for which I'll send a patch next.)

The fix is:

Robust mutexes acquired at the time of a call to fork() do not remain
acquired by the forked child process.  We have to clear the list of
acquired robust mutexes before registering this list with the kernel;
otherwise, if some of the robust mutexes are process-shared, the parent
process can alter the child's robust mutex list, which can lead to
deadlocks or even modification of memory that may not be occupied by a
mutex anymore.

Tested on x86_64-linux with glibc's tests and the reproducer from 19402.

robust-clear-list.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Florian Weimer-5
On 12/22/2016 11:15 AM, Torvald Riegel wrote:
> I've asked for comments on whether mutexes acquired before fork() remain
> to be acquired by just the parent process after fork():
> https://sourceware.org/ml/libc-alpha/2016-12/msg00772.html

The conceptual problem here is that we need to effectively
process-shared mutexes differently from those which are process-private.
  With effectively process-shared, I mean that the mutex object resides
on a mapping which is not copied by fork, but remains shared with the
parent.

I'm going to comment on the other thread as well.

> The fix is:
>
> Robust mutexes acquired at the time of a call to fork() do not remain
> acquired by the forked child process.  We have to clear the list of
> acquired robust mutexes before registering this list with the kernel;
> otherwise, if some of the robust mutexes are process-shared, the parent
> process can alter the child's robust mutex list, which can lead to
> deadlocks or even modification of memory that may not be occupied by a
> mutex anymore.
>
> Tested on x86_64-linux with glibc's tests and the reproducer from 19402.

Can we add a test case for this?

> +      /* Initialize the robust mutex list setting in the kernel which has
> + been reset during the fork.  We do not check for errors because if
> + it fails here, it must have failed at process startup as well and
> + nobody could have used robust mutexes.
> + Before we do that, we have to clear the list of robust mutexes
> + because we do not inherit ownership of mutexes from the parent.
> + We do not have to set self->robust_head.futex_offset since we do
> + inherit the correct value from the parent.  We do not need to clear
> + the pending operation because it must have been zero when fork was
> + called.  */

fork is supposed to be async-signal-safe, so I'm not sure the comment
about the pending operation is completely correct.

(Our fork implementation is currently not async-signal-safe, though.)

> +# ifdef __PTHREAD_MUTEX_HAVE_PREV
> +      self->robust_prev = &self->robust_head;
> +# endif
> +      self->robust_head.list = &self->robust_head;
>  # ifdef SHARED
>        if (__builtin_expect (__libc_pthread_functions_init, 0))
>   PTHFCT_CALL (ptr_set_robust, (self));

The change as such looks good to me, assuming this is the semantics we
want.  I checked it matches the initialization code, and it comes before
the set_robust_list call, as required.

Thanks,
Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Torvald Riegel-4
On Thu, 2016-12-22 at 13:22 +0100, Florian Weimer wrote:

> On 12/22/2016 11:15 AM, Torvald Riegel wrote:
> > I've asked for comments on whether mutexes acquired before fork() remain
> > to be acquired by just the parent process after fork():
> > https://sourceware.org/ml/libc-alpha/2016-12/msg00772.html
>
> The conceptual problem here is that we need to effectively
> process-shared mutexes differently from those which are process-private.
>   With effectively process-shared, I mean that the mutex object resides
> on a mapping which is not copied by fork, but remains shared with the
> parent.

I don't think we need to, but that's a more general topic than this
patch in particular.

I think the use case of a process-private, robust mutex that is acquired
across fork and needs to be recovered is quite narrow.

Second, I don't see a way how we can accomodate both process-private and
process-shared robust mutexes in one list we're sharing with the kernel.
As soon as there are process-shared pages that the mutexes reside on, we
have lost because we cannot trust the contents of this memory (the
parent might do anything with it, so we can't traverse the list safely).

> > The fix is:
> >
> > Robust mutexes acquired at the time of a call to fork() do not remain
> > acquired by the forked child process.  We have to clear the list of
> > acquired robust mutexes before registering this list with the kernel;
> > otherwise, if some of the robust mutexes are process-shared, the parent
> > process can alter the child's robust mutex list, which can lead to
> > deadlocks or even modification of memory that may not be occupied by a
> > mutex anymore.
> >
> > Tested on x86_64-linux with glibc's tests and the reproducer from 19402.
>
> Can we add a test case for this?

What do you have in mind?  Checking that the list is reset to zero after
fork?  Do we need a test for that if we have documented the need to do
that in the code?
(Trying to kill the child in such a way that corruption is visible and
can be safely detected is probably nontrivial and not worth it, I
guess.)

> > +      /* Initialize the robust mutex list setting in the kernel which has
> > + been reset during the fork.  We do not check for errors because if
> > + it fails here, it must have failed at process startup as well and
> > + nobody could have used robust mutexes.
> > + Before we do that, we have to clear the list of robust mutexes
> > + because we do not inherit ownership of mutexes from the parent.
> > + We do not have to set self->robust_head.futex_offset since we do
> > + inherit the correct value from the parent.  We do not need to clear
> > + the pending operation because it must have been zero when fork was
> > + called.  */
>
> fork is supposed to be async-signal-safe, so I'm not sure the comment
> about the pending operation is completely correct.

Any operation that sets the pending operation to nonzero is not
async-signal-safe or safe for a signal handler though.  So what could go
wrong?  (OTOH, just clearing the pending op is fine too.)

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Florian Weimer-5
On 12/23/2016 10:31 PM, Torvald Riegel wrote:
>> Can we add a test case for this?
> What do you have in mind?  Checking that the list is reset to zero after
> fork?  Do we need a test for that if we have documented the need to do
> that in the code?

What about the attached patch?

Thanks,
Florian

robust-test.patch (23K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Torvald Riegel-4
On Fri, 2017-01-13 at 14:11 +0100, Florian Weimer wrote:
> On 12/23/2016 10:31 PM, Torvald Riegel wrote:
> >> Can we add a test case for this?
> > What do you have in mind?  Checking that the list is reset to zero after
> > fork?  Do we need a test for that if we have documented the need to do
> > that in the code?
>
> What about the attached patch?

That looks fine generally, with the following exceptions / comments:

According to the feedback so far we got for
http://austingroupbugs.net/view.php?id=1112 the trylock-based test is
actually UB.  There should be a comment making that clear (ie, that we
rely on something we know about our current implementation to test
this).

If this was a multi-threaded test program, it would be invalid because
non-async-signal-safe functions are executed in the child before exec()
is called.  In a single-threaded program, it's not quite clear to me
what POSIX really wants.  There's indication that only AS-safe functions
should be called, but IIRC that's not explicitly stated.
Florian says we still need to support this as an extension.  If so, this
should be made clear in the test (using a suitable comment).

What about putting the x* wrappers as static inline functions just in
the header?  Then, we wouldn't need to add all those trivial C files.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Florian Weimer-5
On 01/13/2017 04:37 PM, Torvald Riegel wrote:

> On Fri, 2017-01-13 at 14:11 +0100, Florian Weimer wrote:
>> On 12/23/2016 10:31 PM, Torvald Riegel wrote:
>>>> Can we add a test case for this?
>>> What do you have in mind?  Checking that the list is reset to zero after
>>> fork?  Do we need a test for that if we have documented the need to do
>>> that in the code?
>>
>> What about the attached patch?
>
> That looks fine generally, with the following exceptions / comments:
>
> According to the feedback so far we got for
> http://austingroupbugs.net/view.php?id=1112 the trylock-based test is
> actually UB.  There should be a comment making that clear (ie, that we
> rely on something we know about our current implementation to test
> this).

Yeah, I'll move the check to the parent process and add another bit for
error-checking mutexes (and make the test conditional on that).

> If this was a multi-threaded test program, it would be invalid because
> non-async-signal-safe functions are executed in the child before exec()
> is called.  In a single-threaded program, it's not quite clear to me
> what POSIX really wants.  There's indication that only AS-safe functions
> should be called, but IIRC that's not explicitly stated.
> Florian says we still need to support this as an extension.  If so, this
> should be made clear in the test (using a suitable comment).

I'll add a comment, yes.

> What about putting the x* wrappers as static inline functions just in
> the header?  Then, we wouldn't need to add all those trivial C files.

We did that before and it caused problems because it restricted what
feature macros we could assume in the implementation of those inline
functions.

I verified your patch with this test, so I think it is okay to check it
in.  I'll commit my updated patch afterwards.

Thanks,
Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Florian Weimer-5
In reply to this post by Torvald Riegel-4
On 01/13/2017 04:37 PM, Torvald Riegel wrote:

> On Fri, 2017-01-13 at 14:11 +0100, Florian Weimer wrote:
>> On 12/23/2016 10:31 PM, Torvald Riegel wrote:
>>>> Can we add a test case for this?
>>> What do you have in mind?  Checking that the list is reset to zero after
>>> fork?  Do we need a test for that if we have documented the need to do
>>> that in the code?
>>
>> What about the attached patch?
>
> That looks fine generally, with the following exceptions / comments:
>
> According to the feedback so far we got for
> http://austingroupbugs.net/view.php?id=1112 the trylock-based test is
> actually UB.  There should be a comment making that clear (ie, that we
> rely on something we know about our current implementation to test
> this).
>
> If this was a multi-threaded test program, it would be invalid because
> non-async-signal-safe functions are executed in the child before exec()
> is called.  In a single-threaded program, it's not quite clear to me
> what POSIX really wants.  There's indication that only AS-safe functions
> should be called, but IIRC that's not explicitly stated.
> Florian says we still need to support this as an extension.  If so, this
> should be made clear in the test (using a suitable comment).
I added a comment and an additional test which uses an error-checking mutex.

Test case still passes on current master.  Okay to commit?

Thanks,
Florian


tst-robust-fork.patch (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Torvald Riegel-4
On Mon, 2017-01-23 at 15:57 +0100, Florian Weimer wrote:

> On 01/13/2017 04:37 PM, Torvald Riegel wrote:
> > On Fri, 2017-01-13 at 14:11 +0100, Florian Weimer wrote:
> >> On 12/23/2016 10:31 PM, Torvald Riegel wrote:
> >>>> Can we add a test case for this?
> >>> What do you have in mind?  Checking that the list is reset to zero after
> >>> fork?  Do we need a test for that if we have documented the need to do
> >>> that in the code?
> >>
> >> What about the attached patch?
> >
> > That looks fine generally, with the following exceptions / comments:
> >
> > According to the feedback so far we got for
> > http://austingroupbugs.net/view.php?id=1112 the trylock-based test is
> > actually UB.  There should be a comment making that clear (ie, that we
> > rely on something we know about our current implementation to test
> > this).
> >
> > If this was a multi-threaded test program, it would be invalid because
> > non-async-signal-safe functions are executed in the child before exec()
> > is called.  In a single-threaded program, it's not quite clear to me
> > what POSIX really wants.  There's indication that only AS-safe functions
> > should be called, but IIRC that's not explicitly stated.
> > Florian says we still need to support this as an extension.  If so, this
> > should be made clear in the test (using a suitable comment).
>
> I added a comment and an additional test which uses an error-checking mutex.
>
> Test case still passes on current master.  Okay to commit?

LGTM.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Florian Weimer-5
On 01/24/2017 09:27 PM, Torvald Riegel wrote:

> On Mon, 2017-01-23 at 15:57 +0100, Florian Weimer wrote:
>> On 01/13/2017 04:37 PM, Torvald Riegel wrote:
>>> On Fri, 2017-01-13 at 14:11 +0100, Florian Weimer wrote:
>>>> On 12/23/2016 10:31 PM, Torvald Riegel wrote:
>>>>>> Can we add a test case for this?
>>>>> What do you have in mind?  Checking that the list is reset to zero after
>>>>> fork?  Do we need a test for that if we have documented the need to do
>>>>> that in the code?
>>>>
>>>> What about the attached patch?
>>>
>>> That looks fine generally, with the following exceptions / comments:
>>>
>>> According to the feedback so far we got for
>>> http://austingroupbugs.net/view.php?id=1112 the trylock-based test is
>>> actually UB.  There should be a comment making that clear (ie, that we
>>> rely on something we know about our current implementation to test
>>> this).
>>>
>>> If this was a multi-threaded test program, it would be invalid because
>>> non-async-signal-safe functions are executed in the child before exec()
>>> is called.  In a single-threaded program, it's not quite clear to me
>>> what POSIX really wants.  There's indication that only AS-safe functions
>>> should be called, but IIRC that's not explicitly stated.
>>> Florian says we still need to support this as an extension.  If so, this
>>> should be made clear in the test (using a suitable comment).
>>
>> I added a comment and an additional test which uses an error-checking mutex.
>>
>> Test case still passes on current master.  Okay to commit?
>
> LGTM.

Siddhesh, is this okay despite the freeze?  It's a test-only change:

   <https://sourceware.org/ml/libc-alpha/2017-01/msg00423.html>

Thanks,
Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] [BZ #19402] Clear list of acquired robust mutexes in the child process after forking.

Siddhesh Poyarekar-8
Yes, this is fine.

Siddhesh