BZ 13939 malloc deadlock


BZ 13939 malloc deadlock

Jeff Law

Can someone please review the patch attached to BZ 13939?  It's got 2.16
as a milestone, so it'd be nice to get it resolved for 2.16 :-)

FWIW, we're using it in Fedora as well as Red Hat's products.

Jeff

Re: BZ 13939 malloc deadlock

Roland McGrath-4
The best way to get a patch reviewed is to (re-)post it here.

Re: BZ 13939 malloc deadlock

Maxim Kuvyrkov-2
In reply to this post by Jeff Law
On 23/06/2012, at 8:50 AM, Jeff Law wrote:

>
> Can someone please review the patch attached to BZ 13939?  It's got 2.16 as a milestone, so it'd be nice to get it resolved for 2.16 :-)
>
> FWIW, we're using it in Fedora as well as Red Hat's products.

The patch looks OK from a correctness standpoint, but, IMO, it makes obscure code even more obscure.  Malloc is not the best-documented piece of glibc, so how about the following clarifications to the code?

...

> @@ -780,7 +780,7 @@ get_free_list (void)
>  
>  
>  static mstate
> -reused_arena (void)
> +reused_arena (bool retrying)
>  {
>    mstate result;
>    static mstate next_to_use;
> @@ -797,6 +797,15 @@ reused_arena (void)
>      }
>    while (result != next_to_use);
>  
> +  /* If we are retrying due to a failure to allocate in the main
> +     arena, don't wait for the main arena to become available, select
> +     another.
> +
> +     To really fix this right we would have to try the allocation
> +     in every other arena, but that seems like severe overkill.  */
> +  if (retrying && result == &main_arena)
> +    result = result->next;
> +

Given the increasing complexity of reused_arena(), it deserves a header comment.

The "retrying" argument has a meaning of "avoid-main_arena", so it would be clearer to name it that.  On the other hand, to understand why exactly main_arena is being special-cased here one has to follow the trail several levels up.

I suggest changing the argument from "bool retrying" to "mstate avoid_arena" with a header comment along the lines of:
/* Return arena that can be reused for memory allocation.
   Avoid AVOID_ARENA as we have already failed to allocate memory in it.  */
This way there's less special-casing of main_arena.
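
For illustration, here is a minimal stand-alone sketch of the suggested interface.  This is not the glibc source: arena_t and the rotation logic are simplified stand-ins for mstate and reused_arena's try-lock loop, which is omitted here.

```c
#include <stddef.h>

/* Simplified stand-in for glibc's mstate: arenas form a circular list. */
typedef struct arena {
    struct arena *next;
} arena_t;

/* Rotation point, as in malloc's static next_to_use. */
static arena_t *next_to_use;

/* Return an arena that can be reused for memory allocation.
   Avoid AVOID_ARENA as we have already failed to allocate memory in it.
   The real function also try-locks each candidate; omitted here. */
arena_t *reused_arena(arena_t *avoid_arena)
{
    arena_t *result = next_to_use;
    if (result == avoid_arena)
        result = result->next;
    next_to_use = result->next;
    return result;
}
```

Passing the failed arena rather than a bool keeps knowledge of main_arena out of the callee entirely.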

...

> @@ -3026,23 +3027,26 @@ __libc_valloc(size_t bytes)
>    if(!ar_ptr)
>      return 0;
>    p = _int_valloc(ar_ptr, bytes);
> -  (void)mutex_unlock(&ar_ptr->mutex);
>    if(!p) {
>      /* Maybe the failure is due to running out of mmapped areas. */
>      if(ar_ptr != &main_arena) {
> +      (void)mutex_unlock(&ar_ptr->mutex);
>        ar_ptr = &main_arena;
>        (void)mutex_lock(&ar_ptr->mutex);
>        p = _int_memalign(ar_ptr, pagesz, bytes);
>        (void)mutex_unlock(&ar_ptr->mutex);
>      } else {
>        /* ... or sbrk() has failed and there is still a chance to mmap() */
> -      ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
> +      mstate prev = ar_ptr->next ? ar_ptr : 0;
> +      (void)mutex_unlock(&ar_ptr->mutex);
> +      ar_ptr = arena_get2(prev, bytes, true);
>        if(ar_ptr) {
>          p = _int_memalign(ar_ptr, pagesz, bytes);
>          (void)mutex_unlock(&ar_ptr->mutex);
>        }
>      }
> -  }
> +  } else
> +    (void)mutex_unlock (&ar_ptr->mutex);

I don't think this and other similar clean-ups are worthwhile.  One now has to follow 3 code paths to figure out that the lock is released versus 1 path before.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



Re: BZ 13939 malloc deadlock

Jeff Law
On 06/26/2012 08:20 PM, Maxim Kuvyrkov wrote:
>
> Given the increasing complexity of reused_arena(), it deserves a
> header comment.
I'll add one.  Frankly, the overall lack of header comments and
documentation of key data structures inside glibc is depressing.

>
> The "retrying" argument has a meaning of "avoid-main_arena", so it
> would be clearer to name it that.  On the other hand, to understand
> why exactly main_arena is being special-cased here one has to follow
> the trail several levels up.
>
> I suggest changing the argument from "bool retrying" to "mstate
> avoid_arena" with the header comment ... /* Return arena that can be
> reused for memory allocation. Avoid AVOID_ARENA as we have already
> failed to allocate memory in it.  */ .  This way there's less
> special-casing of main_arena.
Seems quite reasonable.

>
> ...
>> @@ -3026,23 +3027,26 @@ __libc_valloc(size_t bytes)
>>    if(!ar_ptr)
>>      return 0;
>>    p = _int_valloc(ar_ptr, bytes);
>> -  (void)mutex_unlock(&ar_ptr->mutex);
>>    if(!p) {
>>      /* Maybe the failure is due to running out of mmapped areas. */
>>      if(ar_ptr != &main_arena) {
>> +      (void)mutex_unlock(&ar_ptr->mutex);
>>        ar_ptr = &main_arena;
>>        (void)mutex_lock(&ar_ptr->mutex);
>>        p = _int_memalign(ar_ptr, pagesz, bytes);
>>        (void)mutex_unlock(&ar_ptr->mutex);
>>      } else {
>>        /* ... or sbrk() has failed and there is still a chance to mmap() */
>> -      ar_ptr = arena_get2(ar_ptr->next ? ar_ptr : 0, bytes);
>> +      mstate prev = ar_ptr->next ? ar_ptr : 0;
>> +      (void)mutex_unlock(&ar_ptr->mutex);
>> +      ar_ptr = arena_get2(prev, bytes, true);
>>        if(ar_ptr) {
>>          p = _int_memalign(ar_ptr, pagesz, bytes);
>>          (void)mutex_unlock(&ar_ptr->mutex);
>>        }
>>      }
>> -  }
>> +  } else
>> +    (void)mutex_unlock (&ar_ptr->mutex);
>
> I don't think this and other similar clean-ups are worthwhile.  One
> now has to follow 3 code paths to figure out that the lock is
> released versus 1 path before.
I don't have a strong opinion as to *which* approach is taken to release
the lock in the various retrying paths.  What I do have a strong opinion
about is unifying their overall structure.  Having the lock released at
different times in different ways in the half-dozen different retry
implementations is just insane from a maintenance standpoint.

The only thing that prevents us from pulling the mutex_unlock call up is
concern about racing on ar_ptr->next, which is used in the sbrk()-failed,
retry-via-mmap path.

I did evaluate the code for races on that object and didn't find any,
but this is a new codebase for me and I could have missed something.  If
we can conclude that a race on that object isn't an issue, then we can
trivially simplify this code.

Jeff

Re: BZ 13939 malloc deadlock

Maxim Kuvyrkov-2
On 3/07/2012, at 6:46 AM, Jeff Law wrote:

> On 06/26/2012 08:20 PM, Maxim Kuvyrkov wrote:
>>
...
>> I don't think this and other similar clean-ups are worthwhile.  One
>> now has to follow 3 code paths to figure out that the lock is
>> released versus 1 path before.
> I don't have a strong opinion as to *which* approach is taken to release the lock in the various retrying paths.  What I do have a strong opinion about is unifying their overall structure.  Having the lock released at different times in different ways in the half-dozen different retry implementations is just insane from a maintenance standpoint.

After looking at the code once more, I agree.

> +      mstate prev = ar_ptr->next ? ar_ptr : 0;
> +      (void)mutex_unlock(&ar_ptr->mutex);
> +      ar_ptr = arena_get2(prev, bytes + 2*pagesz + MINSIZE, true);

I still think that the introduction of 'prev' is a bit too verbose (it is used once, and its initializer would serve the purpose just as well), but this is far enough into nit-pick territory that I can make my peace with it :-).

>
> The only thing that prevents us from pulling the mutex_unlock call up is concerns about racing on ar_ptr->next, which is used in the sbrk() failed, retrying via mmap path.
>
> I did evaluate the code for races on that object and didn't find any, but this is a new codebase for me and I could have missed something.  If we can conclude that a race on that object isn't an issue, then we can trivially simplify this code.

I don't recommend going this way, as it invites bugs from future changes.  A reasonable programmer would expect ar_ptr->mutex to lock the entirety of ar_ptr, including ar_ptr->next.

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



Re: BZ 13939 malloc deadlock

Jeff Law
On 07/02/2012 02:00 PM, Maxim Kuvyrkov wrote:
>
>> +      mstate prev = ar_ptr->next ? ar_ptr : 0;
>> +      (void)mutex_unlock(&ar_ptr->mutex);
>> +      ar_ptr = arena_get2(prev, bytes + 2*pagesz + MINSIZE, true);
>
> I still think that introduction of 'prev' is a bit too verbose (it is
> used once and it's initializer would serve the purpose just as well),
> but this is way in the area of nit-picks that I can make my peace
> with it :-).
That was the style used in the libc_memalign version, which I found to be
the cleanest and most robust WRT racing on ar_ptr->next, and thus was the
variant I settled on for all the retry paths.



>
>>
>> The only thing that prevents us from pulling the mutex_unlock call
>> up is concerns about racing on ar_ptr->next, which is used in the
>> sbrk() failed, retrying via mmap path.
>>
>> I did evaluate the code for races on that object and didn't find
>> any, but this is a new codebase for me and I could have missed
>> something.  If we can conclude that a race on that object isn't an
>> issue, then we can trivially simplify this code.
>
> I don't recommend going this way as it is error-prone for bugs from
> future changes.  A reasonable programmer would expect ar_ptr->mutex
> to lock the entirety of ar_ptr, including ar_ptr->next.
Precisely.  Thus if we're trying to be robust WRT future changes and
avoiding problems with ar->next changing after the lock is released but
before we call arena_get2 (which is what it looks like libc_memalign is
trying to do), then we need to get the value from ar->next before we
release the lock.

Note that libc_malloc and libc_calloc's current implementations can't
race on ar->next because the lock is still held.  Of course it's the
holding of that lock which leads to the deadlock that we're trying to
fix right now.
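
The self-deadlock alluded to here can be sketched with a plain (non-recursive) mutex.  retry_would_deadlock is a hypothetical helper, not glibc code; it uses trylock so the sketch reports the problem instead of hanging:

```c
#include <pthread.h>

/* If a retry path reaches mutex_lock on an arena mutex this thread
   already holds, a default (non-recursive) pthread mutex blocks forever.
   trylock makes that condition observable: a nonzero return means a
   plain mutex_lock here would have deadlocked. */
int retry_would_deadlock(pthread_mutex_t *held)
{
    return pthread_mutex_trylock(held) != 0;
}
```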

libc_pvalloc's and libc_valloc's current implementations would have
problems if there are races with writes to ar->next, as they unlock the
arena and then blindly use the current value of ar->next.
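
The two orderings can be contrasted in a small sketch.  The types and helper names below are assumed stand-ins (pthread mutexes in place of malloc's mutex_lock/mutex_unlock), not the actual glibc functions:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct arena {
    pthread_mutex_t mutex;
    struct arena *next;
} arena_t;

/* The old pvalloc/valloc-style ordering: unlock first, then read
   ar->next.  Another thread may modify ar->next between the steps. */
arena_t *retry_hint_racy(arena_t *ar)
{
    pthread_mutex_unlock(&ar->mutex);
    return ar->next ? ar : NULL;          /* read without the lock held */
}

/* The libc_memalign-style ordering settled on for the retry paths:
   read ar->next into a local before releasing the lock. */
arena_t *retry_hint_safe(arena_t *ar)
{
    arena_t *prev = ar->next ? ar : NULL; /* get ar->next before
                                             releasing the lock */
    pthread_mutex_unlock(&ar->mutex);
    return prev;
}
```

Both functions compute the same hint for arena_get2; only the safe variant evaluates ar->next while the arena's invariants are still protected.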


Jeff

Re: BZ 13939 malloc deadlock

Maxim Kuvyrkov-2
On 3/07/2012, at 9:21 AM, Jeff Law wrote:

> On 07/02/2012 02:00 PM, Maxim Kuvyrkov wrote:
>>
>> I don't recommend going this way as it is error-prone for bugs from
>> future changes.  A reasonable programmer would expect ar_ptr->mutex
>> to lock the entirety of ar_ptr, including ar_ptr->next.
> Precisely.  Thus if we're trying to be robust WRT future changes and avoiding problems with ar->next changing after the lock is released but before we call arena_get2 (which is what it looks like libc_memalign is trying to do), then we need to get the value from ar->next before we release the lock.
>
> Note that libc_malloc and libc_calloc's current implementations can't race on ar->next because the lock is still held.  Of course it's the holding of that lock which leads to the deadlock that we're trying to fix right now.
>
> libc_pvalloc and libc_valloc's current implementation would have problems if there are races with writes to ar->next as they unlock the arena then blindly use the current value of ar->next.

I see now, you are right again.  I would appreciate you adding a comment to one of the instances of "prev = ar_ptr->next" saying something like "Get ar_ptr->next before releasing the lock.".

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


Re: [Patch] [BZ 13939] malloc deadlock

Jeff Law
On 07/02/2012 03:29 PM, Maxim Kuvyrkov wrote:

> On 3/07/2012, at 9:21 AM, Jeff Law wrote:
>
>> On 07/02/2012 02:00 PM, Maxim Kuvyrkov wrote:
>>>
>>> I don't recommend going this way as it is error-prone for bugs from
>>> future changes.  A reasonable programmer would expect ar_ptr->mutex
>>> to lock the entirety of ar_ptr, including ar_ptr->next.
>> Precisely.  Thus if we're trying to be robust WRT future changes and avoiding problems with ar->next changing after the lock is released but before we call arena_get2 (which is what it looks like libc_memalign is trying to do), then we need to get the value from ar->next before we release the lock.
>>
>> Note that libc_malloc and libc_calloc's current implementations can't race on ar->next because the lock is still held.  Of course it's the holding of that lock which leads to the deadlock that we're trying to fix right now.
>>
>> libc_pvalloc and libc_valloc's current implementation would have problems if there are races with writes to ar->next as they unlock the arena then blindly use the current value of ar->next.
>
> I see now, you are right again.  I would appreciate you adding a comment to one of the instances of "prev = ar_ptr->next" saying something like "Get ar_ptr->next before releasing the lock.".
You asked for a header comment on reused_arena.  Done.

You asked for avoiding special casing the main_arena by passing in the
arena to be avoided.  Done.

You asked for a comment WRT getting the arena's next pointer prior to
releasing its lock (to avoid potential races on the arena's next
pointer).  Done.


OK?

Thanks,
Jeff

Re: [Patch] [BZ 13939] malloc deadlock

Maxim Kuvyrkov-2
On 10/08/2012, at 4:11 PM, Jeff Law wrote:

> On 07/02/2012 03:29 PM, Maxim Kuvyrkov wrote:
>> On 3/07/2012, at 9:21 AM, Jeff Law wrote:
>>
>>> On 07/02/2012 02:00 PM, Maxim Kuvyrkov wrote:
>>>>
>>>> I don't recommend going this way as it is error-prone for bugs from
>>>> future changes.  A reasonable programmer would expect ar_ptr->mutex
>>>> to lock the entirety of ar_ptr, including ar_ptr->next.
>>> Precisely.  Thus if we're trying to be robust WRT future changes and avoiding problems with ar->next changing after the lock is released but before we call arena_get2 (which is what it looks like libc_memalign is trying to do), then we need to get the value from ar->next before we release the lock.
>>>
>>> Note that libc_malloc and libc_calloc's current implementations can't race on ar->next because the lock is still held.  Of course it's the holding of that lock which leads to the deadlock that we're trying to fix right now.
>>>
>>> libc_pvalloc and libc_valloc's current implementation would have problems if there are races with writes to ar->next as they unlock the arena then blindly use the current value of ar->next.
>>
>> I see now, you are right again.  I would appreciate you adding a comment to one of the instances of "prev = ar_ptr->next" saying something like "Get ar_ptr->next before releasing the lock.".
> You asked for a header comment on reused_arena.  Done.
>
> You asked for avoiding special casing the main_arena by passing in the arena to be avoided.  Done.
>
> You asked for a comment WRT getting the arena's next pointer prior to releasing its lock (to avoid potential races on the arena's next pointer).  Done.
>
>
> OK?

OK for trunk with the NEWS hunk updated to refer to 2.18.

Joseph, I don't think I have formal approval privileges (or do I?), so would you please rubber stamp this review?

David, you may want to consider this patch for 2.17 branch.  The fix should apply cleanly to 2.17.  If you do backport it, please remember to update NEWS hunk on both 2.17 and trunk.

Jeff, thanks for fixing this!

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics


Re: [Patch] [BZ 13939] malloc deadlock

Joseph Myers
On Fri, 10 Aug 2012, Maxim Kuvyrkov wrote:

> OK for trunk with the NEWS hunk updated to refer to 2.18.

I don't understand your 2.18 references.  2.17 isn't due to freeze until
November (as per <http://sourceware.org/glibc/wiki/Glibc%20Timeline>)
unless David announces otherwise.

> Joseph, I don't think I have formal approval privileges (or do I?), so
> would you please rubber stamp this review?

There's no such thing as formal approval privileges outside the listed
machine / OS maintainers.

http://sourceware.org/glibc/wiki/Consensus

--
Joseph S. Myers
[hidden email]

Re: [Patch] [BZ 13939] malloc deadlock

David Miller-13
From: "Joseph S. Myers" <[hidden email]>
Date: Fri, 10 Aug 2012 11:48:59 +0000

> On Fri, 10 Aug 2012, Maxim Kuvyrkov wrote:
>
>> OK for trunk with the NEWS hunk updated to refer to 2.18.
>
> I don't understand your 2.18 references.  2.17 isn't due to freeze until
> November (as per <http://sourceware.org/glibc/wiki/Glibc%20Timeline>)
> unless David announces otherwise.

I have no current plans to change this :-)

Re: [Patch] [BZ 13939] malloc deadlock

Maxim Kuvyrkov-2
On 11/08/2012, at 1:19 PM, David Miller wrote:

> From: "Joseph S. Myers" <[hidden email]>
> Date: Fri, 10 Aug 2012 11:48:59 +0000
>
>> On Fri, 10 Aug 2012, Maxim Kuvyrkov wrote:
>>
>>> OK for trunk with the NEWS hunk updated to refer to 2.18.
>>
>> I don't understand your 2.18 references.  2.17 isn't due to freeze until
>> November (as per <http://sourceware.org/glibc/wiki/Glibc%20Timeline>)
>> unless David announces otherwise.
>
> I have no current plans to change this :-)


Silly me.  I thought the fact that we have a 2.17 release manager meant that 2.17 had branched.

Jeff, your patch is OK.

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics