Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Maxim Kuvyrkov-4
[CC: libc-ports, as glibc's ARM and sparc32 lowlevellock.c need the same patch.]

On Mar 22, 2014, at 2:50 AM, [hidden email] wrote:

> From: Brian Niebuhr <[hidden email]>
>
> __lll_timedlock_wait has a bug that is exposed in these conditions:
>
> 1. Thread 1 acquires the lock
> 2. Thread 2 attempts to acquire the lock and waits via futex syscall
> 3. Thread 1 unlocks the lock and wakes the waiting thread
> 4. Thread 1 re-acquires the lock before Thread 2 gets the CPU
>
> What happens in this case is that the futex value is set to '1' since
> Thread 1 reacquired the lock through the fast path (since it had just been
> released).  The problem is that Thread 2 is in the loop in
> __lll_timedlock_wait and it expects the futex value to be '2'.
> atomic_compare_and_exchange_bool_acq only sets the futex value to '2' if
> it is already set to '0', and since Thread 1 reacquired the lock, the
> futex remains set to '1'.
>
> When Thread 2 attempts to wait on the futex, the operating system returns
> -EWOULDBLOCK since the futex value is not '2'.  This causes a busy wait
> condition where Thread 2 continuously attempts to wait on the futex and
> the kernel immediately returns -EWOULDBLOCK.  This continues until Thread
> 1 releases the lock.
>
> The fix is to use atomic_exchange_acq instead of
> atomic_compare_and_exchange_bool_acq which will force the futex value to
> '2' on each loop iteration.

You know, I started this reply as a you-are-wrong-here-is-why, but after looking at both glibc and uClibc, I agree that you are correct.  Just to reiterate for others: the problem here is not correctness, but busy-waiting instead of sleeping under certain conditions.

>  This change makes uClibc line up with
> glibc's implementation.

This problem is fixed in glibc's ./nptl/sysdeps/unix/sysv/linux/lowlevellock.c, but still present in glibc's ARM and sparc32 lowlevellock.c.  Do you plan to fix these too?

Interestingly, glibc's hppa lowlevellock.c has an alternative solution to this problem.  I can't quite figure out which solution is faster, so I would appreciate a second pair of eyes.  My feeling is that the generic version (the one in your patch) is faster.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org

>
> Signed-off-by: Brian Niebuhr <[hidden email]>
> ---
> libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> index af864b3..4f7e890 100644
> --- a/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> +++ b/libpthread/nptl/sysdeps/unix/sysv/linux/arm/lowlevellock.c
> @@ -60,10 +60,7 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>     return EINVAL;
>
>   /* Upgrade the lock.  */
> -  if (atomic_exchange_acq (futex, 2) == 0)
> -    return 0;
> -
> -  do
> +  while (atomic_exchange_acq (futex, 2) != 0)
>     {
>       struct timeval tv;
>
> @@ -86,7 +83,6 @@ __lll_timedlock_wait (int *futex, const struct timespec *abstime, int private)
>       // XYZ: Lost the lock to check whether it was private.
>       lll_futex_timed_wait (futex, 2, &rt, private);
>     }
> -  while (atomic_compare_and_exchange_bool_acq (futex, 2, 0) != 0);
>
>   return 0;
> }
> --
> 1.8.3.2
>
> _______________________________________________
> uClibc mailing list
> [hidden email]
> http://lists.busybox.net/mailman/listinfo/uclibc


Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Will Newton
And CC libc-alpha, as libc-ports has largely fallen into disuse.

On 27 March 2014 20:31, Maxim Kuvyrkov <[hidden email]> wrote:

> [full quote of the parent message snipped]



--
Will Newton
Toolchain Working Group, Linaro

Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Joseph Myers
I don't know how this might relate to
<https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see
<https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and
<https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest
of that thread).  But my preference for how to address this is definitely
to move to unifying lowlevellock.[ch] files across as many architectures
as possible - which requires someone to understand the differences and
produce a careful analysis that shows what the best form for generic files
is and what cases actually require architecture-specific files to override
those generic files (preferably overriding only the bits that need
overriding).

--
Joseph S. Myers
[hidden email]

Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Maxim Kuvyrkov-4
On Mar 28, 2014, at 11:01 AM, Joseph S. Myers <[hidden email]> wrote:

> [quote of Joseph's message snipped]

Yep, it's the same issue as in the PR, and the same solution as in this thread.  Unfortunately, the previous discussion veered off from ARM towards sparc and got forgotten.

I agree that unifying the lowlevellock.c implementations is the way forward.  At the very least I will make sure that ARM doesn't have unnecessary divergence from the generic lowlevellock.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Joseph Myers
On Fri, 28 Mar 2014, Maxim Kuvyrkov wrote:

> On Mar 28, 2014, at 11:01 AM, Joseph S. Myers <[hidden email]> wrote:
>
> > I don't know how this might relate to
> > <https://sourceware.org/bugzilla/show_bug.cgi?id=15119> (see
> > <https://sourceware.org/ml/libc-ports/2013-01/msg00084.html> and
> > <https://sourceware.org/ml/libc-ports/2013-02/msg00021.html> and the rest
> > of that thread).  But my preference for how to address this is definitely
> > to move to unifying lowlevellock.[ch] files across as many architectures
> > as possible - which requires someone to understand the differences and
> > produce a careful analysis that shows what the best form for generic files
> > is and what cases actually require architecture-specific files to override
> > those generic files (preferably overriding only the bits that need
> > overriding).
>
> Yeap, it's the same issue in the PR and same solution as in this thread.  
> Unfortunately, the previous discussion veered off towards sparc away
> from ARM and got forgotten.

The present thread is specifically discussing lowlevellock.c, but Carlos
suggested in the previous discussion that the real issue was in
__lll_timedlock in lowlevellock.h.  I think both files need unification
across architectures.

--
Joseph S. Myers
[hidden email]

Re: [PATCH] Fix __lll_timedlock_wait busy-wait issue

Torvald Riegel-4
In reply to this post by Joseph Myers
On Thu, 2014-03-27 at 22:01 +0000, Joseph S. Myers wrote:

> [quote of Joseph's message snipped]

I agree.  My gut feeling is that the locks should eventually become
unified C code, using atomics to do the synchronization;
architecture-specific code should live either in the atomics or in more
generally useful spin-waiting code (which could be used by other
synchronization constructs as well).  The futex syscall is really on the
slow path: if you hit it, you will have had at least a cache miss on the
futex variable, and doing the syscall will likely cost you further cache
misses.  Therefore, I don't see a reason why the futex syscall needs
custom asm implementations such as x86 currently has.