[Bug nptl/25765] New: Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=25765

            Bug ID: 25765
           Summary: Incorrect futex syscall in
                    __pthread_disable_asynccancel for linux x86_64 leads
                    to livelock
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: martin.lubich at gmx dot at
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 12422
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12422&action=edit
Example code to reproduce and trigger the bug

There is a bug in the x86_64 specific implementation of
__pthread_disable_asynccancel.

When it detects an ongoing thread cancellation (CANCELING_BITMASK set but
CANCELED_BITMASK not yet set), the code tries to block on a futex on the
cancelhandling member of the thread structure.

The generic C code in nptl/cancellation.c does this correctly.

The specific implementation in sysdeps/unix/sysv/linux/x86_64/cancellation.S
sets up the futex syscall incorrectly. The third argument (the expected value
that the kernel futex code compares against, passed in the edx register) is
never set, so edx holds an undefined value. The futex call therefore
typically returns immediately with EAGAIN, and the code loops endlessly.

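If the looping thread has a higher real-time (RT) priority than the cancelling
thread, the loop goes on forever, because the cancelling thread never gets to
run and complete the cancellation; meanwhile the looping thread consumes all
available CPU cycles. With RT threads this will also cause complete system
freezes.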

I have attached a simple test program that shows the problem after some time.

The following patch fixes the problem.

The patch is based on glibc 2.27, but the bug is still present in the current
release, 2.31, as well as in the current development version.

--------------- snip ----------------------------

diff -Naur glibc-2.27/sysdeps/unix/sysv/linux/x86_64/cancellation.S glibc-2.27_patched/sysdeps/unix/sysv/linux/x86_64/cancellation.S
--- glibc-2.27/sysdeps/unix/sysv/linux/x86_64/cancellation.S    2018-02-01 17:17:18.000000000 +0100
+++ glibc-2.27_patched/sysdeps/unix/sysv/linux/x86_64/cancellation.S  2020-04-02 12:08:02.712851151 +0200
@@ -95,8 +95,8 @@
        cmpxchgl %r11d, %fs:CANCELHANDLING
        jnz     2b

-       movl    %r11d, %eax
-3:     andl    $(TCB_CANCELING_BITMASK|TCB_CANCELED_BITMASK), %eax
+3:     movl    %r11d, %eax
+       andl    $(TCB_CANCELING_BITMASK|TCB_CANCELED_BITMASK), %eax
        cmpl    $TCB_CANCELING_BITMASK, %eax
        je      4f
 1:     ret
@@ -104,12 +104,13 @@
        /* Performance doesn't matter in this loop.  We will
           delay until the thread is canceled.  And we will unlikely
           enter the loop twice.  */
-4:     mov     %fs:0, %RDI_LP
+4:     movl    %r11d, %edx
+       mov     %fs:0, %RDI_LP
        movl    $__NR_futex, %eax
        xorq    %r10, %r10
        addq    $CANCELHANDLING, %rdi
        LOAD_PRIVATE_FUTEX_WAIT (%esi)
        syscall
-       movl    %fs:CANCELHANDLING, %eax
+       movl    %fs:CANCELHANDLING, %r11d
        jmp     3b
 END(__pthread_disable_asynccancel)

------------------- snip ---------------------------

This bug is specific to Linux on x86_64.

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug nptl/25765] Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot org

--- Comment #1 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Thanks for the report. I recently submitted a patch that simply removes all
of the x86_64 cancellation assembly: cancellable syscalls are now implemented
only in C, so there is no need for specialized assembly routines.

[Bug nptl/25765] Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

--- Comment #2 from Martin Lubich <martin.lubich at gmx dot at> ---
That's interesting. Is this already in master?

[Bug nptl/25765] Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

--- Comment #3 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Unfortunately no, it is still in review.

[Bug nptl/25765] Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

--- Comment #4 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Adhemerval Zanella
<[hidden email]>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=17fd707f88c5531972c980a4f4567ba6c7f84067

commit 17fd707f88c5531972c980a4f4567ba6c7f84067
Author: Adhemerval Zanella <[hidden email]>
Date:   Tue Mar 31 14:59:28 2020 -0300

    nptl: Remove x86_64 cancellation assembly implementations [BZ #25765]

    All cancellable syscalls are done by C implementations, so there is
    no need to use a specialized implementation to optimize register usage.

    It fixes BZ #25765.

    Checked on x86_64-linux-gnu.

[Bug nptl/25765] Incorrect futex syscall in __pthread_disable_asynccancel for linux x86_64 leads to livelock

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
         Resolution|---                             |FIXED
             Status|UNCONFIRMED                     |RESOLVED
           Assignee|unassigned at sourceware dot org|adhemerval.zanella at linaro dot org
   Target Milestone|---                             |2.32

--- Comment #5 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
Fixed in 2.32.