[Bug translator/23891] New: stap and stapio process are stuck in signal processor and could not terminate properly

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug translator/23891] New: stap and stapio process are stuck in signal processor and could not terminate properly

github at kalvdans dot no-ip.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23891

            Bug ID: 23891
           Summary: stap and stapio process are stuck in signal processor
                    and could not terminate properly
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: translator
          Assignee: systemtap at sourceware dot org
          Reporter: datong at openresty dot com
  Target Milestone: ---

Hi,

We have observed that sometimes the stap (and associated stapio) process could
not be terminated using SIGTERM on one of our production systems:

strace shows the following:

+ timeout 5s strace -p 8596 -s 1024
Process 8596 attached
write(2, "Too many interrupts received, exiting.\n", 39Process 8596 detached
 <detached ...>
+ true
+ timeout 5s strace -p 9110 -s 1024
Process 9110 attached
write(2, "WARNING:", 8Process 9110 detached
 <detached ...>
+ true

The full backtrace of the stap and stapio process:

+ gstack 8596
#0  0x00007f8cc36bc420 in __write_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1  0x0000000000412308 in handle_interrupt () at main.cxx:280
#2  <signal handler called>
#3  0x00007f8cc36bceca in __libc_waitpid (pid=9110, stat_loc=0x7ffe7df20abc,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31
#4  0x0000000000597d8a in stap_waitpid (verbose=0, pid=9110) at util.cxx:749
#5  0x0000000000606cd9 in direct::finish (this=0x23fc830) at remote.cxx:108
#6  0x0000000000603222 in remote::run (remotes=std::vector of length 1,
capacity 1 = {...}) at remote.cxx:1292
#7  0x00000000004126cc in pass_5 (s=..., targets=std::vector of length 1,
capacity 1 = {...}) at main.cxx:1209
#8  0x000000000040fa54 in main (argc=<optimized out>, argv=<optimized out>) at
main.cxx:1429
+ gstack 9110
Thread 3 (Thread 0x7f2ad693d700 (LWP 9112)):
#0  0x00007f2ad6d0e101 in do_sigwait (sig=0x7f2ad693cef4, set=<optimized out>)
at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1  __sigwait (set=set@entry=0x628170, sig=sig@entry=0x7f2ad693cef4) at
../sysdeps/unix/sysv/linux/sigwait.c:99
#2  0x0000000000402db5 in signal_thread (arg=0x628170) at mainloop.c:40
#3  0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad693d700) at
pthread_create.c:308
#4  0x00007f2ad6a3573d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 2 (Thread 0x7f2ad613c700 (LWP 9113)):
#0  0x00007f2ad6d0d43d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000407618 in reader_thread (data=0x0) at relay.c:235
#2  0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad613c700) at
pthread_create.c:308
#3  0x00007f2ad6a3573d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 1 (Thread 0x7f2ad712b740 (LWP 9110)):
#0  0x00007f2ad6a26c7d in write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f2ad69b3f73 in _IO_new_file_write (f=0x7f2ad6cf91c0
<_IO_2_1_stderr_>, data=0x7ffe0aaa9210, n=8) at fileops.c:1301
#2  0x00007f2ad69b470f in new_do_write (to_do=8, data=0x7ffe0aaa9210
"WARNING:acing...\n", fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>) at fileops.c:537
#3  _IO_new_file_xsputn (f=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, data=<optimized
out>, n=8) at fileops.c:1383
#4  0x00007f2ad698a47d in buffered_vfprintf (s=s@entry=0x7f2ad6cf91c0
<_IO_2_1_stderr_>, format=format@entry=0x40a648 "WARNING:",
args=args@entry=0x7ffe0aaab850) at vfprintf.c:2340
#5  0x00007f2ad698531e in _IO_vfprintf_internal (s=s@entry=0x7f2ad6cf91c0
<_IO_2_1_stderr_>, format=0x40a648 "WARNING:", ap=0x7ffe0aaab850) at
vfprintf.c:1289
#6  0x00007f2ad6a4aefd in ___vfprintf_chk (fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>,
flag=flag@entry=1, format=<optimized out>, ap=ap@entry=0x7ffe0aaab850) at
vfprintf_chk.c:34
#7  0x0000000000405172 in vfprintf (__ap=0x7ffe0aaab850, __fmt=<optimized out>,
__stream=<optimized out>) at /usr/include/bits/stdio2.h:127
#8  eprintf (fmt=<optimized out>) at common.c:693
#9  0x0000000000404c55 in stp_main_loop () at mainloop.c:810
#10 0x000000000040279d in main (argc=<optimized out>, argv=0x7ffe0aaadc68) at
stapio.c:73

At this point, sending SIGTERM to neither of the process works, and the Kernel
module remains loaded.

Best,
Datong

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug translator/23891] stap and stapio process are stuck in signal processor and could not terminate properly

github at kalvdans dot no-ip.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23891

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fche at redhat dot com

--- Comment #1 from Frank Ch. Eigler <fche at redhat dot com> ---
Any idea how to reproduce this?
The two last thread backtraces seem to be sitting inside a write(2), which
suggests a pipe-full or such condition on the stapio stdout files.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug translator/23891] stap and stapio process are stuck in signal processor and could not terminate properly

github at kalvdans dot no-ip.org
In reply to this post by github at kalvdans dot no-ip.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23891

agentzh <agentzh at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |agentzh at gmail dot com

--- Comment #2 from agentzh <agentzh at gmail dot com> ---
Already committed a fix to master. Feel free to close this PR.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug translator/23891] stap and stapio process are stuck in signal processor and could not terminate properly

github at kalvdans dot no-ip.org
In reply to this post by github at kalvdans dot no-ip.org
https://sourceware.org/bugzilla/show_bug.cgi?id=23891

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
commit ab368ac2a

--
You are receiving this mail because:
You are the assignee for the bug.