[Bug uprobes/18171] New: setjmp/longjmp clashing with uprobes, causing SIGSEGV or SIGILL

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] New: setjmp/longjmp clashing with uprobes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

            Bug ID: 18171
           Summary: setjmp/longjmp clashing with uprobes, causing SIGSEGV
                    or SIGILL
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: uprobes
          Assignee: systemtap at sourceware dot org
          Reporter: fche at redhat dot com

The testsuite/systemtap.base/bz5274* test case now regularly fails on modern
kernels such as 3.18.9-200.fc21.x86_64, on at least i686 & x86-64.

% gcc -g -o bz5274 .../testsuite/systemtap.base/bz5274.c
% cat > bz5274a.stp << EOF
probe never
# ,process("bz5274").function("funca")
  ,process("bz5274").function("funca").return
# ,process("bz5274").function("funcb")
  ,process("bz5274").function("funcb").return
  ,process("bz5274").function("funcc")
# ,process("bz5274").function("funcc").return
  ,process("bz5274").function("funcd")
# ,process("bz5274").function("funcd").return
# ,process("bz5274").function("main")
  ,process("bz5274").function("main").return  # <-- out=SIGBUS vs in=SIGILL
{i++ }
global i
EOF
% sudo stap -t bz5274a.stp -c ./bz5274
In bz5274.c: funcb :20 : i=1.  Calling funcc
In bz5274.c: funcc :13 : i=1.  Calling funcd
In bz5274.c: funcd :7 : i=1.  Calling longjmp
In bz5274.c: funcb :20 : i=2.  Calling funcc
In bz5274.c: funcc :13 : i=2.  Calling funcd
In bz5274.c: funcd :7 : i=2.  Calling longjmp
In bz5274.c: funcb :20 : i=3.  Calling funcc
In bz5274.c: funcc :13 : i=3.  Calling funcd
In bz5274.c: funcd :7 : i=3.  Calling longjmp
In bz5274.c: funcb :20 : i=4.  Calling funcc
In bz5274.c: funcc :13 : i=4.  Calling funcd
In bz5274.c: funcd :7 : i=4.  Calling longjmp
WARNING: Child process exited with signal 11 (Segmentation fault)
i=0xa
----- probe hit report:
process(".../testsuite/bz5274").function("funcb@.../systemtap.base/bz5274.c:18").return,
(bz5274a.stp:1:1), hits: 2 [...]
process(".../testsuite/bz5274").function("funcc@.../systemtap.base/bz5274.c:11"),
(bz5274a.stp:1:1), hits: 4 [...]
process(".../testsuite/bz5274").function("funcd@.../systemtap.base/bz5274.c:5"),
(bz5274a.stp:1:1), hits: 4 [...]
end, (<synthetic>:1:1), hits: 1, cycles: 4608min/4608avg/4608max, from: end,
index: 6
----- refresh report:

WARNING: .../staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

[same with -DSTP_ALIBI]

% coredumpctl (pick last $pid)
% sudo coredumpctl gdb $pid
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400774 in funca (s=0xbffffffda1e80000 <error: Cannot access
memory at address 0xbffffffda1e80000>, env=0xb80000000abf)
    at .../bz5274.c:34

(gdb) info registers
rax            0x4    4
rbx            0x0    0
rcx            0x7fffffe4    2147483620
rdx            0x400752    4196178
rsi            0x4    4
rdi            0x7ffd3654d590    140725514982800
rbp            0x41ff894156415741    0x41ff894156415741
rsp            0x4007d8    0x4007d8 <__libc_csu_init+8>
r8             0x7ffd3654d550    140725514982736
r9             0x7ffd3654d570    140725514982768
r10            0x0    0
r11            0x246    582
r12            0x400570    4195696
r13            0x7ffd3654d740    140725514983232
r14            0x0    0
r15            0x0    0
rip            0x400774    0x400774 <funca+62>
eflags         0x10246    [ PF ZF IF RF ]
cs             0x33    51
ss             0x2b    43
ds             0x0    0
es             0x0    0
fs             0x0    0
gs             0x0    0


Note the corrupted $rsp value.


My attempts to reproduce this with "perf probe" failed, without exonerating the
kernel uprobes:

% stap -p2 bz5274a.stp) | sh .../scripts/stap2perf | sudo sh
[...]
  probe:bz5274_main_return                           [Tracepoint event]
  probe:bz5274_funcd                                 [Tracepoint event]
  probe:bz5274_funcc                                 [Tracepoint event]
  probe:bz5274_funcb_return                          [Tracepoint event]
  probe:bz5274_funca_return                          [Tracepoint event]

% sudo perf record -e 'probe:*' -aR .../bz5274
[...]
In bz5274.c: funcc :13 : i=4.  Calling funcd
In bz5274.c: funcd :7 : i=4.  Calling longjmp
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.888 MB perf.data (~38805 samples) ]

% sudo perf report
[...] "The perf.data file has no samples"

% sudo perf report -D
[...]
Aggregated stats:
           TOTAL events:       7357
            MMAP events:         96
            COMM events:        648
            EXIT events:          2
            FORK events:        334
           MMAP2 events:       6276
  FINISHED_ROUND events:          1
probe:bz5274_main_return stats:
probe:bz5274_funcd stats:
probe:bz5274_funcc stats:
probe:bz5274_funcb_return stats:
probe:bz5274_funca_return stats:

... indicates zero uprobes hits of any sort

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mcermak at redhat dot com
            Summary|setjmp/longjmp clashing     |setjmp/longjmp clashing
                   |with uprobes, causing       |with u[ret]probes, causing
                   |SIGSEGV or SIGILL           |SIGSEGV or SIGILL

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

--- Comment #1 from Martin Cermak <mcermak at redhat dot com> ---
Reproduced using perf. Problem is the order of commandline switches produced by
stap2perf: Flipping -a and -x seems to help:

=======
# perf probe -x /usr/share/systemtap/testsuite/bz5274 -a
bz5274_funca_return=funca%return
# perf probe -x /usr/share/systemtap/testsuite/bz5274 -a
bz5274_funcb_return=funcb%return

...
=======

After this I have:

=======
# cat //sys/kernel/debug/tracing/uprobe_events
r:probe_bz5274/bz5274_funca_return
/usr/share/systemtap/testsuite/bz5274:0x00000000000006e6
r:probe_bz5274/bz5274_funcb_return
/usr/share/systemtap/testsuite/bz5274:0x00000000000006a0
p:probe_bz5274/bz5274_funcc
/usr/share/systemtap/testsuite/bz5274:0x000000000000065a
p:probe_bz5274/bz5274_funcd
/usr/share/systemtap/testsuite/bz5274:0x0000000000000616
r:probe_bz5274/bz5274_main_return
/usr/share/systemtap/testsuite/bz5274:0x0000000000000725
=======

Now using -e 'probe_bz5274:*' instead of -e 'probe:*' I'm getting:

=======
# /root/bin/perf record -e 'probe_bz5274:*' -aR ./bz5274
In bz5274.c: funcb :20 : i=1.  Calling funcc
In bz5274.c: funcc :13 : i=1.  Calling funcd
In bz5274.c: funcd :7 : i=1.  Calling longjmp
In bz5274.c: funcb :20 : i=2.  Calling funcc
In bz5274.c: funcc :13 : i=2.  Calling funcd
In bz5274.c: funcd :7 : i=2.  Calling longjmp
In bz5274.c: funcb :20 : i=3.  Calling funcc
In bz5274.c: funcc :13 : i=3.  Calling funcd
In bz5274.c: funcd :7 : i=3.  Calling longjmp
In bz5274.c: funcb :20 : i=4.  Calling funcc
In bz5274.c: funcc :13 : i=4.  Calling funcd
In bz5274.c: funcd :7 : i=4.  Calling longjmp
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.166 MB perf.data (10 samples) ]
Segmentation fault (core dumped)
[root@dhcp-1-102 testsuite]#
=======

But now I'm getting two corefiles:

=======
[root@dhcp-1-102 testsuite]# ls core*
core.2097  core.2098
[root@dhcp-1-102 testsuite]# file core.2200
core.2200: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program
headers (163)
[root@dhcp-1-102 testsuite]# file core.2201
core.2201: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from
'./bz5274'
[root@dhcp-1-102 testsuite]#  
[root@dhcp-1-102 testsuite]# gdb -q ./bz5274 core.2097
Reading symbols from ./bz5274...done.

warning: core file may not match specified executable file.
---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^---------------------------------------------------
[New LWP 2097]
Core was generated by `/root/bin/perf record -e probe_bz5274:* -aR ./bz5274'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000003e38e0ff99 in ?? ()
(gdb) info registers
rax            0x0      0
rbx            0x0      0
rcx            0xffffffffffffffff       -1
rdx            0xb      11
rsi            0x831    2097
rdi            0x831    2097
rbp            0x3e389b76a0     0x3e389b76a0
rsp            0x7fff705200f8   0x7fff705200f8
r8             0x7fff70520030   140735077810224
r9             0x7fff7051fe40   140735077809728
r10            0xa5     165
r11            0x202    514
r12            0x1      1
r13            0x3e389b8e80     267237691008
r14            0x8365d0 8611280
r15            0x0      0
rip            0x3e38e0ff99     0x3e38e0ff99
eflags         0x202    [ IF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) q
[root@dhcp-1-102 testsuite]#
[root@dhcp-1-102 testsuite]# gdb -q ./bz5274 core.2098
Reading symbols from ./bz5274...done.
[New LWP 2098]
Core was generated by `./bz5274'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400724 in funca (s=0xffff95e800400876 <error: Cannot access
memory at address 0xffff95e800400876>, env=0xbfc68948ffffff30) at
systemtap.base/bz5274.c:34
34      }
(gdb) info registers
rax            0x4      4
rbx            0x0      0
rcx            0x7fffffe4       2147483620
rdx            0x400702 4196098
rsi            0x4      4
rdi            0x7fff359cc380   140734092854144
rbp            0x41ff894156415741       0x41ff894156415741
rsp            0x400768 0x400768 <__libc_csu_init+8>
r8             0x7fff359cc340   140734092854080
r9             0x7fff359cc360   140734092854112
r10            0x0      0
r11            0x246    582
r12            0x400520 4195616
r13            0x7fff359cc530   140734092854576
r14            0x0      0
r15            0x0      0
rip            0x400724 0x400724 <funca+62>
eflags         0x10246  [ PF ZF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb)
=======

So I assume only the second corefile is relevant.

The $rsp corruption seems to happen here too.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

Mark Wielaard <mjw at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mjw at redhat dot com

--- Comment #2 from Mark Wielaard <mjw at redhat dot com> ---
Just a note that it seems the same is true for the golang runtime.
Although I haven't pinpointed it precisely it seems at least some code in the
garbage collector seems to get upset when we place return probe.

e.g.

$ cat hello.go
package main

import "fmt"

func main() {
    fmt.Printf("Hello, world.\n")
}

$ go build hello.go

$ ./run-stap -e 'probe process.function("*").call {printf("%s->%s %s\n",
thread_indent (1), ppfunc (), $$parms)} probe process.function("*").return
{printf("%s<-%s %s\n", thread_indent (-1), ppfunc (), $$return)}' -c ./hello

[...]

fatal error: panic on m stack

runtime stack:
WARNING: Child process exited with status 2

[...]

Without the process.function("*").return probe things look fine (assuming you
have the fixes for PR17957, PR17958 and PR17959).

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

--- Comment #3 from Frank Ch. Eigler <fche at redhat dot com> ---
Oleg's patches at http://www.spinics.net/lists/kernel/msg1981148.html look good
on x86-64.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at klomp dot org

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

Mark Wielaard <mjw at fedoraproject dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|mjw at fedoraproject dot org       |

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug uprobes/18171] setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL

albert.aribaud at 3adev dot fr
In reply to this post by albert.aribaud at 3adev dot fr
https://sourceware.org/bugzilla/show_bug.cgi?id=18171

Frank Ch. Eigler <fche at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #4 from Frank Ch. Eigler <fche at redhat dot com> ---
Kernels much newer than these are working fine.

--
You are receiving this mail because:
You are the assignee for the bug.