[Bug releng/25581] New: USDT probes when /proc/[pid]/mem not writeable

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] New: USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

            Bug ID: 25581
           Summary: USDT probes when /proc/[pid]/mem not writeable
           Product: systemtap
           Version: unspecified
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: releng
          Assignee: systemtap at sourceware dot org
          Reporter: dale.hamel at srvthe dot net
  Target Milestone: ---

Created attachment 12303
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12303&action=edit
The full patch against sys/sdt.h and for the dtrace python header generating
script

USDT probes may not be accessible in environments where /proc/[pid]/mem is not
writeable. For instance, Container OS used by Google's GKE product forces this
to read-only.

In kernels later than 4.20, there is a kernel interface to increment the
semaphore counters for USDT probes by passing the offset to perf_event_open. In
kernels prior to this, if /proc/[pid]/mem is read-only, there is no userspace
way to support USDT probes, then the only means remaining is the kernel.

It is possible to provide an alternative implementation for enabled probes in
sys/sdt.h, however. I was able to modify the standard "systemtap-sdt-devel"
package with these patches:

```patch
From 191d9b7554859deab59d85b9ebe2f28878adfe7d Mon Sep 17 00:00:00 2001
From: Dale Hamel <[hidden email]>
Date: Wed, 19 Feb 2020 14:05:30 -0500
Subject: [PATCH] Modify systemtap-sdt-dev to use uprobes instead of semaphores

---
 systemtap-sdt-dev/sys/sdt.h   |  2 +-
 systemtap-sdt-dev/usdt/dtrace | 60 ++++++++++++++++++-----------------
 2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/systemtap-sdt-dev/sys/sdt.h b/systemtap-sdt-dev/sys/sdt.h
index c2de2a9..7eec9bb 100644
--- a/systemtap-sdt-dev/sys/sdt.h
+++ b/systemtap-sdt-dev/sys/sdt.h
@@ -171,7 +171,7 @@ __extension__ extern unsigned long long __sdt_unsp;
 #endif

 #define _SDT_ASM_BODY(provider, name, pack_args, args)               \
-  _SDT_ASM_1(990:  _SDT_NOP)                         \
+  _SDT_ASM_1(.ifndef provider##_##name##_check\n provider##_##name##_check:\n
.endif\n990: _SDT_NOP) \
   _SDT_ASM_3(      .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
   _SDT_ASM_1(      .balign 4)                        \
   _SDT_ASM_3(      .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)          \
diff --git a/systemtap-sdt-dev/usdt/dtrace b/systemtap-sdt-dev/usdt/dtrace
index 981cce1..edfe004 100755
--- a/systemtap-sdt-dev/usdt/dtrace
+++ b/systemtap-sdt-dev/usdt/dtrace
@@ -40,30 +40,31 @@ except ImportError:
 class _HeaderCreator(object):
     def init_semaphores(self, fdesc):
         # dummy declaration just to make the object file non-empty
-        fdesc.write("/* Generated by the Systemtap dtrace wrapper */\n\n")
-        fdesc.write("static void __dtrace (void) __attribute__((unused));\n")
-        fdesc.write("static void __dtrace (void) {}\n")
+        #fdesc.write("/* Generated by the Systemtap dtrace wrapper */\n\n")
+        #fdesc.write("static void __dtrace (void) __attribute__((unused));\n")
+        #fdesc.write("static void __dtrace (void) {}\n")
         fdesc.write("\n#include <sys/sdt.h>\n\n")

     def init_probes(self, fdesc):
         fdesc.write("/* Generated by the Systemtap dtrace wrapper */\n\n")
-        fdesc.write("\n#define _SDT_HAS_SEMAPHORES 1\n\n")
-        fdesc.write("\n#define STAP_HAS_SEMAPHORES 1 /* deprecated */\n\n")
         fdesc.write("\n#include <sys/sdt.h>\n\n")
+        fdesc.write("#define HACK_UPROBE_ENABLED(provider, name) \
+                    ((*(char *) provider##_##name##_check) & 0x90) != 0x90\n")

     def add_semaphore(self, this_provider, this_probe):
         # NB: unsigned short is fixed in ABI
-        semaphores_def = '\n#if defined STAP_SDT_V1\n'
-        semaphores_def += '#define %s_%s_semaphore %s_semaphore\n' % \
-                          (this_provider, this_probe, this_probe)
-        semaphores_def += '#endif\n'
-        semaphores_def += '#if defined STAP_SDT_V1 || defined STAP_SDT_V2 \n'


```

Obviously if this were to be submitted upstream, a proper way to switch between
reference counting and uprobe based semaphores would make be preferred, this is
from my first prototype. I would think this could probably use the existing
HAS_SDT_SEMAPHORE logic

I've been able to verify this with bpftrace and bcc on a kernel without write
access to /proc/[pid]/mem.

This approach works by anchoring the existing asm label 990: to a symbolic
label, making it the definition for the prototype that is added to the header
in place of the semaphore code.

If there is support for this approach, I would be happy to modify the above
patch into a more modular form to be upstreamed.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dale.hamel at srvthe dot net

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |Kernel 4.14
               Host|                            |Linux X86 Server
              Build|                            |Google Container OS,
                   |                            |Milestone 62, build ID
                   |                            |10895

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

--- Comment #1 from Dale Hamel <dale.hamel at srvthe dot net> ---
This is the first bug I have created on here, hopefully I have submitted it
correctly. If I am out of order at all, please direct me to the right way to
submit this issue / proposal for enhancement.

This patch has been tested in production and works around the dependency on a
semaphore for popular programs like ruby, memcached, mysql, and redis, and for
using USDT probes without impacting performance when not attached.

Thanks for taking the time to review this bug, I look forward to hearing your
feedback and suggestions,
-Dale

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

--- Comment #2 from Dale Hamel <dale.hamel at srvthe dot net> ---
This minimal proof of concept shows the main concept employed by this patch:

```c
#include <stdio.h>
#include <unistd.h>

void _check();

#define MY_ASM \
  do { \
    __asm__ __volatile__ ("_check:\n990: nop");\
  } while(0)

int main(int argc, char **argv)
{
  while(1) {
    MY_ASM;
    printf("%08X\n", (*(char *)_check) & 0x90);
    sleep(1);
  }
}
```

When I run the program, I see the output:

```
...
00000090
00000090
00000090
...
```

Now If I attach to it with bpftrace:

```
bpftrace -e 'uprobe:./a.out:0x1164 { printf("here\n") }'
```

Note that 0x1164 is from:

```
$readelf -s ./a.out| grep _check
35: 0000000000001164     0 NOTYPE  LOCAL  DEFAULT   13 _check
```

I can see that the value changes in the program:

```
00000090
00000090
00000080
00000080
```

Though I was expecting 0xCC, not 0x80. It seems like 0x80 is for a syscall?
Unsure why, but it toggles as I attach/detach with bpftrace.



I compiled ruby with `--enable-dtrace` with these sys/sdt and dtrace.py, and
performed these tests:

Given the ruby process:

```
ruby -e 'TracePoint.new{}.enable; def foo; puts "hi"; sleep 1; end; while true
do; foo;end'
```

Note that `TracePoint.new{}.enable` is needed post ruby 2.5, aside from that we
are just sleeping and calling `foo`.

To start with, I was perplexed when I saw there were no sdt notes on
/proc/RUBYPID/exe, but on checking /proc/RUBYPID/maps, I see that this is built
with libruby.so, and the probes are in there. I check the elf notes on that,
and find `method__entry`:

```
$ readelf --notes /proc/168586/root/usr/local/lib/libruby.so | grep
method__entry -A1
    Name: cmethod__entry
    Location: 0x0000000000237e5a, Base: 0x00000000002f7305, Semaphore:
0x0000000000000000
--
    Name: method__entry
    Location: 0x000000000023819e, Base: 0x00000000002f7305, Semaphore:
0x0000000000000000
--
    Name: method__entry
    Location: 0x0000000000243ec3, Base: 0x00000000002f7305, Semaphore:
0x0000000000000000
--
    Name: cmethod__entry
    Location: 0x00000000002462e4, Base: 0x00000000002f7305, Semaphore:
0x0000000000000000

```

There are multiple addresses for this probe that we might attach to, but only
one of them will actually be checked in the source. To determine this, we have
to find the hack function:

```
$ readelf -s /proc/168586/root/usr/local/lib/libruby.so | grep
ruby_method__entry_check
  3922: 000000000023819e     0 NOTYPE  LOCAL  DEFAULT   12
ruby_method__entry_check
```

So it is the one at the address `0x000000000023819e`. To translate this to the
vmaddr, we need the base of libruby.so, which we can read from
/proc/RUBYPID/maps:

```
cat /proc/168586/maps | grep libruby
7f4f035f4000-7f4f0361f000 r--p 00000000 08:01 2637891                  
/usr/local/lib/libruby.so.2.6.5
7f4f0361f000-7f4f03852000 r-xp 0002b000 08:01 2637891                  
/usr/local/lib/libruby.so.2.6.5
7f4f03852000-7f4f03938000 r--p 0025e000 08:01 2637891                  
/usr/local/lib/libruby.so.2.6.5
7f4f03938000-7f4f0393e000 r--p 00343000 08:01 2637891                  
/usr/local/lib/libruby.so.2.6.5
7f4f0393e000-7f4f03941000 rw-p 00349000 08:01 2637891                  
/usr/local/lib/libruby.so.2.6.5
```

So 0x7f4f035f4000 + 0x000000000023819e = 0x7f4f0382c19e

Now lets just check the data in memory at that address without anything
attached:

```
$dd if=/proc/168586/mem count=1 bs=1 skip=$(( 0x7F4F0382C19E )) 2> /dev/null |
xxd
00000000: 90                                       .
```

It's the NOP exactly as expected. Now for the real magic, we attach bpftrace in
one terminal:

```
$ bpftrace -e 'usdt::ruby:method__entry {printf("%s\n", str(arg1))}' -p 168586
```

And immediately see it is printing `foo`, a good sign. But my `ENABLED` check
could still be broken right? So lets check the memory now that the probe is
enabled:

```
dd if=/proc/168586/mem count=1 bs=1 skip=$(( 0x7F4F0382C19E )) 2> /dev/null |
xxd
00000000: cc                                       .
```

Exactly as expected, the kernel has overwritten the NOP (0x90) with INT3
(0xCC).

The macro to check this determines that 0x90 != 0xCC, and returns true - the
probe is enabled by the uprobe itself.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12303|0                           |1
        is obsolete|                            |

--- Comment #3 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12305
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12305&action=edit
Improved patch for style, make semaphore use optional

Cleaned up version of my original hard-coded patch.

I haven't tested this as extensively, but should be better to review

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

--- Comment #4 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12306
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12306&action=edit
Additional fixes on cleaned-up patch

Found a couple of issues, will continue to tidy if i find more

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12305|0                           |1
        is obsolete|                            |

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12306|0                           |1
        is obsolete|                            |

--- Comment #5 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12307
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12307&action=edit
Additional fixes

Ran it and double checked the generated headers.

Sample without -S flag:

```

/* RUBY_METHOD_ENTRY ( const char *classname, const char *methodname, const
char *filename, int lineno ) */
#if defined STAP_SDT_V1
#define ruby_method__entry_semaphore method__entry_semaphore
#else
#define RUBY_METHOD_ENTRY_ENABLED() __builtin_expect
(ruby_method__entry_semaphore, 0)
#endif
__extension__ extern unsigned short ruby_method__entry_semaphore __attribute__
((unused)) __attribute__ ((section (".probes")));
#define RUBY_METHOD_ENTRY(arg1, arg2, arg3, arg4) \
DTRACE_PROBE4 (ruby, method__entry, arg1, arg2, arg3, arg4)

```

With the -S flag:

```

#define _SDT_HAS_SEMAPHORES 0


#define STAP_HAS_SEMAPHORES 0 /* deprecated */

#define __PLATFORM_UPROBE_ENABLED(provider, name)\
   provider##_##name##_check != 0 && \
   ((*(char *) __##provider##_##name##_asm_check) & 0x90) != 0x90


#include <sys/sdt.h>

/* RUBY_METHOD_ENTRY ( const char *classname, const char *methodname, const
char *filename, int lineno ) */
#define RUBY_METHOD_ENTRY_ENABLED() __PLATFORM_UPROBE_ENABLED(ruby,
method__entry)
#define RUBY_METHOD_ENTRY(arg1, arg2, arg3, arg4) \
DTRACE_PROBE4 (ruby, method__entry, arg1, arg2, arg3, arg4)

void __ruby_method__entry_asm_check();
```

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12307|0                           |1
        is obsolete|                            |

--- Comment #6 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12308
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12308&action=edit
Additional testing and improvements

Verified this works end-to-end as my original patch, but more modular.

Behavior is preserved without -S.

If I create a wrapper for dtrace like:

```
#!/bin/bash

dtrace -S $@
```

I can use the uprobe approach to probe on a system where /proc/[pid]/mem is not
writeable.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12308|0                           |1
        is obsolete|                            |

--- Comment #7 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12309
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12309&action=edit
Fixes typo

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

--- Comment #8 from Dale Hamel <dale.hamel at srvthe dot net> ---
I may have introduced a regression with this patch, as when I try to build ruby
with it with semaphores enabled it doesn't generate the necessary  object file
anymore. I think that I may need to use a different value for falling back than
the current define of _HAS_SDT_SEMAPHORE

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug releng/25581] USDT probes when /proc/[pid]/mem not writeable

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=25581

Dale Hamel <dale.hamel at srvthe dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12309|0                           |1
        is obsolete|                            |

--- Comment #9 from Dale Hamel <dale.hamel at srvthe dot net> ---
Created attachment 12311
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12311&action=edit
Fixes regression where semaphores need shared object

--
You are receiving this mail because:
You are the assignee for the bug.