Getting systemtap examples working with --bpf backend

William Cohen
I noticed that https://elinux.org/images/d/dc/Kernel-Analysis-Using-eBPF-Daniel-Thompson-Linaro.pdf mentions on page 29 that many of the systemtap examples did not work with the bpf back end, which led to frustration.  Today I took a quick survey of how badly the examples are broken by adding the following line to the beginning of the run_command function in check.exp:

    set command [ string map {"stap " "stap --bpf "} $command ]

Most of the examples fail.  The few that do appear to run are just doing things in probe begin or end handlers (with the exception of cachestat*):

PASS: systemtap.examples/general/ansi_colors run
PASS: systemtap.examples/general/ansi_colors2 run
PASS: systemtap.examples/general/helloworld run
PASS: systemtap.examples/memory/cachestat run
PASS: systemtap.examples/memory/cachestat_bpf run
PASS: systemtap.examples/memory/kmalloc-top run

The non-bpf cachestat works because it just probes raw functions. Most examples fail because of various missing syscall.*/syscall_any probe points and gettimeofday_*() functions.  The time functions should be easy to get working in bpf, as there is already a nanosecond time function and its results could be scaled to microseconds, milliseconds, and seconds.
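
The scaling itself is plain integer division. A minimal sketch in Python (not systemtap tapset code): monotonic_ns() is only a stand-in for the BPF nanosecond clock, and the *_from_ns helper names are made up for illustration.

```python
import time

NS_PER_US = 1_000
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def monotonic_ns():
    """Stand-in for the BPF nanosecond clock (assumption: any ns counter will do)."""
    return time.monotonic_ns()


def us_from_ns(ns):
    """Scale nanoseconds down to whole microseconds."""
    return ns // NS_PER_US


def ms_from_ns(ns):
    """Scale nanoseconds down to whole milliseconds."""
    return ns // NS_PER_MS


def s_from_ns(ns):
    """Scale nanoseconds down to whole seconds."""
    return ns // NS_PER_S


now_ns = monotonic_ns()
print(us_from_ns(now_ns), ms_from_ns(now_ns), s_from_ns(now_ns))
```

Integer division truncates, so each coarser unit drops the sub-unit remainder rather than rounding.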

-Will

Re: Getting systemtap examples working with --bpf backend

William Cohen
On 5/16/19 9:41 AM, William Cohen wrote:

> [...]
> PASS: systemtap.examples/general/ansi_colors run
> PASS: systemtap.examples/general/ansi_colors2 run
> PASS: systemtap.examples/general/helloworld run
> PASS: systemtap.examples/memory/cachestat run
> PASS: systemtap.examples/memory/cachestat_bpf run
> PASS: systemtap.examples/memory/kmalloc-top run
>
> The non-bpf cachestat works because it just probes raw functions. Most examples fail because of various missing syscall.*/syscall_any probe points and gettimeofday_*() functions. [...]
>

Hi,

The cachestat_bpf test has been folded into the cachestat test to minimize duplication.  The helloworld test has been set to run with the bpf back end.  The kmalloc-top test is actually a perl script and isn't running the generated code with the bpf backend.  The ansi_colors and ansi_colors2 examples compile and run, but their output is not checked and the results do not have the proper color formatting of the regular systemtap version (due to not handling octal escapes, PR23559).

The systemtap examples heavily leverage the systemtap tapsets.  Currently, the bpf backend has few tapsets.  Things like syscall.* probe points and gettimeofday_* functions are not available for systemtap bpf generation.  However, we probably don't want to blindly duplicate all the tapsets in the tapsets/linux directory for tapsets/bpf.

There is already a bpf ktime_get_ns available.  If a time offset were generated, it should be possible to have gettimeofday_* functions for bpf, allowing some additional scripts to work.  Alternatively, we need not worry about the offset at the moment, as most of the examples take the difference between two gettimeofday_* function calls.
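
To illustrate why the offset can be ignored for elapsed-time measurements, here is a small Python sketch with made-up numbers. The gettimeofday_us helper below is a hypothetical model of "nanosecond clock plus offset, scaled to microseconds", not the real tapset function.

```python
def gettimeofday_us(ktime_ns, offset_ns=0):
    """Hypothetical model: monotonic ns plus an epoch offset, scaled to microseconds."""
    return (ktime_ns + offset_ns) // 1_000


# Made-up sample values: two clock readings 250 microseconds apart.
entry_ns = 5_000_000_000
exit_ns = 5_000_250_000

# Made-up offset, chosen as a whole number of microseconds so the
# integer division cannot introduce a rounding difference.
offset_ns = 1_558_000_000 * 1_000_000_000

elapsed_without_offset = gettimeofday_us(exit_ns) - gettimeofday_us(entry_ns)
elapsed_with_offset = (gettimeofday_us(exit_ns, offset_ns)
                       - gettimeofday_us(entry_ns, offset_ns))

print(elapsed_without_offset, elapsed_with_offset)
```

Both subtractions give the same elapsed time because the offset appears in both terms. With an offset that is not a whole number of microseconds, the two results could differ by at most one microsecond due to truncation.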

A number of examples use multi-dimensional arrays; this can also be implicit when using the @entry() operation.  However, these examples are not going to work because of PR23478.  Examples such as hugepage_cow_delays fail in the following manner:


attempting command stap --bpf -p4 hugepage_cow_delays.stp
OUT semantic error: unhandled multi-dimensional array: identifier 'gettimeofday_us' at hugepage_cow_delays.stp:8:37
        source:     <<< (gettimeofday_us() - @entry(gettimeofday_us()))
                                                    ^

Pass 4: compilation failed.  [man error::pass4]
child process exited abnormally
RC 1


-Will

Re: Getting systemtap examples working with --bpf backend

William Cohen
On 5/20/19 3:52 PM, William Cohen wrote:

> [...]
> There is already a bpf ktime_get_ns available.  If there was a time offset generated, then it should be possible to have gettimeofday_* functions for bpf, allowing some additional scripts to work. [...]
>
Hi,

Attached is a proposed tapset file for tapsets/bpf to provide gettimeofday_* functions for scripts that use the time of day.  It has a global variable for a time offset to convert the ktime_get_ns value into gettimeofday_ns.  I expect that some type of probe begin would set that up, but I don't have anything doing that yet.  Any thoughts or comments about this?

-Will

Attachment: timestamp_gtod.stp (1K)

Re: Getting systemtap examples working with --bpf backend

Frank Ch. Eigler

wcohen wrote:

> global __gtod_offset = 0 /* FIXME need to set appropriately on startup */

Is there a standard bpf function to get this value?  If so, it's trivial
to call it from a probe-begin and initialize this global.  If not, it's
a less trivial job for it to get a reserved spot in the same global
array where the bpf runtime communicates exit-ness with stapbpf.  Then
stapbpf could initialize this shared global at its startup.  Or a fake
stapbpf-special bpf function could provide this value, and again a
probe begin could save the value.

- FChE

Re: Getting systemtap examples working with --bpf backend

William Cohen
On 5/22/19 4:51 PM, Frank Ch. Eigler wrote:

>
> wcohen wrote:
>
>> global __gtod_offset = 0 /* FIXME need to set appropriately on startup */
>
> Is there a standard bpf function to get this value?  If so, it's trivial
> to call it from a probe-begin and initialize this global.  If not, it's
> a less trivial job for it to get a reserved spot in the same global
> array where the bpf runtime communicates exit-ness with stapbpf.  Then
> stapbpf could initialize this shared global at its startup.  Or a fake
> stapbpf-special bpf function could provide this value, and again a
> probe begin could save the value.
>
> - FChE
>

Hi,

The BPF helper libraries have a ktime_get_ns function to get the time, but there isn't a helper function to get the offset between the start of the epoch and when the machine powered on, which is what ktime_get_ns uses as the start of time.

What might be feasible is for a probe begin run in user space to compute that offset and put it in a bpf map.  According to https://blogs.oracle.com/linux/notes-on-bpf-3 :


Map actions

We can create/update, delete and lookup map information, both in BPF programs and in user-space. User-space map interactions are done via the BPF syscall. Their function signatures are slightly different to those of their in-kernel BPF program equivalents. In tools/lib/bpf/bpf.c wrappers for these actions are present:

This technique might also be useful for initialization information like syscall numbers<->names and other constants, rather than trying to put everything into space-constrained bpf code.  However, I am not sure how that is going to be managed if multiple systemtap scripts are kicked off.
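
As a rough sketch of the user-space side of the offset idea, the Python below reads the wall clock and the monotonic clock back to back and stores the difference. It assumes the BPF nanosecond clock tracks CLOCK_MONOTONIC, and it uses a plain dictionary as a stand-in for a bpf map; the key reuses the __gtod_offset name from the proposed tapset, and the function names are hypothetical.

```python
import time


def compute_gtod_offset_ns():
    """Epoch time minus monotonic time, both in nanoseconds.

    The two clocks are read one after the other, not atomically, so the
    result carries a small skew (the time between the two reads).
    """
    realtime_ns = time.clock_gettime_ns(time.CLOCK_REALTIME)
    monotonic_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    return realtime_ns - monotonic_ns


# Stand-in for a bpf map shared with the probe handlers (assumption).
globals_map = {}
globals_map["__gtod_offset"] = compute_gtod_offset_ns()


def gettimeofday_ns():
    """Model of a handler-side function: monotonic clock plus the stored offset."""
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC) + globals_map["__gtod_offset"]


print(gettimeofday_ns() // 1_000_000_000)
```

The offset is computed once at startup, so a later change to the wall clock would not be reflected unless the stored value is refreshed.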

-Will

Re: Getting systemtap examples working with --bpf backend

Serguei Makarov-2
On Wed, May 22, 2019 at 5:12 PM William Cohen <[hidden email]> wrote:
> We can create/update, delete and lookup map information, both in BPF programs and in user-space. User-space map interactions are done via the BPF syscall. Their function signatures are slightly different to those of their in-kernel BPF program equivalents. In tools/lib/bpf/bpf.c wrappers for these actions are present:
Yep. You can see in bpfinterp.cxx how the BPF-level helpers are then
implemented in terms of the syscalls.

> This technique might also be useful for initialization information like syscall numbers<->names and other constants rather than trying to put everything into space constrained bpf code. However, not sure how that is going to be managed if multiple systemtap scripts are kicked off.
Each stapbpf invocation creates its own separate set of maps to track
global variables, so there is no conflict. (What would be more
difficult is if you wanted to share a map between different stapbpf
processes for some reason.)

There was an upcoming BPF extension being discussed at LPC 2018 which
would allow loading constant data sections into the BPF program's
address space. Having that would simplify things a lot.

Re: Getting systemtap examples working with --bpf backend

William Cohen
On 5/16/19 9:41 AM, William Cohen wrote:

> [...]
> Most examples fail because of various missing syscall.*/syscall_any probe points and gettimeofday_*() functions. [...]
>

Hi,

I have been looking at getting the syscall_any tapset working with
bpf.  If the syscall_any and syscall_any.return worked then the
following examples should work:

syscalls_by_pid.stp

There are other examples that use syscall_any and syscall_any.return,
but they have other issues like using multi-dimensional arrays, string
concatenation operations, or for loops that will prevent them from
working with the bpf backend.

I have made some modifications to provide the syscall_name and
syscall_num functions for bpf.  However, the code doesn't handle
32-bit compat syscalls properly.  This also gives warnings about
cross-file global variable references. This is on the
wcohen/bpf_syscall_any branch of the systemtap git repo. Below is an
example running on x86_64:

$ ../install/bin/stap  --bpf -k -e 'probe oneshot {printf("%s\n", syscall_name(10))}'
WARNING: cross-file global variable reference to identifier '__syscall_32_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/x86_64/syscall_num.stp:3:8 from: identifier '__syscall_32_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/syscall_table.stp:8:16
 source:         return __syscall_32_num2name[num]
                        ^
WARNING: cross-file global variable reference to identifier '__syscall_64_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/x86_64/syscall_num.stp:5:8 from: identifier '__syscall_64_num2name' at :11:12
 source:     return __syscall_64_num2name[num]
                    ^
WARNING: instance of overloaded function will never be reached: identifier 'syscall_name' at :5:10
 source: function syscall_name(num) {
                  ^
mprotect
Keeping temporary directory "/tmp/stapdkLJNc"
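
The compat issue can be seen with a tiny lookup sketch in Python. The table excerpts hold only a few real x86_64 and i386 syscall numbers; the table names, the compat parameter, and the "unknown" fallback are choices made for this sketch, not the tapset's actual interface.

```python
# Small excerpts of the syscall tables (x86_64 and 32-bit i386).
SYSCALL_64_NUM2NAME = {9: "mmap", 10: "mprotect", 11: "munmap"}
SYSCALL_32_NUM2NAME = {9: "link", 10: "unlink", 11: "execve"}


def syscall_name(num, compat=False):
    """Look up a syscall name; compat selects the 32-bit table (hypothetical flag)."""
    table = SYSCALL_32_NUM2NAME if compat else SYSCALL_64_NUM2NAME
    return table.get(num, "unknown")


print(syscall_name(10))
print(syscall_name(10, compat=True))
```

The same number maps to different syscalls in the two tables, so a lookup that always uses the 64-bit table mislabels calls made by 32-bit processes.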


I took a look at the syscall_any and syscall_any.return probes.  The
syscall_any probe can work in the existing bpf environment, but
syscall_any.return uses some machine-dependent C code, possibly from
the kernel headers, to extract the syscall number from the pt_regs.
The question is how to keep this portable and avoid having the
debuginfo installed.  With the incomplete syscall_any tapset:

$ ../install/bin/stap  --bpf -k  testsuite/systemtap.examples/process/syscalls_by_pid.stp  -T 1
WARNING: cross-file global variable reference to identifier '__syscall_32_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/x86_64/syscall_num.stp:3:8 from: identifier '__syscall_32_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/syscall_table.stp:8:16
 source:         return __syscall_32_num2name[num]
                        ^
WARNING: cross-file global variable reference to identifier '__syscall_64_num2name' at /home/wcohen/research/profiling/systemtap_write/install/share/systemtap/tapset/x86_64/syscall_num.stp:5:8 from: identifier '__syscall_64_num2name' at :11:12
 source:     return __syscall_64_num2name[num]
                    ^
Collecting data... Type Ctrl-C to exit and display results
#SysCalls  PID
109        30875
117        27703
233        22764
317        20845
22         19624
1951       16890
219        14304
108        13864
548        13707
11         13665
110        13198
11         10789
11         10783
11         10780
11         10777
18         10520
1985       10372
17         10247
10         10014
19         9479
...

-Will