SDTs with data types and argument names

Craig Ringer
SystemTap has inherited the dtrace decision to give SDTs anonymous
arguments of type 'long' and generic names like arg1, arg2, etc.

This makes sense if you're trying to be DTrace compatible, but I don't
think stap is really trying to be very dtrace-like at runtime.

It'd be great to expose the probe argument names and their data types to
systemtap when SDTs are generated from a probes.d file. It'd make sense to
offer the same capability to applications that define probes directly with
STAP_PROBE(...) etc. in their own builds.

The goal is to let you write

probe process("myapp").mark("some__tracepoint")
{
    printf("hit some__tracepoint(%s, %d)\n",
        user_string(useful_firstarg_name),
        some_secondarg->somemember->somelongmember);
}

and display useful arg names and types in `stap -L` too.

Saving the argument names looks relatively simple in most cases. Define an
additional set of macros in the usual STAP_PROBE2() etc style like the
following pseudoishcode:

    /* one string per argument, in a global array named after provider/probe */
    #define STAP_PROBE2_ARGNAMES(provider, probename, argname1, argname2) \
        const char *const __stap_argnames_ ## provider ## _ ## probename [2] \
            __attribute__ ((unused)) \
            __attribute__ ((section (".probes"))) \
            = { #argname1, #argname2 }

i.e. generate some constant data with the argument names in a global array
that we can look up by provider and probe name when compiling the
tapscript.

The 'dtrace' script could emit these automatically into the generated
probes.h and the compiler would de-duplicate them at link-time. But it'd be
cleaner if they were embedded into the .o optionally generated by the
dtrace script.

A nearly identical approach could be used to give systemtap access to the
textual datatype names for probes declared in probes.d. Or we could even
use gcc's __typeof__ to derive them.

Applications that wanted to expose type and arg info for probes would have
to do so explicitly by invoking STAP_PROBEn_ARGNAMES(...) and
STAP_PROBEn_ARGTYPES(...) with the names and types of the probe's arguments
somewhere in global scope, outside the probe callsite. That's a bit
inconvenient, but not that hard.

That is, unless there's some way we can escape the function scope in which
the STAP_PROBEn(...) macro is invoked and define global symbols. I've asked
for ideas about that here: https://stackoverflow.com/q/59402666/398670 .
If that's possible then ideally I'd like to use the gcc __typeof__ operator
to autogenerate typed symbols for each argument or to generate a
const array of type names. Also to have a variant that uses token pasting
to derive the argument names automatically, though we'd still need a
variant that lets you specify them explicitly for when the args are
expressions not simple variable name tokens.

So my hope is it'll be possible to write

    STAP_PROBE2(myprovider, myprobe, thing->foo, "foo", get_something(), "something");

and have stap record the supplied argnames and infer the typeinfo then
record that too, so it can look it up during tapscript translation. Instead
of

    x = user_string($arg1)
    y = @cast($arg2, "[hidden email]","/path/to/myprogram")->somevalue

you'd be able to write

    x = user_string($foo)
    y = $something->somevalue

and perhaps even more importantly, `stap -L` could show useful type and
argname info for probes so they'd serve as documentation of sorts.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: SDTs with data types and argument names

Craig Ringer
On Thu, 19 Dec 2019 at 11:00, Craig Ringer <[hidden email]> wrote:

>
> That is, unless there's some way we can escape the function scope in which
> the STAP_PROBEn(...) macro is invoked and define global symbols. I've asked
> for ideas about that here: https://stackoverflow.com/q/59402666/398670 .
>

Looks like that's quite practical using the existing .pushsection and
.popsection features used in the existing <sys/sdt.h>. If building without
__ASSEMBLER__ we would treat STAP_PROBEn_ARGINFO(...) the same as
STAP_PROBEn(...) i.e. not generate arg info. But we could still emit it for
probes defined via a probes.d .

Will try to find time to draft a patch. My first foray into asm and custom
ELF sections...

Re: SDTs with data types and argument names

Frank Ch. Eigler
Hi -

> It'd be great to capture the probe argument names and their data types to
> systemtap when SDTs are generated from a probes.d file. It'd make sense to
> expose this capability for when probes are defined with STAP_PROBE(...) etc
> in their own builds too.

Yeah.  I believe there was a kernel-bpf-oriented group last year who
were speculating about extending sdt.h in a similarly motivated way.


> The goal is to let you write
>
> probe process("myapp").mark("some__tracepoint")
> {
>     printf("hit some__tracepoint(%s, %d)\n",
>         user_string(useful_firstarg_name),
>         some_secondarg->somemember->somelongmember);
> }
> and display useful arg names and types in `stap -L` too.

Note that one point of the sdt.h structure was to make the executables
self-sufficient with respect to extracting this data, even if there is
no debuginfo available.  Adding type names can only work if that
debuginfo is available after all, or else if it's synthetically
generated via @cast("<foo.h>") type constructs.


> Saving the argument names looks relatively simple in most cases. Define an
> additional set of macros in the usual STAP_PROBE2() etc style like the
> following pseudoishcode:
>
>     STAP_PROBE2_ARGAMES(provider, probename, argname1, argname2) \
>         const char "__stap_argnames_" ## provider ## "_" ## probename ## [2][] \
>               = { #argname1, #argname2 } \
>         __attribute__ ((unused)) \
>         __attribute__ ((section (".probes")));
>
> i.e generate some constant data with the probe names in a global array we
> can look up when compiling the tapscript based on the provider and probe
> name.

Yeah, that's a sensible way of doing it, without creating a new note
format or anything.  It's important that the section be marked with
attributes that will force it to be pulled into the main executable
via the usual linker scripts.

> [...]
> So my hope is it'll be possible to write
>
>     STAP_PROBE2(myprovider, myprobe, thing->foo, "foo", get_something(), "something");
>
> and have stap record the supplied argnames and infer the typeinfo then
> record that too, so it can look it up during tapscript translation.

(FWIW, I wouldn't consider it a failure if the typeinfo has to be
manually added.)


- FChE


Re: SDTs with data types and argument names

Craig Ringer
On Fri, 10 Jan 2020 at 02:46, Frank Ch. Eigler <[hidden email]> wrote:

> > It'd be great to capture the probe argument names and their data types to
> > systemtap when SDTs are generated from a probes.d file. It'd make sense to
> > expose this capability for when probes are defined with STAP_PROBE(...) etc
> > in their own builds too.
>
> Yeah.  I believe there was a kernel-bpf-oriented group last year, who
> were speculating extending sdt.h in a similarly motivated way.
>

Good to know. Any idea who may've been involved? It'd be good to
collaborate and not duplicate work or explore a dead-end already followed.


> > The goal is to let you write
> >
> > probe process("myapp").mark("some__tracepoint")
> > {
> >     printf("hit some__tracepoint(%s, %d)\n",
> >         user_string(useful_firstarg_name),
> >         some_secondarg->somemember->somelongmember);
> > }
> > and display useful arg names and types in `stap -L` too.
>
> Note that one point of the sdt.h structure was to make the executables
> self-sufficient with respect to extracting this data, even if there is
> no debuginfo available.  Adding type names can only work if that
> debuginfo is available after all, or else if it's synthetically
> generated via @cast("<foo.h>") type constructs.
>

Indeed. And the latter option is hairy for complex and portable software:
you must get exactly the right header version, but you must also ensure you
have any number of preprocessor macros etc set precisely the same. There
can be header inclusion order considerations and more. I'm very reluctant
to use the automated header processing features.

Without debuginfo we'd still get useful argument names, which would IMO be
exceedingly useful. stap could expose them as $theArgName and still expose
them as $arg1 etc. for backward compatibility, so that wouldn't upset anyone. It might also let
stap handle narrower integer types better. And *if* debuginfo was present,
it could allow the user to traverse structs etc via
$theArgName->member->foo .

I don't know of any way to ask gcc/gdb/binutils/etc to retain a subset of
debuginfo in an executable when it's being stripped, and I doubt that'd be
popular or accepted anyway. Where would you stop? In many cases the
immediate struct would be of little value without type info for its member
types and their member types and so on. So I realise that it's no
substitute for debuginfo, and doesn't make it possible to get full
functionality without it.

What it _should_ do is put static probes on an equal footing with DWARF
probes when debuginfo is present. Right now they're inferior in quite a
number of ways: no argument names, no argument types without explicit and
verbose casting, representations in monitor mode are hex statement
positions not probe names, and more.

> > Saving the argument names looks relatively simple in most cases. Define an
> > additional set of macros in the usual STAP_PROBE2() etc style like the
> > following pseudoishcode:
> >
> >     STAP_PROBE2_ARGAMES(provider, probename, argname1, argname2) \
> >         const char "__stap_argnames_" ## provider ## "_" ## probename ## [2][] \
> >               = { #argname1, #argname2 } \
> >         __attribute__ ((unused)) \
> >         __attribute__ ((section (".probes")));
> >
> > i.e generate some constant data with the probe names in a global array we
> > can look up when compiling the tapscript based on the provider and probe
> > name.
>
> Yeah, that's a sensible way of doing it, without creating a new note
> format or anything.  It's important that the section be marked with
> attributes that will force it to be pulled into the main executable
> via the usual linker scripts.
>

I'll look into that.

This won't be something I can leap to do in a hurry as I have to fit it in
bits and pieces around main deliverables. I'm sure you know the feeling.
But I'm keen to work on it when I get the chance.


Re: SDTs with data types and argument names

Frank Ch. Eigler
Hi -

> > Yeah.  I believe there was a kernel-bpf-oriented group last year, who
> > were speculating extending sdt.h in a similarly motivated way.
>
> Good to know. Any idea who may've been involved? It'd be good to
> collaborate and not duplicate work or explore a dead-end already followed.

https://web.archive.org/web/20190528152614/http://vger.kernel.org/lpc-bpf2018.html#session-11

"enhancing user defined tracepoints"

(h/t serhei) (btw, where did vger itself go???  did it merge with
Decker and disappear into another dimension?)


> Indeed. And the latter option is hairy for complex and portable software:
> you must get exactly the right header version, but you must also ensure you
> have any number of preprocessor macros etc set precisely the same. There
> can be header inclusion order considerations and more. I'm very reluctant
> to use the automated header processing features.

Those provisos are all valid, yet it turns out to be useful & capable
a lot of the time.  If there is a "-devel" level packaged set of
headers, they should be well enough engineered to let this work.


> [...]  I don't know of any way to ask gcc/gdb/binutils/etc to retain
> a subset of debuginfo in an executable when it's being stripped, and
> I doubt that'd be popular or accepted anyway. [...]

See "BTF" and "CTF" for two efforts to keep some wee subsets of
debuginfo on the installation medium.  And see
debuginfod.systemtap.org :-) for a distribution vehicle for full
mainstream debuginfo.


- FChE


Re: SDTs with data types and argument names

Frank Ch. Eigler
Hi -

> > Good to know. Any idea who may've been involved? It'd be good to
> > collaborate and not duplicate work or explore a dead-end already followed.
>
> https://web.archive.org/web/20190528152614/http://vger.kernel.org/lpc-bpf2018.html#session-11
>
> "enhancing user defined tracepoints"

I believe this was also:

https://www.linuxplumbersconf.org/event/2/contributions/123/

- FChE