statistics with intermediate results


statistics with intermediate results

Martin Peschke
Hi,

another question of mine:

If I want to provide latencies then I need to measure two times,
send time and receive time. I can calculate a latency
when I know both times, which requires the first time to be
kept somewhere until I have measured the second time.

The problem is where to put the first timestamp. It would
be per request. But when I use dynamic instrumentation, e.g.
systemtap, then I can't put some spare bytes in a
per request data structure to store intermediate results.

I guess, one could report all events, like send time, receive
time and so on, through systemtap and defer all processing to
a user land script. That's the Linux Kernel Event Trace Tool
approach:
http://sourceware.org/ml/systemtap/2005-q4/msg00458.html

From a performance point of view, I am not sure it is the
fastest way of getting latencies, because it involves huge
amounts of data being generated by probes and being
reported through relayfs, while we can't use the benefits
of immediate data reduction as provided by systemtap's statistics.

I am wondering whether dynamic instrumentation is the answer
to this kind of measurement requirements.

Thanks in advance for your thoughts.

Martin

Re: statistics with intermediate results

James Dickens
On 1/11/06, Martin Peschke <[hidden email]> wrote:
> Hi,
>
> another question of mine:
>
> If I want to provide latencies then I need to measure two times,
> send time and receive time. I can calculate a latency
> when I know both times, which requires the first time to be
> kept somewhere until I have measured the second time.
>

And really, you don't need to keep all the results; basically you
could just store min, max, and mean or median and get the information
you would need for most tasks.


> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.
>
> I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to
> a user land script. That's the Linux Kernel Event Trace Tool
> approach:

You can look at dtrace as an example; it has aggregations that store
events like this and give the ability to print them. You can also
quantize the results.



> http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
>
>  From a performance point of view, I am not sure it is the
> fastest way of getting latencies, because it involves huge
> amounts of data being generated by probes and being
> reported through relayfs, while we can't use the benefits
> of immediate data reduction as provided by systemtap's statistics.
>
Aggregations are what is needed, because you really don't need to store
all the data, just the best, worst, and average cases.

> I am wondering whether dynamic instrumentation is the answer
> to this kind of measurement requirements.
>
> Thanks in advance for your thoughts.
>
> Martin
>

Re: statistics with intermediate results

Frank Ch. Eigler
In reply to this post by Martin Peschke
Martin Peschke <[hidden email]> writes:

> [...]
> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.

I don't understand what is blocking you.  There is no "per request
data structure" in systemtap - spare or otherwise.  You copy values
out of kernel side with the $target variables, and correlate them on
the script side.

You can declare and use as many script-side arrays as you see fit, and
index them as you see fit.  As long as you can recompute the same
index tuple (a pid, request pointer address, and/or whatever) at the
probe points that correspond to the beginning and the end of a
computation, just use the array to store the temporaries ("start
time").

Once you have a real result ("elapsed time") you want to store, put
that in a new array, which can be one that carries statistical values.
Use the "<<<" accumulation operator to add values, and the @avg etc.
operators to read results.
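
A minimal sketch of the pattern described here, assuming hypothetical probe points ("submit_request"/"complete_request") and a $req target variable; the real function names and variables depend on the subsystem being instrumented:

```systemtap
global start_time     # temporaries, indexed by request pointer
global latency        # statistics aggregate

# placeholder probe points; substitute the real begin/end functions
probe kernel.function("submit_request") {
  start_time[$req] = gettimeofday_us()
}

probe kernel.function("complete_request") {
  if ($req in start_time) {
    latency <<< gettimeofday_us() - start_time[$req]
    delete start_time[$req]   # keep the temporary array small
  }
}

probe end {
  printf("n=%d avg=%dus min=%dus max=%dus\n",
         @count(latency), @avg(latency), @min(latency), @max(latency))
}
```

Deleting the entry at completion bounds the temporary array to the number of in-flight requests.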


> [...] I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to a user
> land script. That's the Linux Kernel Event Trace Tool approach:
> [...]

It is a possible way, but not generally necessary for systemtap.


- FChE

Re: statistics with intermediate results

Martin Peschke
Frank Ch. Eigler wrote:
> Martin Peschke <[hidden email]> writes:
>>But when I use dynamic instrumentation, e.g.
>>systemtap, then I can't put some spare bytes in a
>>per request data structure to store intermediate results.
>
> I don't understand what is blocking you.  There is no "per request
> data structure" in systemtap - spare or otherwise.

Sorry, I wasn't clear. I meant that I can't enhance kernel
data structures later on.

I could do so prior to a kernel build, though, in preparation for a
tapset that would make use of these spare bytes for temporaries.
I guess this kind of access to temporaries would be fastest,
while preserving most advantages of dynamic instrumentation.

> You can declare and use as many script-side arrays as you see fit, and
> index them as you see fit.  As long as you can recompute the same
> index tuple (a pid, request pointer address, and/or whatever) at the
> probe points that correspond to the beginning and the end of a
> computation, just use the array to store the temporaries ("start
> time").

Sounds feasible. I will give it a try. Thanks.

Martin

Re: statistics with intermediate results

Martin Peschke
In reply to this post by James Dickens
James Dickens wrote:
> On 1/11/06, Martin Peschke <[hidden email]> wrote:
>>If I want to provide latencies then I need to measure two times,
>>send time and receive time. I can calculate a latency
>>when I know both times, which requires the first time to be
>>kept somewhere until I have measured the second time.
>
> and really you don't need to keep all the results, basicly you could
> just store min, max, and mean or medium and get the information you
> would need for most tasks.

You might be right regarding min/max/avg being sufficient for
some cases. However, I think histograms can be useful for other
cases. Latency histograms might show several peaks, with
one or more of them being unexpected and worth a closer look.
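
Systemtap's statistics aggregates can render such histograms directly; a self-contained sketch, with a synthetic timer-based data source standing in for real begin/end probes:

```systemtap
global latency

# synthetic data source so the sketch runs stand-alone;
# a real script would fill "latency" from begin/end probe pairs
probe timer.ms(10) {
  latency <<< randint(2000)
}

probe timer.s(5) {
  print(@hist_log(latency))   # power-of-two buckets expose multiple peaks
  exit()
}
```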

Martin

Re: statistics with intermediate results

Jose R. Santos
In reply to this post by Martin Peschke
Martin Peschke wrote:

>I guess, one could report all events, like send time, receive
>time and so on, through systemtap and defer all processing to
>a user land script. That's the Linux Kernel Event Trace Tool
>approach:
>http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
>
> From a performance point of view, I am not sure it is the
>fastest way of getting latencies, because it involves huge
>amounts of data being generated by probes and being
>reported through relayfs, while we can't use the benefits
>of immediate data reduction as provided by systemtap's statistics.
>

One of the things that we are doing with the Kernel event trace tool is
adding the capability for users to add their own trace hooks.  One can
choose to probe a single point in the kernel instead of doing a full
trace.  In the end, though, it really depends on which has the greater
overhead: doing aggregation in the systemtap script or printing every
single event to userspace.  It's obvious who wins here.

One key advantage of having a trace is that it allows you to run once
and analyze in many different ways.  Like you said, histograms can be
very useful.

Good Luck

-JRS

Re: statistics with intermediate results

William Cohen
In reply to this post by Martin Peschke
Martin Peschke wrote:

> Hi,
>
> another question of mine:
>
> If I want to provide latencies then I need to measure two times,
> send time and receive time. I can calculate a latency
> when I know both times, which requires the first time to be
> kept somewhere until I have measured the second time.
>
> The problem is where to put the first timestamp. It would
> be per request. But when I use dynamic instrumentation, e.g.
> systemtap, then I can't put some spare bytes in a
> per request data structure to store intermediate results.
>
> I guess, one could report all events, like send time, receive
> time and so on, through systemtap and defer all processing to
> a user land script. That's the Linux Kernel Event Trace Tool
> approach:
> http://sourceware.org/ml/systemtap/2005-q4/msg00458.html
>
>  From a performance point of view, I am not sure it is the
> fastest way of getting latencies, because it involves huge
> amounts of data being generated by probes and being
> reported through relayfs, while we can't use the benefits
> of immediate data reduction as provided by systemtap's statistics.
>
> I am wondering whether dynamic instrumentation is the answer
> to this kind of measurement requirements.
>
> Thanks in advance for your thoughts.
>
> Martin

Associative arrays can be used for this purpose. Use the pointer to the
data structure as a key for the associative array, and store the start
time in it. Then, when the data structure is encountered for the
completion operation, fetch the time from the associative array and
compute the elapsed time.

How many outstanding operations are there going to be at any given time?

-Will

Re: statistics with intermediate results

Martin Peschke
William Cohen wrote:
> Associative arrays can be used for this purpose. Use the pointer to the
> data structure as a key for the associative array. Store start time in
> the associative array. Then when the data structure is encountered for
> the completion operation fetch the time from the associative array and
> compute the elapsed time.
>
> How many outstanding operations are there going to be at any given time?

It depends...

For SCSI, there are certain limits for tagged command queueing.
Devices may impose limits, adapters may impose limits, other layers
and subsystems in Linux might impose limits (blocklayer?
SCSI mid layer?).

For the IBM zSeries FCP adapter driver there used to be
(rather arbitrary) limits of 32 concurrent commands per LUN
and 4096 concurrent commands per adapter. Experience shows that
when running some I/O stress workload or benchmark, we manage
to hit these limits easily.

In short, I would expect to see up to hundreds or maybe even thousands
of outstanding operations at any given time for systems
like database servers, that is, for systems that are likely candidates
for a performance analysis.

I am a little concerned that searching huge systemtap arrays for
each request could be too expensive. But I don't know much about
the bowels of the systemtap runtime.
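
For what it's worth, systemtap's script-side arrays are implemented as hash tables, so the per-request correlation is a lookup rather than a search; the array capacity can also be raised with an explicit maximum. A sketch, assuming a few thousand in-flight requests:

```systemtap
# an explicit maximum sizes the underlying hash table at module load;
# without it, the limit defaults to the runtime's MAXMAPENTRIES value
global start_time[16384]

probe begin {
  printf("tracking up to 16384 in-flight requests\n")
  exit()
}
```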

Martin

RE: statistics with intermediate results

bibo,mao-2
In reply to this post by Martin Peschke
Currently systemtap is suitable for short-duration performance statistics. Sometimes users mainly want to get statistical raw data from systemtap and do not need to analyze the data in real time. For long-running measurements, I think it needs a ping-pong buffer: switch buffers when relaying the data to user space through RelayFS and keep the trace running.

bibo,mao

>-----Original Message-----
>From: [hidden email] [mailto:[hidden email]]
>On Behalf Of Martin Peschke
>Sent: January 13, 2006 1:12
>To: William Cohen
>Cc: [hidden email]
>Subject: Re: statistics with intermediate results
>
>William Cohen wrote:
>> Associative arrays can be used for this purpose. Use the pointer to the
>> data structure as a key for the associative array. Store start time in
>> the associative array. Then when the data structure is encountered for
>> the completion operation fetch the time from the associative array and
>> compute the elapsed time.
>>
>> How many outstanding operations are there going to be at any given time?
>
>It depends...
>
>For SCSI, there are certain limits for tagged command queueing.
>Devices may impose limits, adapters may impose limits, other layers
>and subsystems in Linux might impose limits (blocklayer?
>SCSI mid layer?).
>
>For the IBM zSeries FCP adapter driver there used to be
>(rather arbitrary) limits of 32 concurrent commands per LUN
>and 4096 concurrent commands per adapter. Experience shows that
>when running some I/O stress workload or benchmark, we manage
>to hit these limits easily.
>
>In short, I would expect to see up to hundreds or maybe even thousands
>of outstanding operations at any given time for systems
>like database servers, that is, for systems that are likely candidates
>for a performance analysis.
>
>I am a little concerned that searching huge systemtap arrays for
>each request could be too expensive. But I don't know much about
>the bowels of the systemtap runtime.
>
>Martin