analysis of polling in systemtap

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

analysis of polling in systemtap

Martin Hunt
For this report, when I say "polling", I mean a systemtap script that
has a timer firing at fixed intervals so it can print out updated data.
Commonly this is done like so

probe timer.ms(1000)
{
 # clear screen
 # sort data
 # print top 'n' pieces of data
}

This timer probe presents implementation problems because on an MP
system, while the timer probe is printing data, kprobes may be firing
trying to update the data. We do things to minimize the problem, however
it is unavoidable that data will sometimes get dropped. Currently this
is a fatal situation, with an error printed and the script terminated.
"ERROR: locking timeout over variable global_called"

These conflicts should not be fatal and instead must simply result in a
quiet drop of data with perhaps an increment of a counter for drops.
There is a PR for this: http://sourceware.org/bugzilla/show_bug.cgi?
id=1379

In most of my scripts I like to clear the screen before each update.
Unfortunately if an error is hit, the error message is sometimes cleared
before the script exits. This is confusing because I don't know why my
script exited.  This is due to the buffering. Error and warning messages
are sent over a separate channel and are displayed immediately.
Obviously this is not always a good idea. Other times it is nice to be
able to separate them from normal program output. Suggestions?

To minimize dropped data, we should be as efficient as possible and
minimize the time global locks are held on data structures. That means,
if you only care about the top 10 elements in a map, don't print the
whole map.

For example, in a typical case,
probe timer.ms(500)
{
        cls()
        num_to_do = 10
        foreach ([n,f] in called-) {
                printf("%d called %s\t%d times\n", n, f, called[n,f])
                num_to_do--
                if (num_to_do <= 0)
                        break
        }
        delete called
}

For 107 elements in the called map,
sort took    46 usecs
foreach took 18 usecs
clear took    4 usecs
For larger arrays, the numbers get much worse, especially for sorts.

One way to improve these numbers is just find the top or bottom 'n'
elements in the array. For small values of 'n', this can be done much
more efficiently that sorting the whole array. I created a PR for this
http://sourceware.org/bugzilla/show_bug.cgi?id=2051

The syntax gets a bit confusing. I have never really liked having to use
a foreach to print an array. I think of them as objects and I want to
print them with a single line. For example,

print(called) - prints the whole map in a standard format.

printa(called, 10, "%1d called %2s\t%d times") - prints the first 10
elements in the 'called' array with a format string.  (%1d means print
key1 as decimal, etc).

These are more efficient than using foreach.

So the example timer probe might be written as:
probe timer.ms(500)
{
        cls()
        sortn(called, 10, SORT_VALUE)
        printa(called, 10, "%1d called %2s\t%d times")
        delete called
}

Comments?

Martin






Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Mathieu Desnoyers-2
* Martin Hunt ([hidden email]) wrote:

> For this report, when I say "polling", I mean a systemtap script that
> has a timer firing at fixed intervals so it can print out updated data.
> Commonly this is done like so
>
> probe timer.ms(1000)
> {
>  # clear screen
>  # sort data
>  # print top 'n' pieces of data
> }
>
> This timer probe presents implementation problems because on an MP
> system, while the timer probe is printing data, kprobes may be firing
> trying to update the data. We do things to minimize the problem, however
> it is unavoidable that data will sometimes get dropped. Currently this
> is a fatal situation, with an error printed and the script terminated.
> "ERROR: locking timeout over variable global_called"
>

You could also keep two arrays for your data. Your poller would first atomically
switch the pointer to the "empty" array before doing its sorting of the "full"
array. Once the pointer is changed, just wait until every writer to the "full"
array has finished (you might want to use a small lock for this, or RCU of your
timer can wait (if it's a thread)) and then play freely with your data.

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Frank Ch. Eigler
In reply to this post by Martin Hunt

hunt wrote:

> [...]  Unfortunately if an error is hit, the error message is
> sometimes cleared before the script exits. [...]  Error and warning
> messages are sent over a separate channel and are displayed
> immediately.  [...]

The error/warning messages already go to stderr, if I read the code
correctly.  A script that generates lots of stdout + stderr traffic
may need to be diagnosed the same way as for unix tools generally:
separate stderr redirection.

> [...]
> sortn(called, 10, SORT_VALUE)
>       printa(called, 10, "%1d called %2s\t%d times")
>       delete called
> [...]

One obvious problem with this is the race and possible mismatches
between the "sortn" and "printa" calls.  Compared to this, the
proposed sorted-limited-foreach is indeed somewhat slower, but
likewise requires the same number of script lines:

     foreach (x in called- limit 10)
       printf("%d called %s" ... )
     # there is also a race here.
     delete called

- FChE
Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Martin Hunt
On Mon, 2005-12-19 at 14:29 -0500, Frank Ch. Eigler wrote:

> hunt wrote:
>
> > [...]  Unfortunately if an error is hit, the error message is
> > sometimes cleared before the script exits. [...]  Error and warning
> > messages are sent over a separate channel and are displayed
> > immediately.  [...]
>
> The error/warning messages already go to stderr, if I read the code
> correctly.  A script that generates lots of stdout + stderr traffic
> may need to be diagnosed the same way as for unix tools generally:
> separate stderr redirection.

Yes, that's why it is done that way. Still, the problem is where error
messages scroll off the screen before the script terminates, leaving the
user puzzled about what happened. I was thinking maybe if the script
terminates abnormally, the last line printed before exiting should be an
error summary.

> > [...]
> > sortn(called, 10, SORT_VALUE)
> >       printa(called, 10, "%1d called %2s\t%d times")
> >       delete called
> > [...]
>
> One obvious problem with this is the race and possible mismatches
> between the "sortn" and "printa" calls.  

The same race condition exists with the generated code from foreach. A
write lock is needed for the sort and a readlock for the foreach loop.
There is a small race in there.  The only real risk is from someone
running multiple timer probes sorting the same data. In that case, they
deserve the confusing output they will probably get.  System stability
is not at risk.
 
> Compared to this, the
> proposed sorted-limited-foreach is indeed somewhat slower, but
> likewise requires the same number of script lines:
>
>      foreach ([x,y] in called- limit 10)
>        printf("%d called %s\t%d times", x, y, foo[x,y] )

This isn't bad either. It gets more complicated when you want stats and
histograms.

>      # there is also a race here.
>      delete called

Yes, this is a much more serious race. I've been thinking about how to
solve that. The best way to fix this from an efficiency and reliability
point of view is to simply add another flag to the runtime print and
aggregation commands to clear the map as it is printed. I don't have any
good ideas about how this would made part of the systemtap language.

Martin


Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Martin Hunt
In reply to this post by Mathieu Desnoyers-2
On Fri, 2005-12-16 at 15:18 -0500, Mathieu Desnoyers wrote:

> You could also keep two arrays for your data. Your poller would first atomically
> switch the pointer to the "empty" array before doing its sorting of the "full"
> array. Once the pointer is changed, just wait until every writer to the "full"
> array has finished (you might want to use a small lock for this, or RCU of your
> timer can wait (if it's a thread)) and then play freely with your data.

This is a good idea for maps.  But there is a problem of how the
translator determines when it should be done. If the maps are not
cleared in the timer probes, we don't want this. And it only applies to
maps, not pmaps.

There is a tradeoff here. Additional accuracy during polling of maps vs
additional memory and complexity. It may not be worth it. Timer events
that print data then clear are probably only used in situations where
high accuracy isn't necessary.

Martin



Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Martin Hunt
In reply to this post by Martin Hunt
On Mon, 2005-12-19 at 11:58 -0800, Martin Hunt wrote:

> >      # there is also a race here.
> >      delete called
>
> Yes, this is a much more serious race.

I shouldn't have said that.  It is only more serious if you consider the
problem of dueling timer probes printing the same data at the same time,
which isn't something I plan to worry about.

The real problem I want to solve is how to better handle timer events
that print data and then clear a pmap. Because clearing a large pmap can
be expensive and could be done for free while aggregating it.

> The best way to fix this from an efficiency and reliability
> point of view is to simply add another flag to the runtime print and
> aggregation commands to clear the map as it is printed. I don't have any
> good ideas about how this would made part of the systemtap language.
>
> Martin
>
>

Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Frank Ch. Eigler
Hi -

hunt wrote:

> [...] The real problem I want to solve is how to better handle timer
> events that print data and then clear a pmap. Because clearing a
> large pmap can be expensive and could be done for free while
> aggregating it.

This idiom (iterating then clearing an array) may well be frequent
enough to justify yet another addition to the "foreach" syntax, say a
"delete" keyword somewhere in the mix.  Before long, we'll get to the
Common Lisp LOOP macro.

- FChE
Reply | Threaded
Open this post in threaded view
|

RE: analysis of polling in systemtap

Stone, Joshua I
In reply to this post by Martin Hunt
Frank Ch. Eigler wrote:

> hunt wrote:
>> [...] The real problem I want to solve is how to better handle timer
>> events that print data and then clear a pmap. Because clearing a
>> large pmap can be expensive and could be done for free while
>> aggregating it.
>
> This idiom (iterating then clearing an array) may well be frequent
> enough to justify yet another addition to the "foreach" syntax, say a
> "delete" keyword somewhere in the mix.  Before long, we'll get to the
> Common Lisp LOOP macro.
>
> - FChE

Perhaps it's the wrong approach to invent a new syntax for everything
that users might want to do atomically on an array.  Can we instead
allow the user to explicitly lock/release variables?

This would give the user a lot of power over synchronization, but it
should still be safe.  We already have dynamic deadlock detection, so
that's ok.  It should be easy to add a dynamic check to make sure that
any locks acquired are released before the probe exits.  (This may even
be possible to check statically, though much harder.)  So the worst a
user could do with a buggy script is deadlock themselves and terminate;
the system remains unharmed.

We can still provide syntax to optimize some things, like the "aggregate
and clear" case that Martin describes, but it seems to me that we need a
more general functionality as well.


Josh

Reply | Threaded
Open this post in threaded view
|

Re: analysis of polling in systemtap

Frank Ch. Eigler
Hi -

On Wed, Dec 21, 2005 at 11:57:52AM -0800, Stone, Joshua I wrote:
> Perhaps it's the wrong approach to invent a new syntax for everything
> that users might want to do atomically on an array.  

Well, I hope not *everything*.  I was only kidding about LISP LOOP.

> Can we instead allow the user to explicitly lock/release variables?
> [...]

Perhaps ...  though it's been my preference to make implicit as many
facilities as reasonable and possible.  Can you imagine some plausible
scenarios where explicit locking over global variables may be
necessary to script programmers?

- FChE