return probe not executed on SMP system

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

return probe not executed on SMP system

Guang Lei Li
Hi,

  I met some difficulties when dealing with the return probe on a
multi-processor system(Power5 System, 4 CPU).

  This is the stap script I used:

global counter

function info()
%{
  struct task_struct *cur = current;
  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
cur->thread_info->cpu);
%}

probe kernel.function("sys_read")
{
  if(pid() == target())
  {
    counter--
    info()
    log("pid:".string(pid())." target:".string(target())."entry")
  }
}

probe kernel.function("sys_read").return
{
  if(pid() == target())
  {
    counter++
    info()
    log("pid:".string(pid())." target:".string(target())."return")
  }
}

probe begin
{
  counter=100
}

probe end
{
  log("counter: ".string(counter))
}

then I run:
  stap -g a.stp -c "ls > a"

The output:

root:/root/temp>stap -g b.stp -c "ls > a"

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry

|3713|3713|3|0|pid:3713 target:3713entry
counter: 95

 It seemed that the return probe didn't work for me.
 I tried the same script on a uni-processor x86 system, it worked fine.

 And I also tried to write a simple c application which will open a file,
and read some data from this file. I run it:
   stap -g b.stp -c "./a.out"
 It gave the output like:

...
|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry

|3881|3881|0|0|pid:3881 target:3881return

|3881|3881|0|0|pid:3881 target:3881entry
....

|3881|3881|3|0|pid:3881 target:3881entry

|3881|3881|3|0|pid:3881 target:3881return
counter: 33

 You can see that there are still some return probes not be executed at
all(if all are executed, the counter should be 100).

 Could anybody give me a hint about this problem?

Best Regards,

Li Guanglei
Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Hien Nguyen
You probably run out out kretprobe instances. When the return probe runs
out kretprobe instances (maxactive), it just skips and increase an
internal counter "nmissed".

Systemtap does not have the interface to change kretprobe->maxactive (in
this case increase the maxactive would help) or examine kretprobe->nmissed.

Hien.

Guang Lei Li wrote:

>Hi,
>
>  I met some difficulties when dealing with the return probe on a
>multi-processor system(Power5 System, 4 CPU).
>
>  This is the stap script I used:
>
>global counter
>
>function info()
>%{
>  struct task_struct *cur = current;
>  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
>  if(pid() == target())
>  {
>    counter--
>    info()
>    log("pid:".string(pid())." target:".string(target())."entry")
>  }
>}
>
>probe kernel.function("sys_read").return
>{
>  if(pid() == target())
>  {
>    counter++
>    info()
>    log("pid:".string(pid())." target:".string(target())."return")
>  }
>}
>
>probe begin
>{
>  counter=100
>}
>
>probe end
>{
>  log("counter: ".string(counter))
>}
>
>then I run:
>  stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file,
>and read some data from this file. I run it:
>   stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>  
>

Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Hien Nguyen
In reply to this post by Guang Lei Li
Hi,
After looking deep into this issue I found out the problem is in the
stpd. The sptd unloads the systemtap module a little too early before
the return probes have a chance to fire their handlers.

If you have the systemtap src tree try this tempory_fix.patch. I will
file a bugzilla tomorrow.

Hien.



Guang Lei Li wrote:

>Hi,
>
>  I met some difficulties when dealing with the return probe on a
>multi-processor system(Power5 System, 4 CPU).
>
>  This is the stap script I used:
>
>global counter
>
>function info()
>%{
>  struct task_struct *cur = current;
>  _stp_printf("\n|%ld|%ld|%ld|%u|", cur->pid, cur->tgid,
>cur->thread_info->cpu);
>%}
>
>probe kernel.function("sys_read")
>{
>  if(pid() == target())
>  {
>    counter--
>    info()
>    log("pid:".string(pid())." target:".string(target())."entry")
>  }
>}
>
>probe kernel.function("sys_read").return
>{
>  if(pid() == target())
>  {
>    counter++
>    info()
>    log("pid:".string(pid())." target:".string(target())."return")
>  }
>}
>
>probe begin
>{
>  counter=100
>}
>
>probe end
>{
>  log("counter: ".string(counter))
>}
>
>then I run:
>  stap -g a.stp -c "ls > a"
>
>The output:
>
>root:/root/temp>stap -g b.stp -c "ls > a"
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>
>|3713|3713|3|0|pid:3713 target:3713entry
>counter: 95
>
> It seemed that the return probe didn't work for me.
> I tried the same script on a uni-processor x86 system, it worked fine.
>
> And I also tried to write a simple c application which will open a file,
>and read some data from this file. I run it:
>   stap -g b.stp -c "./a.out"
> It gave the output like:
>
>...
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>
>|3881|3881|0|0|pid:3881 target:3881return
>
>|3881|3881|0|0|pid:3881 target:3881entry
>....
>
>|3881|3881|3|0|pid:3881 target:3881entry
>
>|3881|3881|3|0|pid:3881 target:3881return
>counter: 33
>
> You can see that there are still some return probes not be executed at
>all(if all are executed, the counter should be 100).
>
> Could anybody give me a hint about this problem?
>
>Best Regards,
>
>Li Guanglei
>
>  
>

--- src.old/runtime/stpd/librelay.c 2005-10-19 12:35:35.000000000 -0700
+++ src-20051029/runtime/stpd/librelay.c 2005-11-03 17:06:51.000000000 -0800
@@ -729,11 +729,16 @@
  case STP_START:
  {
  struct transport_start *t = (struct transport_start *)data;
+ unsigned int mywait= 0xffffffff;
  dbug("probe_start() returned %d\n", t->pid);
+
  if (t->pid < 0)
  cleanup_and_exit(0);
  else if (target_cmd)
  kill (target_pid, SIGUSR1);
+ while(mywait> 0) {
+ mywait--;
+ }
  break;
  }
  default:
Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Guang Lei Li
Hien Nguyen <[hidden email]> wrote on 2005-11-04 09:10:53:

> Hi,
> After looking deep into this issue I found out the problem is in the
> stpd. The sptd unloads the systemtap module a little too early before
> the return probes have a chance to fire their handlers.
>
> If you have the systemtap src tree try this tempory_fix.patch. I will
> file a bugzilla tomorrow.
>
> Hien.
>
I applied your patch, and I also raise the maxaction from 1 to 10, then
it worked really well, all return probed are caught.

But the default setting of maxaction to 1 is too low for a multi-processor
system. It will cause a lot of misses of kretprobe(I counter it on Power5
system for a while, about 2000 kprobes are caught, but only about 400
kretprobe was caught, so about 1600 kretprobes are missed)

I read the "kernel/kprobes.c", and found:

/* Pre-allocate memory for max kretprobe instances */
if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max(10, 2 * NR_CPUS);
#else
                rp->maxactive = NR_CPUS;
#endif
        }

Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the
above block as:

#ifdef CONFIG_PREEMPT
     rp->maxactive = max(10, 2 * NR_CPUS);
#else
     rp->maxactive = NR_CPUS;
#endif

so that kretprobe will never miss on a multi-processor system?

Thanks for your help.


Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Hien Nguyen
Guang Lei Li wrote:

>I applied your patch, and I also raise the maxaction from 1 to 10, then
>it worked really well, all return probed are caught.
>  
>
Do you mean maxactive? maxactive is not default to 1.

>But the default setting of maxaction to 1 is too low for a multi-processor
>system. It will cause a lot of misses of kretprobe(I counter it on Power5
>system for a while, about 2000 kprobes are caught, but only about 400
>kretprobe was caught, so about 1600 kretprobes are missed)
>
>I read the "kernel/kprobes.c", and found:
>
>/* Pre-allocate memory for max kretprobe instances */
>if (rp->maxactive <= 0) {
>#ifdef CONFIG_PREEMPT
>                rp->maxactive = max(10, 2 * NR_CPUS);
>#else
>                rp->maxactive = NR_CPUS;
>#endif
>        }
>  
>

By default maxactive is set to a value as in the #ifdef   #endif clause
above

>Is it appropriate to just delete "if (rp->maxactive <= 0)", to change the
>above block as:
>  
>
No, you should not make that change. When register a return probe, user
has an option to change maxactive.
We need to implement the systemtap's interface to make that change in
the future.

Hien.


Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Guang Lei Li
Hien Nguyen <[hidden email]> wrote on 2005-11-04 13:03:28:

> Guang Lei Li wrote:
>
> >I applied your patch, and I also raise the maxaction from 1 to 10, then
> >it worked really well, all return probed are caught.
> >
> >
> Do you mean maxactive? maxactive is not default to 1.

Thanks for your answer.

Yes, I mean maxactive, not maxaction.

I grep the systemTap source code, and found in line 2590 of tapsets.cxx:

 o->newline() << probe_name << ".maxactive = 1;";

So systemTap set the maxactive to 1. Am I right?

>
> >But the default setting of maxaction to 1 is too low for a
multi-processor
> >system. It will cause a lot of misses of kretprobe(I counter it on
Power5

> >system for a while, about 2000 kprobes are caught, but only about 400
> >kretprobe was caught, so about 1600 kretprobes are missed)
> >
> >I read the "kernel/kprobes.c", and found:
> >
> >/* Pre-allocate memory for max kretprobe instances */
> >if (rp->maxactive <= 0) {
> >#ifdef CONFIG_PREEMPT
> >                rp->maxactive = max(10, 2 * NR_CPUS);
> >#else
> >                rp->maxactive = NR_CPUS;
> >#endif
> >        }
> >
> >
>
> By default maxactive is set to a value as in the #ifdef   #endif clause
> above
>
> >Is it appropriate to just delete "if (rp->maxactive <= 0)", to change
the
> >above block as:
> >
> >
> No, you should not make that change. When register a return probe, user
> has an option to change maxactive.
> We need to implement the systemtap's interface to make that change in
> the future.
>
for systemTap, the kretprobe would never change the maxactive because
systemTap
has already set maxactive to 1.

We could implement a systemTap's interface to do this. But why not let
kretprobe
automatically detects and sets the real value of maxactive?

> Hien.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Hien Nguyen
Guang Lei Li wrote:

>
>I grep the systemTap source code, and found in line 2590 of tapsets.cxx:
>
> o->newline() << probe_name << ".maxactive = 1;";
>
>So systemTap set the maxactive to 1. Am I right?
>
>  
>
I did not know that. Systemtap should not set maxactive=1, it should be
0 instead.




Reply | Threaded
Open this post in threaded view
|

Re: return probe not executed on SMP system

Hien Nguyen
In reply to this post by Guang Lei Li

>I grep the systemTap source code, and found in line 2590 of tapsets.cxx:
>
> o->newline() << probe_name << ".maxactive = 1;";
>
>So systemTap set the maxactive to 1. Am I right?
>
>  
>
Confirm! You are right, systemtap set maxactive=1. It should set it to 0
or do not set it at all so kprobes can set it to the default value.