[TAPSETS] Linux Kernel Event Trace Tool

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
Hi folks,

  As mentioned by Jose this morning, we are developing a kernel event
trace tool using systemTap. It is used to collect the probed data
which will then be processed by a post-processing tool.This tool aims
to provide an easy way to trace various events inside the kernel for
the purpose of performance analysis.

  Although its requirement come internally, we hope this tool to be
open and could have a broader audience instead of confining it inside
IBM.

  As a start point, we implemented only a limited number of event
hooks. It now contains 6 kinds of events and totally 16 events.

  Here's the list we've implemented by now:

 System Call:
    entry of syscall
    return of syscall
 Process Creation:
    fork
    execve
 IO Scheduler:
    a request is retrieved from request queue
    a request is added to the request queue
    a request is removed from the request queue
 Task Dispatching(scheduler):
    context switch
    cpu idle
 Backtrace
    backtrace when cpu is idle
    backtrace when an io request is dispatched to scsi layer
 Scsi Activities
    mid-layer prepared a IO request
    Dispatch a command to the LLD
    I/O is done by LLD
    mid-layer processes the completed IO

  I posted a piece of codes from this tool onto mailing list a few
days ago. I've incorporated some of the feedbacks from the community
into this tool.

  I added some comments to the source codes to help better understand
this trace tool. I've tested this tool on 2.6.9-24EL on Power5, and
some test on 2.6.14.2 on i386. But there are still some bugs in
systemtap or elfutils that make some event hooks fail to run. I
addressed these exceptions in the comments of the src codes.

  Since the codes is a little larger to be inlined into this mail, I
also send it as an attachment.

  Suggestions, Comments, Flames are welcome. And thanks in advance
for your help.

<<<<<<<<<<<<<<<<<<<<< source codes >>>>>>>>>>>>>>>>>>>>>>>>>>>

--------------------- sample.stp ------------------------
/* Attention:
   If you choose to run all the following probes, your system
   will become rather slow.

   the log_user_data() in each event probe is only a demo of
   the existance of this function. The data logged by it is
   rather meaningless.
*/
probe begin
{
    log("begin")
    /* to take a snapshot of all the processes */
    process_snapshot()
}

/* to trace system calls entry, hookID = 1 */
probe addevent.syscall.entry
{
    log_user_data("Prototype for syscall entry, Ver: 0.1")
}

/* to trace system calls exit, hookID = 2 */
probe addevent.syscall.return
{
    log_user_data("Prototype for syscall return, Ver: 0.1")
}

/* to trace process creation, include "fork" and "execve",
  hookID: 3 for execve and 4 for fork */
probe addevent.process
{
    log_user_data("Prototype for process creation, Ver: 0.1")
}

/*
 to trace io scheduler activities,
 hookID 5: a request is retrieved from request queue
 hookID 6: a request is added to the request queue
 hookID 7: a request is removed from the request queue
*/
probe addevent.ioscheduler
{
    log_user_data("Prototype for io scheduler, Ver: 0.1")
}

/* to trace tasks dispatching.
  hookID: 8 for context switch and 9 when cpu is idle */
probe addevent.tskdispatch
{
    log_user_data("Prototype for tskdispatch, Ver: 0.1")
}

/* to trace backtrace,
  hookID=10: for cpuidle
  hookID=11: when an io request is dispatched to scsi layer,
      failed on PPC64, refer to backtrace.stp */
probe addevent.backtrace
{
    log_user_data("Prototype for backtrace, Ver: 0.1")
}

/* to trace scsi activities,
  failed to run on PPC64, refer to scsi.stp

  hookID = 13: mid-layer prepare a IO request
  hookID = 14: Dispatch a command to the low-level driver
  hookID = 15: I/O is done by low-level driver
  hookID = 16: mid-layer processes the completed IO
*/
probe addevent.scsi
{
    log_user_data("Prototype for scsi, Ver: 0.1")
}

probe end
{
    log("end")
}


--------------- tapsets/hookid_defs.stp ----------------
global
    /* syscall */
    HOOKID_SYSCALL_ENTRY, HOOKID_SYSCALL_RETURN,

    /* process creation */
    HOOKID_PROCESS_FORK, HOOKID_PROCESS_EXECVE,

    /* io scheduler */
    HOOKID_IOSCHED_NEXT_REQ, HOOKID_IOSCHED_ADD_REQ,
    HOOKID_IOSCHED_REMOVE_REQ,

    /* task dispatching */
    HOOKID_TASK_CTXSWITCH, HOOKID_TASK_CPUIDLE,

    /* backtrace */
    HOOKID_BT_CPUIDLE, HOOKID_BT_IOREQ_TO_SCSI,

    /* scsi */
    HOOKID_SCSI_IOENTRY, HOOKID_SCSI_IO_TO_LLD,
    HOOKID_SCSI_IODONE_BY_LLD, HOOKID_SCSI_IOCOMP_BY_MIDLEVEL

probe begin
{
    HOOKID_SYSCALL_ENTRY = 1
    HOOKID_SYSCALL_RETURN = 2

    HOOKID_PROCESS_EXECVE = 3
    HOOKID_PROCESS_FORK = 4

    HOOKID_IOSCHED_NEXT_REQ = 5
     HOOKID_IOSCHED_ADD_REQ = 6
    HOOKID_IOSCHED_REMOVE_REQ = 7

    HOOKID_TASK_CTXSWITCH = 8
    HOOKID_TASK_CPUIDLE = 9

    HOOKID_BT_CPUIDLE = 10
    HOOKID_BT_IOREQ_TO_SCSI = 11

    HOOKID_SCSI_IOENTRY = 13
    HOOKID_SCSI_IO_TO_LLD = 14
    HOOKID_SCSI_IODONE_BY_LLD =15
    HOOKID_SCSI_IOCOMP_BY_MIDLEVEL = 16
}
--------------- tapsets/logtrace.stp --------

/* Log the data common to all events */
function log_tracedata_common(hookID:long)
%{
    struct timeval tv;
    struct task_struct *cur = current;

    /* second|usec|pid|ppid|tid|cpuid|hoodID */

    do_gettimeofday (&tv);
    /* The actual format is a TBD. we currently print the ascii string
     for convenience. */
    _stp_printf("\n%ld|%ld|%ld|%ld|%ld|%u|%ld|", tv.tv_sec, tv.tv_usec,\
              cur->tgid, cur->parent->pid, cur->pid, cur->thread_info->cpu,\
              THIS->hookID);
%}

/* Log the data defined by users */
function log_user_data(input:string)
%{
    struct task_struct *cur = current;

    if(cur->tgid != _stp_pid)  {
        if( _stp_target != 0 && cur->tgid != _stp_target)
            return;
        /* here USER: is a seperator. we use this for
         convenience. It may be changed in future. */
        _stp_printf ("USER:%s|", THIS->input);
    }
%}


--------------- tapsets/utils.stp ----------------

/* return the current name of probe point */
function probename:string ()
%{
    char *ptr, *tok;
    char buffer[MAXSTRINGLEN];
    strlcpy(buffer, CONTEXT->probe_point, MAXSTRINGLEN);
    ptr = buffer;
    tok = strsep(&ptr, "\"");
    tok = strsep(&ptr, "@");
    strlcpy (THIS->__retvalue, tok, MAXSTRINGLEN);
%}

/* data tracing filter by pid
   return:
    1 - if continue to log the raw data
    0 - return without logging the raw data
 */
function filter_by_pid:long()
%{
    struct task_struct *cur = current;

    if(cur->tgid != _stp_pid)  {
        /* to trace a specific process if we explicitly specify
           which process we want to trace by:
            1. stap -c "process_to_trace" ...
            2. stap -x pid_to_trace ...
           else we will trace all the processes
         */
        if( _stp_target != 0 && cur->tgid != _stp_target) {
            THIS->__retvalue = 0;
            return;
        }

        THIS->__retvalue = 1;
    } else  /*skip the events generated by stap itself*/
        THIS->__retvalue = 0;
    return;
%}



--------------- tapsets/syscalls.stp -----------------------

/********************************************************
* We choose to probe both 64-bit and 32-bit application *
* on PPC64 platform. And by making use of conditional   *
* preprocessing, now the syscall probe could support    *
* ppc64 and i686 platform                               *
*                                                       *
* the syscall lists for PPC64 that I listed below need  *
* further modifications. Efforts are needed to address  *
* different kernel versions, expecially the RH shipped  *
* kernel(e.g. 2.6.9-24EL).                              *
********************************************************/

probe addevent.syscall
    = addevent.syscall.entry, addevent.syscall.return
{
}

probe addevent.syscall.entry
    =
%( arch == "ppc64" %?
%( kernel_v == "2.6.9" %? /* in fact, this refer to RH shipped kernel */
    kernel.function("sys_*"),
    kernel.function("compat_sys_*"),
    kernel.function("sys32_execve"),
    kernel.function("sys32_time"),
    kernel.function("ppc64_sys32_stime"),
    kernel.function("sys32_ptrace"),
    kernel.function("sys32_pause"),
    kernel.function("sys32_olduname"),
    kernel.function("sys32_sigaction"),
    kernel.function("sys32_sigsuspend"),
    kernel.function("sys32_gettimeofday"),
    kernel.function("sys32_settimeofday"),
    kernel.function("old32_readdir"),
    kernel.function("sys32_sysinfo"),
    kernel.function("sys32_ipc"),
    kernel.function("sys32_sigreturn"),
    kernel.function("sys32_adjtimex"),
    kernel.function("ppc64_personality"),
    kernel.function("sys32_getdents"),
    kernel.function("sys32_sysctl"),
    kernel.function("sys32_sched_setparam"),
    kernel.function("sys32_sched_rr_get_interval"),
    kernel.function("sys32_rt_sigreturn"),
    kernel.function("sys32_rt_sigaction"),
    kernel.function("sys32_rt_sigprocmask"),
    kernel.function("sys32_rt_sigpending"),
    kernel.function("sys32_rt_sigtimedwait"),
    kernel.function("sys32_rt_sigqueueinfo"),
    kernel.function("sys32_sigaltstack"),
    kernel.function("sys32_sendfile64"),
    kernel.function("ppc32_timer_create"),
    kernel.function("compat_timer_settime"),
    kernel.function("compat_timer_gettime"),
    kernel.function("compat_clock_settime"),
    kernel.function("compat_clock_gettime"),
    kernel.function("compat_clock_getres"),
    kernel.function("compat_clock_nanosleep"),
    kernel.function("sys32_swapcontext"),
    kernel.function("sys32_utimes"),
    kernel.function("compat_statfs64"),
    kernel.function("compat_fstatfs64"),
    kernel.function("ppc_rtas"),
    kernel.function("compat_mbind"),
    kernel.function("compat_get_mempolicy"),
    kernel.function("compat_set_mempolicy"),
    kernel.function("sys64_time"),
    kernel.function("ppc64_sys_stime"),
    kernel.function("ppc64_newuname"),
    kernel.function("ppc64_personality"),
    kernel.function("ppc_rtas")
%:                /*this is a TBD */
    kernel.function("sys_*"),
    kernel.function("compat_sys_*"),
    kernel.function("sys32_getdents"),
    kernel.function("sys32_adjtimex"),
    kernel.function("sys32_pause"),
    kernel.function("sys32_gettimeofday"),
    kernel.function("sys32_settimeofday"),
    kernel.function("sys32_ipc"),
    kernel.function("sys32_execve"),
    kernel.function("sys32_sysctl"),
    kernel.function("sys32_olduname"),
    kernel.function("sys32_utimes"),
    kernel.function("ppc_rtas"),
    kernel.function("sys64_time"),
    kernel.function("ppc64_newuname")
%)
%:
%( arch == "i686" %?
    kernel.function("sys_*")
%: **ERROR**
%)
%)
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_SYSCALL_ENTRY)
        log_syscall_tracedata_extra()
    }
}

probe addevent.syscall.return
    =
%( arch == "ppc64" %?
%( kernel_v == "2.6.9" %? /* in fact, this refer to RH shipped kernel */
    kernel.function("sys_*").return,
    kernel.function("compat_sys_*").return,
    kernel.function("sys32_execve").return,
    kernel.function("sys32_time").return,
    kernel.function("ppc64_sys32_stime").return,
    kernel.function("sys32_ptrace").return,
    kernel.function("sys32_pause").return,
    kernel.function("sys32_olduname").return,
    kernel.function("sys32_sigaction").return,
    kernel.function("sys32_sigsuspend").return,
    kernel.function("sys32_gettimeofday").return,
    kernel.function("sys32_settimeofday").return,
    kernel.function("old32_readdir").return,
    kernel.function("sys32_sysinfo").return,
    kernel.function("sys32_ipc").return,
    kernel.function("sys32_sigreturn").return,
    kernel.function("sys32_adjtimex").return,
    kernel.function("ppc64_personality").return,
    kernel.function("sys32_getdents").return,
    kernel.function("sys32_sysctl").return,
    kernel.function("sys32_sched_setparam").return,
    kernel.function("sys32_sched_rr_get_interval").return,
    kernel.function("sys32_rt_sigreturn").return,
    kernel.function("sys32_rt_sigaction").return,
    kernel.function("sys32_rt_sigprocmask").return,
    kernel.function("sys32_rt_sigpending").return,
    kernel.function("sys32_rt_sigtimedwait").return,
    kernel.function("sys32_rt_sigqueueinfo").return,
    kernel.function("sys32_sigaltstack").return,
    kernel.function("sys32_sendfile64").return,
    kernel.function("ppc32_timer_create").return,
    kernel.function("compat_timer_settime").return,
    kernel.function("compat_timer_gettime").return,
    kernel.function("compat_clock_settime").return,
    kernel.function("compat_clock_gettime").return,
    kernel.function("compat_clock_getres").return,
    kernel.function("compat_clock_nanosleep").return,
    kernel.function("sys32_swapcontext").return,
    kernel.function("sys32_utimes").return,
    kernel.function("compat_statfs64").return,
    kernel.function("compat_fstatfs64").return,
    kernel.function("ppc_rtas").return,
    kernel.function("compat_mbind").return,
    kernel.function("compat_get_mempolicy").return,
    kernel.function("compat_set_mempolicy").return,
    kernel.function("sys64_time").return,
    kernel.function("ppc64_sys_stime").return,
    kernel.function("ppc64_newuname").return,
    kernel.function("ppc64_personality").return,
    kernel.function("ppc_rtas").return
%:                /*this is a TBD */
    kernel.function("sys_*").return,
    kernel.function("compat_sys_*").return,
    kernel.function("sys32_getdents").return,
    kernel.function("sys32_adjtimex").return,
    kernel.function("sys32_pause").return,
    kernel.function("sys32_gettimeofday").return,
    kernel.function("sys32_settimeofday").return,
    kernel.function("sys32_ipc").return,
    kernel.function("sys32_execve").return,
    kernel.function("sys32_sysctl").return,
    kernel.function("sys32_olduname").return,
    kernel.function("sys32_utimes").return,
    kernel.function("ppc_rtas").return,
    kernel.function("sys64_time").return,
    kernel.function("ppc64_newuname").return
%)
%:
%( arch == "i686" %?
    kernel.function("sys_*").return
%: **ERROR**
%)
%)
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_SYSCALL_RETURN)
        log_syscall_tracedata_extra()
    }
}

/* log the system call name */
function log_syscall_tracedata_extra()
%{
    char *tok, *ptr;
    char buffer[MAXSTRINGLEN];

    ptr=buffer;
    strlcpy (buffer, CONTEXT->probe_point, MAXSTRINGLEN);
    tok = strsep(&ptr, "\"");
    tok = strsep(&ptr, "@");
    _stp_printf("%s|", tok);
%}

----------------- tapsets/process.stp ----------------

/* record the newly created process name */
function log_execve_extra(var:long)
%{
    long long tmp = THIS->var;
    long tmp1=(long)tmp;
    _stp_printf("%s|", (char *)tmp1);
%}


/* record the newly forked process id */
function log_fork_extra(var:long)
%{
    long pid = (long)THIS->var;
    _stp_printf("%ld|", pid);
%}


/************************************************************
* This function could be used to take a snapshot of all the *
* processes. It's not a probe, so the data format doesn't   *
* follow the format used by probe handlers                  *
************************************************************/
function process_snapshot()
%{
    struct task_struct *tsk;
    struct list_head *cur, *head;
    head = &(current->tasks);

    /* iterate all the processes, and record the pid and process
    name for each entry */
    list_for_each(cur, head) {
        tsk = (struct task_struct *)(list_entry(cur, struct task_struct, tasks));
        _stp_printf("PID:%ld|PNAME: %s\n", tsk->pid, tsk->comm);
    }
%}


probe addevent.process
    =  addevent.process.fork, addevent.process.execve
{
}

/*
  we should capture both do_execve for 64-bit app
  and compat_do_execve for 32-bit app
*/
probe addevent.process.execve
    =  kernel.function("*do_execve")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_PROCESS_EXECVE)
        log_execve_extra($filename)
    }
}

probe addevent.process.fork
    = kernel.function("copy_process")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_PROCESS_FORK)
        log_fork_extra($pid)
    }
}

------------------ tapsets/ioscheduler.stp ------------------

/************************************************************
* Probe the event when a request is retrieved from request  *
* queue(dispatched to corresponding device), the event when *
* a request is added by block layer into the request queue, *
* and the event is removed by from the request queue        *
************************************************************/
probe addevent.ioscheduler
    = addevent.ioscheduler.elv_next_request,
    addevent.ioscheduler.elv_add_request,
    addevent.ioscheduler.elv_remove_request
{
}

/* when a request is retrieved from request queue */
probe addevent.ioscheduler.elv_next_request
    =  kernel.function("elv_next_request")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_IOSCHED_NEXT_REQ)
        log_ioscheduler_tracedata_extra_elv_next($q)
    }
}


/* when a request is added to the request queue */
probe addevent.ioscheduler.elv_add_request
    = kernel.function("__elv_add_request")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_IOSCHED_ADD_REQ)
        log_ioscheduler_tracedata_extra_others($q, $rq)
    }
}

/* when a request is removed from the request queue */
probe addevent.ioscheduler.elv_remove_request
    = kernel.function("elv_remove_request")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_IOSCHED_REMOVE_REQ)
        log_ioscheduler_tracedata_extra_others($q, $rq)
    }
}


%{
#include <linux/blkdev.h>
#include <linux/elevator.h>
%}

/* log the data specific to __elv_add_request and elv_remove_request,
  for kernel == 2.6.9. Need to confirm the actual kernel version range
  i.e. kernel_v <= 2.6.x.  */
%( kernel_v == "2.6.9" %?
function log_ioscheduler_tracedata_extra_others(var_q:long, var_rq:long)
%{
    struct request *rq = (struct request *)((long)THIS->var_rq);
    struct request_queue *q = (struct request_queue *)((long)THIS->var_q);

    /* elevator name|major|minor| */
    _stp_printf("%s|%ld|%ld|", q->elevator.elevator_name, rq->rq_disk->major,\
                  rq->rq_disk->first_minor);
%}

/* log the data specific to __elv_add_request and elv_remove_request,
  for kernel > 2.6.9. The exact version need to be determined. */
%:
function log_ioscheduler_tracedata_extra_others(var_q:long, var_rq:long)
%{
    struct request *rq = (struct request *)((long)THIS->var_rq);
    struct request_queue *q = (struct request_queue *)((long)THIS->var_q);

    /* elevator name|major|minor| */
    _stp_printf("%s|%ld|%ld|", q->elevator->elevator_type->elevator_name,\
                rq->rq_disk->major, rq->rq_disk->first_minor);
%}
%)

/* log the data specific to elv_next_request, for kernel == 2.6.9 */
%( kernel_v == "2.6.9" %?
function log_ioscheduler_tracedata_extra_elv_next(var:long)
%{

    struct request_queue *q;
    struct request *rq;

    q = (struct request_queue *)((long)THIS->var);

    /* If there is a request in the request queue:
        elevator name|major|minor|
       if there is no request in the request queue:
        elevator name|empty|
    */
    if(list_empty(&(q->queue_head))) {
        _stp_printf("%s|empty|", q->elevator.elevator_name);
    } else  {
        rq = list_entry_rq(q->queue_head.next);
        _stp_printf("%s|%ld|%ld", q->elevator.elevator_name, \
                     rq->rq_disk->major, rq->rq_disk->first_minor);
    }
%}

%:
/* log the data specific to elv_next_request , for kernel > 2.6.9*/
function log_ioscheduler_tracedata_extra_elv_next(var:long)
%{

    struct request_queue *q;
    struct request *rq;

    q = (struct request_queue *)((long)THIS->var);

    /* If there is a request in the request queue:
        elevator name|major|minor|
       if there is no request in the request queue:
        elevator name|empty|
    */
    if(list_empty(&(q->queue_head)))  {
        _stp_printf("%s|empty|", q->elevator->elevator_type->elevator_name);
    } else  {
        rq = list_entry_rq(q->queue_head.next);
        _stp_printf("%s|%ld|%ld", q->elevator->elevator_type->elevator_name,\
                     rq->rq_disk->major, rq->rq_disk->first_minor);
    }
%}
%)

---------------------- tapsets/tskdispatch.stp ------------------------

/**********************************************************
* Dispatching when the cpu is idle or when a new process  *
* is chosen to run.                                       *
*                                                         *
* The actual locations for these two kinds of events are  *
* the labels go_idle and switch_tasks inside the function *
* schedule. But currently SystemTap doesn't support       *
* specifying probe points by label.                       *
*                                                         *
* Bugzilla shows that #1564 has been fixed                *
**********************************************************/
probe addevent.tskdispatch
    = addevent.tskdispatch.ctxswitch,
    addevent.tskdispatch.cpuidle
{
}

/*******************************************************
* Pls refer to #908 in bugzilla.                       *
*                                                      *
* Currently systemTap can't access arguments of inline *
* functions. So we choose to probe __switch_to instead *
* of context_switch()                                  *
*******************************************************/

probe addevent.tskdispatch.ctxswitch
    = kernel.function("__switch_to")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_TASK_CTXSWITCH)
/* multi-arches processing */
%( arch == "ppc64" %?
        log_ctxswitch_extra($prev, $new)
%:
    %( arch == "i686" %?
        log_ctxswitch_extra($prev_p, $next_p)
    %: UNSUPPORTED
    %)
%)
    }  //end if

}

/* Only applicable to SMP systems */
probe addevent.tskdispatch.cpuidle
    = kernel.inline("idle_balance")
{
    /* we didn't call filter_by_pid() here,
     so that we can get all the idle events
     despite how the cpu enters idle */
    log_tracedata_common(HOOKID_TASK_CPUIDLE)
    log_cpuidle_extra()
}

function log_ctxswitch_extra(prev:long, next:long)
%{
    struct task_struct *prev_tsk, *next_tsk;

    prev_tsk = (struct task_struct *)((long)THIS->prev);
    next_tsk = (struct task_struct *)((long)THIS->next);

    _stp_printf("%ld|%ld|%ld|", prev_tsk->pid, next_tsk->pid, prev_tsk->state);
%}

function log_cpuidle_extra()
%{
    struct task_struct *cur = current;
    _stp_printf("%ld|", cur->pid);
%}

---------------- tapsets/backtrace.stp ------

/******************************************************
* Print the backtrace when:                           *
*  1. cpu is idle                                     *
*  2. when an io request is dispatched to scsi layer  *
*  ...                                                *
*                                                     *
*   To reduce overhead of symbol lookup,We only print *
* the address sequence of backtrace, for example:     *
*   c016844f c030fe6b                                 *
*                                                     *
*   Some post-processing is needed to get the symbol  *
* info associated with these addresses.               *
*                                                     *
*   We could use _stp_symbol_print to get such infor. *
*   For example:                                      *
*     _stp_symbol_print(0xc016844f);                  *
*                                                     *
*   The output should look like:                      *
*     0xc016844f : sys_read+0x2/0x62 []               *
******************************************************/
probe addevent.backtrace
    = addevent.backtrace.cpuidle,
    addevent.backtrace.scsiioentry
{
}

/*
 Only applicable to SMP systems
 Refer to addevent.tskdispatch.cpuidle in tskdispatch.stp
*/
probe addevent.backtrace.cpuidle
    = kernel.inline("idle_balance")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_BT_CPUIDLE)
        log_backtrace_cpuidle()
    }
}

/* This failed to work on PPC64, refer to the comments
 in scsi.stp  */
probe addevent.backtrace.scsiioentry
    = module("scsi_mod").function("scsi_prep_fn")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_BT_IOREQ_TO_SCSI)
        log_backtrace_scsiioentry()
    }
}

function log_backtrace_cpuidle()
%{
    /* print the backtrace without symbols information */
    _stp_stack_sprint(_stp_stdout,CONTEXT->regs,0);
    _stp_printf("|");
%}

function log_backtrace_scsiioentry()
%{
    /* print the backtrace without symbols information */
    _stp_stack_sprint(_stp_stdout,CONTEXT->regs,0);
    _stp_printf("|");
%}

----------- tapsets/scsi.stp -------------

/***********************************************************
* Trace the following activities of scsi layer:            *
*  1. I/O Entry                                            *
*  2. Issuing I/O to LLD                                   *
*  3. I/O done by LLD                                      *
*  4. I/O Complete                                         *
*                                                          *
*   Due to a bug for elfutils on ppc64, it failed to find  *
* any module debuginfo. Hien listed this as a bug in:      *
*
* http://sourceware.org/ml/systemtap/2005-q4/msg00222.html *
*                                                          *
* Roland is fixing this.                                   *
***********************************************************/
%{
#include <linux/types.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_host.h>
#include <linux/timer.h>
#include <linux/blkdev.h>
%}

probe addevent.scsi
    = addevent.scsi.ioentry,
    addevent.scsi.iodispatching,
    addevent.scsi.iodone,
    addevent.scsi.iocompleted
{
}

/* mid-layer prepare a IO request */
probe addevent.scsi.ioentry
    = module("scsi_mod").function("scsi_prep_fn")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_SCSI_IOENTRY)
        log_scsi_ioentry_extra($q, $req)
    }
}

/* Dispatch a command to the low-level driver. */
probe addevent.scsi.iodispatching
    = module("scsi_mod").function("scsi_dispatch_cmd")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_SCSI_IO_TO_LLD)
        /* sdev_state|scsi_info|data_direction|request_buffer|request_bufflen| */
        log_scsi_dispatch_extra_1($cmd)
        /* cmd_identifier|  */
        log_scsi_dispatch_extra_2($cmd)
    }
}

/* I/O is done by low-level driver*/
probe addevent.scsi.iodone
    = module("scsi_mod").function("scsi_done")
{
    /* scsi timer check. We should record the hook only
     * when the timer is inactive. But there's a gap between
     * the checking and the actual calling of scsi_delete_timer.
     */
    if(filter_by_pid() == 1 ) {
        if( scsi_timer_pending($cmd) == 1) {
            log_tracedata_common(HOOKID_SCSI_IODONE_BY_LLD)
            log_scsi_iodone_extra($cmd)
        }
    }
}

/* mid-layer processes the completed IO */
probe addevent.scsi.iocompleted
    = module("scsi_mod").function("scsi_io_completion")
{
    if(filter_by_pid() == 1 ) {
        log_tracedata_common(HOOKID_SCSI_IOCOMP_BY_MIDLEVEL)
        log_scsi_iocompleted_extra($cmd, $good_bytes)
    }
}

function scsi_timer_pending:long(var:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
    THIS->__retvalue = timer_pending(&cmd->eh_timeout);
%}


/* log the info about scsi io entry */
function log_scsi_ioentry_extra(var_q:long, var_rq:long)
%{
    struct request_queue *q = (struct request_queue *)((long)THIS->var_q);
    struct request *rq = (struct request *)((long)THIS->var_rq);
    struct scsi_device *sdev = (struct scsi_device *)(q->queuedata);

    /* major|minor|scsi_device_state| */
    _stp_printf("%d|%d|%d|", rq->rq_disk->major, rq->rq_disk->first_minor, sdev->sdev_state);
%}

/* log the info about scsi_dispatching_cmd */
function log_scsi_dispatch_extra_1(var:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
    long long scsi_info;

    scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
        ((cmd->device->channel & 0xFF) << 16) |
        ((cmd->device->lun & 0xFF) <<  8) |
        (cmd->device->id & 0xFF);

    /* sdev_state|scsi_info|data_direction|cmd_identifier|request_buffer|request_bufflen
     *
     * sdev_state could be: SDEV_DEL, SDEV_BLOCK or something else.
     * Instead of skipping SDEV_DEL & SDEV_BLOCK, I choose to log them
     */
    _stp_printf("%d|%lld|%d|%0x|%d", cmd->device->sdev_state, scsi_info, cmd->sc_data_direction,
        cmd->request_buffer, cmd->request_bufflen);

%}

/* log the info about scsi_dispatching_cmd
 *
 * cmd_identifier|
 *
 * Now we can use cmd->serial_number as cmd identifier(still valid till
 * 2.6.14.2.  But according to the comments of struct scsi_cmnd, it's
 * a better to use cmd->pid since cmd->serial_number will be killed one
 * day in the future
 *
 * But when scsi_dispatch_cmd is called, cmd->serial_number is still not
 * initialized.
 * For kernel >= 2.6.14, it will be set later by calling scsi_cmd_get_serial.
 * So I choose to record cmd->device->host->cmd_pid. But there is a gap between
 * the time when cmd->device->host->cmd_pid is retrieved at the beginning of
 * scsi_dispatch_cmd and the actual calling of scsi_cmd_get_serial.
 *
 * For kernel <=2.6.9, it will be set by a global counter.
 *
 * NOTE: The kernel version need further investigation.
 */

%( kernel_v >= "2.6.14" %?
function log_scsi_dispatch_extra_2(var:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
    long long scsi_info;

    _stp_printf("%d|", cmd->device->host->cmd_pid);
%}
%:

function log_scsi_dispatch_extra_2(var:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
    long long scsi_info;

    /* systemTap failed to access global variable. So I temporarily use 0.
    _stp_printf("%d|", scsi_pid);
    */
    _stp_printf("%d|", 0);

%}
%)

/* log the info about scsi_done */
function log_scsi_iodone_extra(var:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
    long long scsi_info;

    scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
        ((cmd->device->channel & 0xFF) << 16) |
        ((cmd->device->lun & 0xFF) <<  8) |
        (cmd->device->id & 0xFF);

    /* scsi_info|data_direction|cmd_identifier| */
    _stp_printf("%lld|%d|%d", scsi_info, cmd->sc_data_direction, cmd->pid);
%}

/* log the info about scsi_dispatching_cmd */
function log_scsi_iocompleted_extra(var_cmd:long, var_goodbytes:long)
%{
    struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var_cmd);
    long long scsi_info;
    unsigned long goodbytes = (unsigned long)(THIS->var_goodbytes);

    scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
        ((cmd->device->channel & 0xFF) << 16) |
        ((cmd->device->lun & 0xFF) <<  8) |
        (cmd->device->id & 0xFF);

    /* scsi_info|data_direction|cmd_identifier|goodbytes */
    _stp_printf("%lld|%d|%d|%u", scsi_info, cmd->sc_data_direction, cmd->pid, goodbytes);
%}


begin 666 stap-trace.tgz
M'XL(`/3#GT,``^P]:W?CMK'[U?H5J+/>2*ZLM^3$6V_KM96L3KRV*VN3[&E[
M>"@2DEA3)$U2MG6[^]_O# `^1/$E6W;2E#S)FB(&P&!F,)@9O!Q7M@Y<6U;H
MP<'"<%QYK,-;H]:LO]K:TX#GL-ME?^&)_F7OS4:[<WC8:?8.6Z\:S4[CL/V*
M=+>'0O*S@#;;A+RR3=--@\M*_R]]G$3^#_LG9Q_[VZ@#&=SK=!+YWVZV./];
MK5:WW4'^'[9[KTAC&Y5G/?_C_"=D2"?4)JY)''ENZ;3FN!:9F/!E1HF&7^;4
M<*E*Z!W\)3/3O'%JI1+D6Q@,A@D/Y#=U,EX>00**%#D8U.KPUZ&N$RH84LOD
MXG+4AYR:0^XU78?\VG0*&,CX#N6%ZB&RH1)%7CB4I3A+QZ5S*(,@NF-*[JB]
M)(YNWNO+2NFWIN1_YY/<_SG7),6<6[+B(O<>6T=F_^^V/?W?Z'8.H?\?]IKM
MHO^_Q%/?)R/LBHYB:Y9+X$TF*IV;3 7HYE1BLJ'*KHR2,#<-3R#(?KUDV2;T
MPAMJ&U2O31:&XFJF4=Z%?BKM[U9*_RGMI)4@S4V;EC]<7OXT.).N/U^?GIR?
M2_V+T?!SI;0#>%W:1ZROBR>MJ(12_+R Z]="0<0]R?U?:.\MV(&;V'^'#8!K
M=KO=POY[D2>;_YKI*#.J+G1J/W(02-?_S5:[Y?&_#7H?]'^KT0.3L-#_+_#4
M][?VE/;)MAXLZ]2TEF :SEQ25BID`#:H;<@XP,@Z>;]P-(,Z#ODH*S-\`VC;
MJE5)J]'HQI2U3;S8< DCW]26YSA>3FP*IJDY<>]EF[XE2W,!)JM!;*IJ#MBV
MXX4+=K2+IFP=AM2YJ6J3I5<6?%\8*N7&-C1P[A!SPG[\>/&)_$@-:D-KKQ9C
M75/(N:90`TQA&:K'+P[T2C"YO;(PUP^(R[7 A?Q@0N&,9&\)U2#=1HO9@=^D
MY=4C"JT2P"UH8UEVL2$V,2W,7@'LET27W:"$6FYZ/2?M`Q*K1./NR,RTT%N
M!@!QF8,!)@IX$).%7B4`Z9?URV#TX?+3B)Q<?":_G R')Q>CSV\)9'%G)H"A
M&^+[0!J4#R2U9<-=(N6B>'WL#T\_0 $G[P?G@]%G).8/@]%%__J:_' Y)"?D
MZF0X&IQ^.C\9DJM/PZO+ZSX0\)IROV:UK!3.3Y@``6=5ZLJ:[J0R8=NT_PR"
M[0!I=)7,Y#L*`JY0[0X((Q,%NFH.P0W*DG73F#)2<T=0L/0MT2;$,-TJN;<U
M%YW*&*F.XA5(>16TA )*H/L]&5%T7LB5#@-;E5POL+1VNU$E[TW'1<B/)Z31
M:C:;!\UVXY!\NC[9.KU.%B!(]A$A/RYD8ZI3#>A RE/QXV^*4=/&\QH8L)7,
MLK;VU$NE)RE\:-85L_D#1_U^!OU$!FFX75"'.1 VA3Y)43(FMCGW4QA1X'5!
MR]!M+=E%JP)9K)BV31W+-%0-A$*E=R NE6JT"B8VX6ID5>7Z;ZR;R@UHIR5H
M)\T0,N,!L@JK/+>AA@IEB,[-.UX&PW0MWW98X/E)B#!670M95*6=X]CO-:K?
M209]<"6!4)64=A(!(<&'2P'C[?4@P3W[BN*0GX.<)JGM6<,;&[CN(4:AN+.H
M3<H338=!1AHO)4M3RQ5R?$R:I$(@,=:5]/R^P>7UZ8?^F731_W4D#?M_KPCX
M$&:AO%"Q#?\*',JO;P'\*Q(CGAI<T.+D*@<M0JQ!4JQ10I(B4-LBQ<G965Y*
MF&@:.$"'*GEM^\1(D S>91+Z2S8](C(81Y)UL&W19-C_>/ES_VED*>W]I_2-
M9BCZ0J7D+[IF+![J8_T&FEN;O5M+H3J]DUW3QK0]3E2HEE$.:R*.115M`F,D
MB->:*#"-M4Z-:HDP2X 3#FG0JO5JW]?(!?44JC'1;,X?67$7,!(+6,_\`T-F
M2J$8K49K(DVZ(W_A)3W46,!DKQRD0!V[K))=LO?7DL<JDI."=[(MW1[AD%\E
M^&[S'Q4DY0X8< O%]>5HW[XEQZ0<_5HIEUF6T8?!]<$[7DCE;32W)*0PI@@O
M:;T@5@X+.'G,(H8\IU_F\K]-^\M<,^!?I,>.!-ZO9-DPQ$S*NWO.ESU=9?_O
M5LGMP3N?T]Z+A*54B0UI]JT$0][-P3M69N0;L K08_54WI+GDI)WGI",< 1\
MP ">)PR&$)LQLRO!"0&'2JTQ$3@J>)W&Z^!-<I<6#?U\-._W*NGL7[<+8E1!
M:O<E^7GJ#Y% NX"-B9Q8XQ%R^"W ;\*C"F8`"@R8.P'V/@M)^T.0L3[J'($N
M3V9G:0<,."U4F&%N6!JX$N[R"]A?*!@P"('G[4KL8_E-&;C)LDDS*JN5"A^+
MHM+#2TC3$RB77X&YZ"5A":QG\(H,UUY"#UBMJ89LP4Q)DKI=I01#'^HET >;
MR":IQF@@(&(A?\\H?[D%,%UY/;- ;EMS<@'=*V:!-WVRX_]C6;EA`(^>`LZ8
M_SULM%M^_+_=ZV+\O]4NXO\O\A3Q_TWQ*N+_1?P_BE<1_R_B_U&\_ECQ?Q;Y
M!YN.,<0W"5B$\"B#$*19(XJUP/ZJJ3I-@8[D:]5$!-(@FAF.0J[.'SB*HXGP
M/\]7J^733M'Z'O'P?"/$3ET`/4Q0CV@.8V]PEO.QJ8.38]XLK.HOD&CH2\(L
M8U]ARZIJXSCF8.,,+& 24)>[3_2!+<0["M6G-)J][SJ=";RT&Q/:&S]S^Z[-
M.246])P#Z*@*X(L3->C*4"I"XU/JBB6:K,U\<#,F)@Q6CJEH,BIHT=VIXS>;
MKFFPI^$)-%:8@L(5H\P/X?AP=\3#TUDH,X:<71/Y?HB2.6=]9+V2<N/!XP[X
M)8GY'M<^C-W!V&3!\"0T,0H7N&4W27A[]058D2-<1PLNNJS^N?'0JC<>>BWR
MCW]%\_$_CU0]:[,`ODBOS'<%G@6H!U0-X<FKD-L!_5LSF=OIS5F5R"7V)=F"
MH5E!EP5Y>_WQ2BP1=DK!JF:_0->Y\?2&5Q^S&4*?<85R"O)>MM"LA6;H8'Z6
M=_&S-)9U&?KP4V<KWH^DTZM/@[/SOC=+X6,@"0S*H4D:9A--P!S@/?'>M&]
MTY"KJ]->!YQFCP[8.[$>:!&0!QJ.9&6+O6-F;>*)#\T&\W4!U>_B9PE^[%;"
MZS[QHV532YH86R#"X'+8_[LTNI2N3Z\'ZZ0(8>:38R6R%$<UC$(#S:RXH4P8
M?KPW.UQ#S)EA$82$P5E4;B2'=W7Q185<U=/+BU'_U]'!.YM.G6H#(R$K@9 O
MNUYP/P''U>;\#O#\K3W3XGF))SO^$U&0CZ@C*_[3//3C/X?M=INM_VQWB_C/
M2SQ%_&=3O(KX3Q'_B>)5Q'^*^$\4KS]6_ <?:-29L 0P`,!B,\RM" 5W0$+%
MJC&#WA,1+D!_$K6=0Y29Z6 ND]B+G#KD:9+,M8:_#DHW%28JCK?%%;>6WIOD
M1C-4IGN9_^,0)FB>5M7E,05#>VI*K(FXWL8!V55FDBL[-VB!.YK*N[)OW&->
M;V*[!N.52Y2%;4/9X+9>,R=U)%M$-:EC?(LA"<LR;3>$,Y]77R*=N6]FF1HB
M-EYR=))H]U1:O5],_P^4IHQ=_=[ABO2;9K?7@4X/M5-@WD1[@%Z_GO<)4AEQ
M/T,FYTK 8,6%=Q\X$U96QL8X^5[,X-$R?Z4[@1O]S?>-[W"H&0LZ;9T/S/3Q
M1<7Q105L"A 466']2;:G"^;)H\CR& 3+Z(D?C S7)KFGV-],ARE23F))\B37
M1+%U,52)&:$8Q003Z\$5`.4,K?0DGH,:2N9XP-F$!;-^"YX:8QB=7/\DG8Y^
MO093X/1#!8,I\X7N:@>R#3W7(:%@)U]0A=_98BK+4GH=MA:2U^"CS!>NE%];
M-KVKDM>@`\&7/P()#>?6>M_ULC)+%LL._+ JD/N(?+JX_G1U=3D<]<_@=P77
M.NQ\):1>IZ".M$E)1(/2XV(QH9ZX'I,GOH7+@\$0T5042P4W[T>Y@ M>JJ4=
ML(VY&D%Q!,,80[#>7G^F3[G*!4"5.A8.U*!Y_'$%SQZPQ=B"T8UL?OJA,T9;
MWB)!V<I:C"A*>R2]6+V(Q%]?NX@:7Q+O^XQ10, JV6>L@E=</>1]#JTZ6LFV
MNN8(H3$,XY60+Y=8;A-=J2B6*8JEBAXB4(FF\A:%?@:IX *[-#8^%:5?$B5@
M< .\Q1 7C2D);""5U2PJ^JT]W^+!)SO^@R=Q:*JDTHGS//&?9J_3X_&?3J=W
MB.?_M!K=P^+\EQ=YIKH):ITP?0Z#!%/DJ&?C#E2HDLC787_T:7@!!I@(5G-C
M7[%I$(T6.:Z&EZ?@#4O@#?_D%^-][/_:/_VY[Q6CF;[A;(>+B.[Q\8N)['BI
MKN4(MGU4>16HNOQ)9#&\[\0:!7XEX;%%E!*$Y$/9@\D;/VMT*D-D9]/685)#
M$@!&*,T_8M;S\[-J%/CL\J(OO?_,TB)93B\_7F'21\3EY_ZY9W*-Z50S< "/
MXS#H\.9:"N<R)+5*:_SDK(.T=BRK(:%32N0@I'9AV(]G(R3V4C@)R8>E!*9!
MVG>1),X22/B^%,,J;'8C_'V%7YC:+,7R"9/:T13!+$SKI#",'#>[T>0HT[",
MWK,>79*M_X56>*SR?Y6E_UN-3KLI]'^KR\[_:36:G>+\GQ=YGK8?>,U7WM93
M3 `4$P!Y:%],`+P<[8L)@,WH]4>;`, NLQY1')O PU[G8,Q5'&FW^"L/0?FQ
M<&]%$K% B^""E1HY,9CVFLLW:(+CDCT>A50UH>0Y&7%!D1^(JX)TW'N'03)G
MA6/!E_V%PN@L)T;I&%(8<?-KSL&ZQS,]C!INE^(3#;SI3"D-V&=4W%2'IK#-
MMR)V:S,-S08(03KHX/T)%"!F)(+UEMZZ4983X"<4`R^1;=Y.E= 'G$<`9)8,
ML^$'Z,&:97EU<O@RK4UK?'?>0:O3/Z]DZ/;'"^1ZX%?0:B7,+[[5V&*HZOIW
MF[H+V^!Q_?CB:OZ"M82P;?+N6/1 #3*1%;?*59,?^@\13] 9O;>DTP^K,4GL
MM$)72H& I'9+H@]4N:-I$*XV3TAG;90XE),,Q@$L9OFG0N#IJVD`IJXN<,]>
M&HRC367V(0/(63@6-=0TJ"EUL5'F1)67J:5EP@'B`(?K8%7-3BUJZ> :NS00
MS5(RFL9E-@U(5O^-&#^DL=6"CHVJ47-3&P]$4C&FGM$JQ=53(3 *@X2T9!B:
MLR%M&RN6-#34[^34HFTW'TTX7+;L<#@<)N:R<Y,#DK+#AK(!D2/JO:RYV:!L
M+VR6H&!K=)>MSDR77D.=:#H%;94D#4()V!(+MR5T/Z%P.*#H$CD@ISD@%3QU
M*5>9'#)_F0`)8UL.0$,V3$>GU$HEY;ULB:G--+ %(I=>*<Z23)PDE@B@23H4
M, [$14ZO:#[6DI2@@,!^-@<+UP0;*T$3>$AG0P(!0+7D&%'2QA,.9-#[E,$@
MIPX+:(13IY&GOL\&9;8I?_3^[-F&X#PJ-%UCYQP_MS>@Y1J*LDV+[($ASY"?
MUI_2.T$^<0PD#6?"]XY*\1/LB6=3@V3M[_>'P\LA.#9\.OU):PFBYU<S8&&-
M1L^U\+<K)!BOPK[]+:U7@4-V#TH!7!6X3$#.\12P&,LVLU#/P,T&Y/TT$RZ0
M_4S0D.62!]8W?C.!5U5&=MEYP2,6<7;!GF&<"<F44AXB>"9A)FR@>S,E9F7
MR4-<H?CSM)]IRFS B"6=,\.:09V9;]6NS@F>6TBC5G;>#)ZQG1,^9'/GS!$R
MO7/U2M\"S]%W`D,\7=#6[/%L%1HQR_-FF.;/$#'2\V;8N ;/9,\+'[+<LUD0
M-N SH3V[(\<`YAOJV;"37,"!79-=HK#QLP$CIGZ.=N7.$+:T\HR[.4;=J'&V
M->6\1MLG^@;;,W V&#!R#5J;V2//90ML,F[G-O%R#Y@;6%DY.GRN?KE19U@3
M\$<Y(2%)?A9?A"_<R>N,A$[VXZN%^:I>;":)'MJ77!8N$%5FLDWV71/7Q8(!
M_E9\&2]PFN(?'T]^O1X-!Q<_GO<O_L66R[KV,4_C!_CIBK4D9?ZE2OP-R\Q9
MDM@V")R]"PK!Q;-0&3DFD-FA5OD-E%@EN__<W4U*^MONVL;G/0?7J +P_];R
MU.SU/]#!GK+XYU7F^L].KROV_W:ZK6ZGA_=_=;J=8OW/2SS%\I]-\2J6_Q3+
M?Z)X%<M_BN4_4;Q^M\M_"#<[^18KOF.'6WG P=!>TQ6CCWU'H"/L<,"[%5./
M&55H\*6;>IYQMZ%M!Z6#`>=;B(\R]7RSDN^>DB2@P9VLX]4PS$Z-U(G[H/AU
M(.R0:[2/L-'<'D>="_8XWB[)*7E4VFF2`Y1=C%-HQH))KF=,V_(]*Z6TTP`@
M07M/RP#0%$M>`5RA_(H/P/:CY=^&Q9P(MNW*G6HJ^=,Q/Z\,]V#Q4YUQ+X0I
M;O -G>7M[>2 %MWC;0&XE$MCVU'YKF3("8V_GVG@ZGBP`'@O\T/6>'EX&3"#
MP_/_^)W "MD5X))K<N=A%T_K$W M#^X!Z>N#!!#L.&JLB-\;S+ 6.PE%N=1!
MR/TZ@3_0=MY>,/!PTR$TOT'>O"'K!.$`W+':B4H($+2!YUKO<-;A*TA'+%PS
M=&AV?=^YT:S@PB&'3)E6=/F@S1JJN0[5)QBF2*K5J_/9'))L^U]0]@D>0(;]
MWSWL-OSU__ /VO^X#:RP_U_@*>S_3?$J[/_"_H_B5=C_A?T?Q>OW;/\KILWO
M0C3H/9AU?,I2]6VYV)@OC_*+0._*A2P[C)OL'W=N@>7B7Z;R=D<DPO?F,3^_
M`%[C8Z]E[DP@0-,[C" 67>@&-R%LP9"+XHH0R9B"<8F'+*S>^X*HQIU:$!Q8
M\/2[,_E1K1ZB?,\"UT[\?D'YAIGA!EA>,]/%'N69MWPCA#!Q:V3@?HL7R<"H
MPCVS*C]D0]S)P\\#]0\7$ML+3%T76R=$.JL6?1GF\\U@@-+QJ(TM"^"J$\EL
M?Z^!*8X,.TG#2V'7SK!3M='!`4\3WR&9?3HF;\K"X0&K'L]C\F[P@3Z/YO:Z
MBU!E>T)"8H4"@9\$1&F'=0!V^#;8%X1M)N"GCR F\%W"[V6&#+MWAWD.J6=W
M!%?G\&SK4%7"L5^_1.=J<':$TGAU<?*Q?T3VG'\:.%7A'^3!WG *B%W:PT4U
MLK3,;]DQ6?M8P]Y27?_,.[Q_32CXN^!ZB1% D2UP3,3N']44RH&1S-L+9%F0
M`ZDJYE-7H8)M0B%)BSE\.8),[&VB^W[13ST1:'5;NS=SMJ+Y7N-R$'9#4OPB
MOC!18P\PPL%2$E#;PA?WVGO8AG3?:U1>',TD^S_'_F]Q3O3C?8QT_Z]YV&@&
M_E^WW4+_K]5I%/[?2SR%_[<I7H7_5_A_4;P*_Z_P_Z)X_6[]O\<_V -9U)U[
M$>A0X.0%+B"^TUR-,NT57,=S%&D&FPD8U"])GQG4&](3IP<&CK/ &K$0$ X\
M12=GYC:O634-G)G(F=/+W.&93TT4*C??#49^YL<^//,9GTR2\<!1[I'H$[8B
M!W?4L\5H551TP?T?$\T0F[M19_/[.D!?0'9<(%TC'S30:V(+.NM[LB-*U[R;
MG#9"&FN:N:YU5*\[,&(H%#MIS;2G];E>YPO)P)*LX\!\<-NISYUIH]%JM6HS
M=ZX_G4!#4V?[^QT\E99/I&GI:C'(_)1N%',)/-YCZJS>`8]=@=G/DC(WU*0T
M<!Q 02:ESD!AQ=TLS]:"QR4$E]'OQ6PQ@B)7]]JC=2_N':F&#M(5WT.'I,6E
M0F^*^:R(?J+ZON-<4P_X#5UXF@-J<9D,+H-K>M?<OC!:+WKO3/AT+W_E))8O
MD/$\J]LJ>0WHAQ9/>D=CL_%P/D>Q%.,7J,D#'=JE$]6&$=.N)3<X1.]\S?8R
M@(2I6VJ[.+^LPN>''4"0K96G7S@=0(M\89E5#4P`1.6+=Z<RGZ-?^0F^,K]1
M/:"DCS*_Y+E9?@VXB]K@3=)P.;4VT: DDIZUY67U>(!:&LU"H>6CA$\F.V3(
M26^ ] _ 94,=ZXA$F5'EIH8'I @;*11>PA,TV>5OI1W0K?ZQZ3PC+E@WV @J
MS@EG%T5_BTIY*EM@O+KW>.@VR\I6;&!%;-0U>.GB9'-<CXN?Q0@,6@6[(-\R
M4L/<]2S1P/EREI5O,Q';=CB)&9R8(,\A0F?!,7<5+XOH14A KQ,)OG\-&!A6
M%"):Y]U<Q54*JHU$-@:*)Q<O-5,2.=C^IZUTGN@)?A$=XF,8(@%HDJEIJE"C
M2YVUFZS6.<(7@:S&M44XT1]KR#X4'(I&AA+B;CR/6WZP*@5OH+R#=W3&4 '_
MR%LD$UHUSN_[&[.[J;!C:&80.5U=/;ZN3P$-Z58<O(SO]NUZXZ+7O6]PJ;MT
M6XF]''Z]B+C,=CAW:,@F^Z@<HU3VTBK^!>DH+5Y0.GPU? A<:-C@`B]_%D+]
MPO_;W>Q2]"I3W ?O`O4M.);,,.G_V[O:WK9M(+RO]J\0#'2)$]>SDS3IG";
MMJ9 L*(=UJZ?[hidden email]:9";,FPI#09\N-WSQU)42^Q[,9KNHX'M'4EBCH>J7LA
M[\42/A GU?-6XM[K78C949*1-J"<^0^ZVN2U*(1[? P%2?X^"R/O1Z]W_>)%
MVWOVS-O9:WNW]/D5VH\^#4.$/ULM^_N5+2=I:+?RGE(K.!;E6P5CW4A-\3(2
MLR#G%@M0<&_F_UG/YO!HX+UY?O+N[/G)RX[\^O7EZ]]^QP9$'$W]A+/;PB&)
M10#9QY+R'T+B,IC-<%<_3Z/(.B!3VLKTI1;,5(F1T@*=3&25]J[I'V3ZMNF3
M(=W)YE2UB4=G><*@F$.#;^5)HMH7" .*K[:FFT1)KZ1FJ*NOHL\Z23VRD0F&
M_CP83L["='I.0HEL)2SG[%%:UMAB(IY)RX!_HB/$X_?WNF2PLD0?CJ *L'62
MK\/(\U#X+F#.D?1'-T-(_T3"]PU".*N*`Y2*K<!/[W==TK\D*Z'5>/!AO-&[
M8V0[IG.^B#] CM61DBK)M5J&Z*13]1ZZ*P,/HP0=!2&9_T2#O_UQEWM&1565
M9N#X2-&C8^_(Q7ZB-A1)3].ZBR+"F(,>Y8U=]/8FRBU'I5I5< $\2P2R%"F)
M!K0TJ:;2HR#'9/!W]^-Q`H5D'OB\SZ4J,2*!<BBJ%CHKTZY&,RL/L$"Q9T><
MT*%$+Z(4#442=M/WCQUQ_?2KUV]/!ESH1O6A-WHYOYQ.+1>$5_3U!!>\6]9E
M/]-<+HECE4NB+VDF:IG_SM=A_D5^4^0PN3D3,?=HT/QFT)?,ZJJ,2[91HTJY
MJ/FD+@*<O75EL2<(X)W3M<D-?_J];B49Q/+E03>J.#,UZ2DF^:B]F$_"9+I#
M4;-4=B?LB\)^20E?(3BUU!RWEA*,FONO28\KVR)0=:FYI87#+&&K9$USCN[O
MG/=&&L;!1>B/Y8YY-_K.W6IO9AV:5NWO=>%D=%BP@FX?I:LMHHYG4^[_$^[J
MH #U_A_$,50I\'\G_W]_=_?@0-?_W=_?Y_J__1U7__>K@//_6!4OY__A_#^*
M>#G_#^?_4<3KF_7_\%XJNX&=Q>44@PU2^E94J&+18B@=>N!(Z_1Y17G$8.I?
MT<PG5X=+1JA"%?:1]?^6/M+1+>FFMS/\E= ?KCYX2^\:GW(VIV9C'.4R'GF;
M/R9<//&G7)5?Y>"N$T%U>5?/U%1E]5DV:.)1$'@2T=QL\.=%F! )`C\DA:>L
M<;\/[0J+YD^J"RXF5]WDZHS&H7^F_-M$G.HBB$-Q5Y]E5\ROY!,2,+(:3U;(
M+.V(L2,$SVR_W!R._8]!*,* 7C@OSQ^NLBVP&82S-%%!W.N+(UXYU#8?3TOC
MX;VZO]Z<_#F0:8O]&;SWHSE/'K9 F%'0%"' -S=+IXDWI:6 *A!DM%V(*)"-
M3IE">PZ]S1:_14).A+9,D_9AHZ&]Y]>@%(')T;Q,(UY5^DN;H6A#!\-+S^,D
M2* ;X'[5-\9,Y/0C-28[<,)[]D-O'GWF0LR<I0E!"G):9U^1UDH:J@(:5T&4
MQG@XMZ<X]V?@CUQ4>HOHZ$53NLL?'>]A,C,(QNCCW!\-91+\&RY%,9Q@F=Y@
MB_DQS,CZDA%K8:IO<XYGJ-RMUWG,4@GQ+ERD&L$M?!S/&Z X"S%]1*3"<.A(
M@LU0=9+.8O5S-+\T/E1F]SV.)FEBJ3SK8,@U_)7/I8F95?%9$G4CYK-GB1=-
MD%84W//0W.+0>;[#3(5_F%_X%O$]VUD:^CM/D9P!3''BA[R/\F7,O$&/2]QZ
M+5M?AJDW&PI!ZG9[^P/UO/$^W%#<R'!:<):,#'S<3EAL'WFQ8MK2R39=[7AJ
M3]0C-IBT31<0(!RT#U96>N?6QF'5Y=L-#K8!TZA_H?6^5+V01Y%GD'J"EAL$
M]VF>O\\8+%0LR:0QFLU6Q\CJ9EV(60A]`3[WPZ-VDM/;TIP4Q3C?*LARZX4?
MCC;>]S9R@9%G(_K.U(OX<\B6.=PS].H]E!L2U&B(I:_*Y>*DJ+N\\(ZR17BX
MAC!'!0OE'PL@?2BE+V?JS$64D%4=3=')(N[8A7B44SZ()"(,D2.%)Q4)1.+<
M07RIS.Q0B4XR[&:IRL"QD.^V.R+JE*A$)TI:CL5LV9*3T.7,8"4WO.;]Z;JD
MX#B#/'L0Z<&_U.GD6F1)0W5VM%BN7*QD+_P7),NW)5C@ER$3L;U]N(S($(^Y
M+Q<:;'/(&[UCKZ>=#Q=TN*7.AN4A[L->/.)HZ"T>3%[,K#Z&!\&]]*H5WV2_
MJ'[A?4_"[J&WWQ\<[C[_(6-R-KG'J4\&-?F?=I\\4?6?#_9ZN[O]'WK]_=[!
M@3O_^1I \O07LG)#:!0#I+0CE8J/,#)OJS0T*10RTYL=L$F5XU,"<6_!TTI?
M&L'>G@]9OXOIB6X3Z0-4,(2]'=6&;B9)%K#9H')1!#&[RVL-$GY6\K!_'=!T
MP?F-=YNM;!I=WAI@]1$Y]43%"[ 'R'GZ!)6I/X3CU@3Q^5FN`7;H:G)RD(O-
M%O^OU6[JW'AWY^3(/-2QR521W$)VZDQ"/"NK=NRIVI;"'9$ZKLJK/5?,4A"T
M:-?Z8QXE$<*.6,?654=5S^\0?T??<:L&C^L@L=#868"&*;NY'!XJA_H"1'0"
M%TX_P_OW.I2IA20&+=Z8::G,#AVL(*70>KL2"B?I)-!JCR_@J8H!Z.07=9B7
M\;D;]R#RN$A+.H%;J F !):*F$\&,#J45WG.D1"VC;DC+NOFJ?W"4QB$"23*
M/6,>.2B]:!J9UY0>JZ!/$&5#J262/?!%!.)D)I[EBM2U)_"IWFQ'60\O)I,*
M@5,TD3\KG\Q9RI$QXTD5PDE\J3NNQ==JNPC=\^'HDG]9Z^RHWQL(GK K)KYU
MHS\0/(<A"&(17[]+)BV+A>5N`<H!4-=#[F05N@P*D+D5HS;W:\=L#68!&P!N
MULIM>I9S(IA^!8HZ%TC3D *L:W=0$^"7:[TW6"%4+O<D?5$U`5ZYYON#Y<*)
M*K+.<)AD+:.C1D4"2S]^.#;RA$MZ+4C XL"!`P<.'#APX,"!`P<.'#APX,"!
9`P<.'#APX,"!`P<.'*P!_@%>`SSL`/ `````
`
end


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
I was told that they had some troubles of reading the original mail
sent by myself.
I reattached the source files.
sorry for the inconvenience.

README (208 bytes) Download Attachment
sample.stp (1K) Download Attachment
utils.stp (2K) Download Attachment
backtrace.stp (3K) Download Attachment
hookid_defs.stp (1014 bytes) Download Attachment
ioscheduler.stp (5K) Download Attachment
logtrace.stp (6K) Download Attachment
process.stp (3K) Download Attachment
scsi.stp (7K) Download Attachment
syscalls.stp (8K) Download Attachment
tskdispatch.stp (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
In reply to this post by Li Guanglei
> Hien Nguyen wrote:
>
> >2. Like Frank once commented,consider using printf for some of the
> >logtrace functions,so we could emplement them in script instead of
> >embeded C. For example, the log_tracedata_common() function could
> >implement in script as follow
> >  printf("%d|%d|%d|%d|%d|%d\n", gettimeofday_ms(), pid(), ppid(),
> > tid(), cpuid(), hoodID)
> >
> > [...]
> >
> >3. Consider using gettimeofday_s() for second, gettimeofday_ms()
> > for millisecond, gettimeofday_ns() for naosenconds etc... They are
> > new code under tapset, so you may not be aware.
> >
---------------------------------------------------------------------

Thanks. In fact one reason of using embedded c tapsets is because I
can get values directly by accessing a variable, which is cheaper than
calling functions such as pid(), tid(), stp_pid() etc.

Another factor I considered is that some embedded c tapsets seem not
easy to be replaced by a equivalent stap script, e.g.
log_ioscheduler_tracedata_extra_elv_next(). To keep all my codes
consistent, I use log_tracedata_common() and log_tracedata_xxx_extra()
for all events.

Frank suggested making those embedded c functions to returns a string,
which the caller script routine could print, so that we can keep as
much logic as possible in script. The codes of mine hide the logging
details inside those embedded c functions. I am not sure which of
these two are better, any suggestions?


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
In reply to this post by Li Guanglei
Just to throw some ideas :

Which protection against concurrency between multiple data sources do you use
(SMP, process vs interrupt) ?

I have looked at the CVS SystemTAP code : the _stp_printk function is all but
atomic.

Mathieu


* Guanglei Li ([hidden email]) wrote:

> Hi folks,
>
>   As mentioned by Jose this morning, we are developing a kernel event
> trace tool using systemTap. It is used to collect the probed data
> which will then be processed by a post-processing tool.This tool aims
> to provide an easy way to trace various events inside the kernel for
> the purpose of performance analysis.
>
>   Although its requirement come internally, we hope this tool to be
> open and could have a broader audience instead of confining it inside
> IBM.
>
>   As a start point, we implemented only a limited number of event
> hooks. It now contains 6 kinds of events and totally 16 events.
>
>   Here's the list we've implemented by now:
>
>  System Call:
>     entry of syscall
>     return of syscall
>  Process Creation:
>     fork
>     execve
>  IO Scheduler:
>     a request is retrieved from request queue
>     a request is added to the request queue
>     a request is removed from the request queue
>  Task Dispatching(scheduler):
>     context switch
>     cpu idle
>  Backtrace
>     backtrace when cpu is idle
>     backtrace when an io request is dispatched to scsi layer
>  Scsi Activities
>     mid-layer prepared a IO request
>     Dispatch a command to the LLD
>     I/O is done by LLD
>     mid-layer processes the completed IO
>
>   I posted a piece of codes from this tool onto mailing list a few
> days ago. I've incorporated some of the feedbacks from the community
> into this tool.
>
>   I added some comments to the source codes to help better understand
> this trace tool. I've tested this tool on 2.6.9-24EL on Power5, and
> some test on 2.6.14.2 on i386. But there are still some bugs in
> systemtap or elfutils that make some event hooks fail to run. I
> addressed these exceptions in the comments of the src codes.
>
>   Since the codes is a little larger to be inlined into this mail, I
> also send it as an attachment.
>
>   Suggestions, Comments, Flames are welcome. And thanks in advance
> for your help.
>
> <<<<<<<<<<<<<<<<<<<<< source codes >>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> --------------------- sample.stp ------------------------
> /* Attention:
>    If you choose to run all the following probes, your system
>    will become rather slow.
>
>    the log_user_data() in each event probe is only a demo of
>    the existance of this function. The data logged by it is
>    rather meaningless.
> */
> probe begin
> {
>     log("begin")
>     /* to take a snapshot of all the processes */
>     process_snapshot()
> }
>
> /* to trace system calls entry, hookID = 1 */
> probe addevent.syscall.entry
> {
>     log_user_data("Prototype for syscall entry, Ver: 0.1")
> }
>
> /* to trace system calls exit, hookID = 2 */
> probe addevent.syscall.return
> {
>     log_user_data("Prototype for syscall return, Ver: 0.1")
> }
>
> /* to trace process creation, include "fork" and "execve",
>   hookID: 3 for execve and 4 for fork */
> probe addevent.process
> {
>     log_user_data("Prototype for process creation, Ver: 0.1")
> }
>
> /*
>  to trace io scheduler activities,
>  hookID 5: a request is retrieved from request queue
>  hookID 6: a request is added to the request queue
>  hookID 7: a request is removed from the request queue
> */
> probe addevent.ioscheduler
> {
>     log_user_data("Prototype for io scheduler, Ver: 0.1")
> }
>
> /* to trace tasks dispatching.
>   hookID: 8 for context switch and 9 when cpu is idle */
> probe addevent.tskdispatch
> {
>     log_user_data("Prototype for tskdispatch, Ver: 0.1")
> }
>
> /* to trace backtrace,
>   hookID=10: for cpuidle
>   hookID=11: when an io request is dispatched to scsi layer,
>       failed on PPC64, refer to backtrace.stp */
> probe addevent.backtrace
> {
>     log_user_data("Prototype for backtrace, Ver: 0.1")
> }
>
> /* to trace scsi activities,
>   failed to run on PPC64, refer to scsi.stp
>
>   hookID = 13: mid-layer prepare a IO request
>   hookID = 14: Dispatch a command to the low-level driver
>   hookID = 15: I/O is done by low-level driver
>   hookID = 16: mid-layer processes the completed IO
> */
> probe addevent.scsi
> {
>     log_user_data("Prototype for scsi, Ver: 0.1")
> }
>
> probe end
> {
>     log("end")
> }
>
>
> --------------- tapsets/hookid_defs.stp ----------------
> global
>     /* syscall */
>     HOOKID_SYSCALL_ENTRY, HOOKID_SYSCALL_RETURN,
>
>     /* process creation */
>     HOOKID_PROCESS_FORK, HOOKID_PROCESS_EXECVE,
>
>     /* io scheduler */
>     HOOKID_IOSCHED_NEXT_REQ, HOOKID_IOSCHED_ADD_REQ,
>     HOOKID_IOSCHED_REMOVE_REQ,
>
>     /* task dispatching */
>     HOOKID_TASK_CTXSWITCH, HOOKID_TASK_CPUIDLE,
>
>     /* backtrace */
>     HOOKID_BT_CPUIDLE, HOOKID_BT_IOREQ_TO_SCSI,
>
>     /* scsi */
>     HOOKID_SCSI_IOENTRY, HOOKID_SCSI_IO_TO_LLD,
>     HOOKID_SCSI_IODONE_BY_LLD, HOOKID_SCSI_IOCOMP_BY_MIDLEVEL
>
> probe begin
> {
>     HOOKID_SYSCALL_ENTRY = 1
>     HOOKID_SYSCALL_RETURN = 2
>
>     HOOKID_PROCESS_EXECVE = 3
>     HOOKID_PROCESS_FORK = 4
>
>     HOOKID_IOSCHED_NEXT_REQ = 5
>      HOOKID_IOSCHED_ADD_REQ = 6
>     HOOKID_IOSCHED_REMOVE_REQ = 7
>
>     HOOKID_TASK_CTXSWITCH = 8
>     HOOKID_TASK_CPUIDLE = 9
>
>     HOOKID_BT_CPUIDLE = 10
>     HOOKID_BT_IOREQ_TO_SCSI = 11
>
>     HOOKID_SCSI_IOENTRY = 13
>     HOOKID_SCSI_IO_TO_LLD = 14
>     HOOKID_SCSI_IODONE_BY_LLD =15
>     HOOKID_SCSI_IOCOMP_BY_MIDLEVEL = 16
> }
> --------------- tapsets/logtrace.stp --------
>
> /* Log the data common to all events */
> function log_tracedata_common(hookID:long)
> %{
>     struct timeval tv;
>     struct task_struct *cur = current;
>
>     /* second|usec|pid|ppid|tid|cpuid|hoodID */
>
>     do_gettimeofday (&tv);
>     /* The actual format is a TBD. we currently print the ascii string
>      for convenience. */
>     _stp_printf("\n%ld|%ld|%ld|%ld|%ld|%u|%ld|", tv.tv_sec, tv.tv_usec,\
>               cur->tgid, cur->parent->pid, cur->pid, cur->thread_info->cpu,\
>               THIS->hookID);
> %}
>
> /* Log the data defined by users */
> function log_user_data(input:string)
> %{
>     struct task_struct *cur = current;
>
>     if(cur->tgid != _stp_pid)  {
>         if( _stp_target != 0 && cur->tgid != _stp_target)
>             return;
>         /* here USER: is a seperator. we use this for
>          convenience. It may be changed in future. */
>         _stp_printf ("USER:%s|", THIS->input);
>     }
> %}
>
>
> --------------- tapsets/utils.stp ----------------
>
> /* return the current name of probe point */
> function probename:string ()
> %{
>     char *ptr, *tok;
>     char buffer[MAXSTRINGLEN];
>     strlcpy(buffer, CONTEXT->probe_point, MAXSTRINGLEN);
>     ptr = buffer;
>     tok = strsep(&ptr, "\"");
>     tok = strsep(&ptr, "@");
>     strlcpy (THIS->__retvalue, tok, MAXSTRINGLEN);
> %}
>
> /* data tracing filter by pid
>    return:
>     1 - if continue to log the raw data
>     0 - return without logging the raw data
>  */
> function filter_by_pid:long()
> %{
>     struct task_struct *cur = current;
>
>     if(cur->tgid != _stp_pid)  {
>         /* to trace a specific process if we explicitly specify
>            which process we want to trace by:
>             1. stap -c "process_to_trace" ...
>             2. stap -x pid_to_trace ...
>            else we will trace all the processes
>          */
>         if( _stp_target != 0 && cur->tgid != _stp_target) {
>             THIS->__retvalue = 0;
>             return;
>         }
>
>         THIS->__retvalue = 1;
>     } else  /*skip the events generated by stap itself*/
>         THIS->__retvalue = 0;
>     return;
> %}
>
>
>
> --------------- tapsets/syscalls.stp -----------------------
>
> /********************************************************
> * We choose to probe both 64-bit and 32-bit application *
> * on PPC64 platform. And by making use of conditional   *
> * preprocessing, now the syscall probe could support    *
> * ppc64 and i686 platform                               *
> *                                                       *
> * the syscall lists for PPC64 that I listed below need  *
> * further modifications. Efforts are needed to address  *
> * different kernel versions, expecially the RH shipped  *
> * kernel(e.g. 2.6.9-24EL).                              *
> ********************************************************/
>
> probe addevent.syscall
>     = addevent.syscall.entry, addevent.syscall.return
> {
> }
>
> probe addevent.syscall.entry
>     =
> %( arch == "ppc64" %?
> %( kernel_v == "2.6.9" %? /* in fact, this refer to RH shipped kernel */
>     kernel.function("sys_*"),
>     kernel.function("compat_sys_*"),
>     kernel.function("sys32_execve"),
>     kernel.function("sys32_time"),
>     kernel.function("ppc64_sys32_stime"),
>     kernel.function("sys32_ptrace"),
>     kernel.function("sys32_pause"),
>     kernel.function("sys32_olduname"),
>     kernel.function("sys32_sigaction"),
>     kernel.function("sys32_sigsuspend"),
>     kernel.function("sys32_gettimeofday"),
>     kernel.function("sys32_settimeofday"),
>     kernel.function("old32_readdir"),
>     kernel.function("sys32_sysinfo"),
>     kernel.function("sys32_ipc"),
>     kernel.function("sys32_sigreturn"),
>     kernel.function("sys32_adjtimex"),
>     kernel.function("ppc64_personality"),
>     kernel.function("sys32_getdents"),
>     kernel.function("sys32_sysctl"),
>     kernel.function("sys32_sched_setparam"),
>     kernel.function("sys32_sched_rr_get_interval"),
>     kernel.function("sys32_rt_sigreturn"),
>     kernel.function("sys32_rt_sigaction"),
>     kernel.function("sys32_rt_sigprocmask"),
>     kernel.function("sys32_rt_sigpending"),
>     kernel.function("sys32_rt_sigtimedwait"),
>     kernel.function("sys32_rt_sigqueueinfo"),
>     kernel.function("sys32_sigaltstack"),
>     kernel.function("sys32_sendfile64"),
>     kernel.function("ppc32_timer_create"),
>     kernel.function("compat_timer_settime"),
>     kernel.function("compat_timer_gettime"),
>     kernel.function("compat_clock_settime"),
>     kernel.function("compat_clock_gettime"),
>     kernel.function("compat_clock_getres"),
>     kernel.function("compat_clock_nanosleep"),
>     kernel.function("sys32_swapcontext"),
>     kernel.function("sys32_utimes"),
>     kernel.function("compat_statfs64"),
>     kernel.function("compat_fstatfs64"),
>     kernel.function("ppc_rtas"),
>     kernel.function("compat_mbind"),
>     kernel.function("compat_get_mempolicy"),
>     kernel.function("compat_set_mempolicy"),
>     kernel.function("sys64_time"),
>     kernel.function("ppc64_sys_stime"),
>     kernel.function("ppc64_newuname"),
>     kernel.function("ppc64_personality"),
>     kernel.function("ppc_rtas")
> %:                /*this is a TBD */
>     kernel.function("sys_*"),
>     kernel.function("compat_sys_*"),
>     kernel.function("sys32_getdents"),
>     kernel.function("sys32_adjtimex"),
>     kernel.function("sys32_pause"),
>     kernel.function("sys32_gettimeofday"),
>     kernel.function("sys32_settimeofday"),
>     kernel.function("sys32_ipc"),
>     kernel.function("sys32_execve"),
>     kernel.function("sys32_sysctl"),
>     kernel.function("sys32_olduname"),
>     kernel.function("sys32_utimes"),
>     kernel.function("ppc_rtas"),
>     kernel.function("sys64_time"),
>     kernel.function("ppc64_newuname")
> %)
> %:
> %( arch == "i686" %?
>     kernel.function("sys_*")
> %: **ERROR**
> %)
> %)
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_SYSCALL_ENTRY)
>         log_syscall_tracedata_extra()
>     }
> }
>
> probe addevent.syscall.return
>     =
> %( arch == "ppc64" %?
> %( kernel_v == "2.6.9" %? /* in fact, this refer to RH shipped kernel */
>     kernel.function("sys_*").return,
>     kernel.function("compat_sys_*").return,
>     kernel.function("sys32_execve").return,
>     kernel.function("sys32_time").return,
>     kernel.function("ppc64_sys32_stime").return,
>     kernel.function("sys32_ptrace").return,
>     kernel.function("sys32_pause").return,
>     kernel.function("sys32_olduname").return,
>     kernel.function("sys32_sigaction").return,
>     kernel.function("sys32_sigsuspend").return,
>     kernel.function("sys32_gettimeofday").return,
>     kernel.function("sys32_settimeofday").return,
>     kernel.function("old32_readdir").return,
>     kernel.function("sys32_sysinfo").return,
>     kernel.function("sys32_ipc").return,
>     kernel.function("sys32_sigreturn").return,
>     kernel.function("sys32_adjtimex").return,
>     kernel.function("ppc64_personality").return,
>     kernel.function("sys32_getdents").return,
>     kernel.function("sys32_sysctl").return,
>     kernel.function("sys32_sched_setparam").return,
>     kernel.function("sys32_sched_rr_get_interval").return,
>     kernel.function("sys32_rt_sigreturn").return,
>     kernel.function("sys32_rt_sigaction").return,
>     kernel.function("sys32_rt_sigprocmask").return,
>     kernel.function("sys32_rt_sigpending").return,
>     kernel.function("sys32_rt_sigtimedwait").return,
>     kernel.function("sys32_rt_sigqueueinfo").return,
>     kernel.function("sys32_sigaltstack").return,
>     kernel.function("sys32_sendfile64").return,
>     kernel.function("ppc32_timer_create").return,
>     kernel.function("compat_timer_settime").return,
>     kernel.function("compat_timer_gettime").return,
>     kernel.function("compat_clock_settime").return,
>     kernel.function("compat_clock_gettime").return,
>     kernel.function("compat_clock_getres").return,
>     kernel.function("compat_clock_nanosleep").return,
>     kernel.function("sys32_swapcontext").return,
>     kernel.function("sys32_utimes").return,
>     kernel.function("compat_statfs64").return,
>     kernel.function("compat_fstatfs64").return,
>     kernel.function("ppc_rtas").return,
>     kernel.function("compat_mbind").return,
>     kernel.function("compat_get_mempolicy").return,
>     kernel.function("compat_set_mempolicy").return,
>     kernel.function("sys64_time").return,
>     kernel.function("ppc64_sys_stime").return,
>     kernel.function("ppc64_newuname").return,
>     kernel.function("ppc64_personality").return,
>     kernel.function("ppc_rtas").return
> %:                /*this is a TBD */
>     kernel.function("sys_*").return,
>     kernel.function("compat_sys_*").return,
>     kernel.function("sys32_getdents").return,
>     kernel.function("sys32_adjtimex").return,
>     kernel.function("sys32_pause").return,
>     kernel.function("sys32_gettimeofday").return,
>     kernel.function("sys32_settimeofday").return,
>     kernel.function("sys32_ipc").return,
>     kernel.function("sys32_execve").return,
>     kernel.function("sys32_sysctl").return,
>     kernel.function("sys32_olduname").return,
>     kernel.function("sys32_utimes").return,
>     kernel.function("ppc_rtas").return,
>     kernel.function("sys64_time").return,
>     kernel.function("ppc64_newuname").return
> %)
> %:
> %( arch == "i686" %?
>     kernel.function("sys_*").return
> %: **ERROR**
> %)
> %)
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_SYSCALL_RETURN)
>         log_syscall_tracedata_extra()
>     }
> }
>
> /* log the system call name */
> function log_syscall_tracedata_extra()
> %{
>     char *tok, *ptr;
>     char buffer[MAXSTRINGLEN];
>
>     ptr=buffer;
>     strlcpy (buffer, CONTEXT->probe_point, MAXSTRINGLEN);
>     tok = strsep(&ptr, "\"");
>     tok = strsep(&ptr, "@");
>     _stp_printf("%s|", tok);
> %}
>
> ----------------- tapsets/process.stp ----------------
>
> /* record the newly created process name */
> function log_execve_extra(var:long)
> %{
>     long long tmp = THIS->var;
>     long tmp1=(long)tmp;
>     _stp_printf("%s|", (char *)tmp1);
> %}
>
>
> /* record the newly forked process id */
> function log_fork_extra(var:long)
> %{
>     long pid = (long)THIS->var;
>     _stp_printf("%ld|", pid);
> %}
>
>
> /************************************************************
> * This function could be used to take a snapshot of all the *
> * processes. It's not a probe, so the data format doesn't   *
> * follow the format used by probe handlers                  *
> ************************************************************/
> function process_snapshot()
> %{
>     struct task_struct *tsk;
>     struct list_head *cur, *head;
>     head = &(current->tasks);
>
>     /* iterate all the processes, and record the pid and process
>     name for each entry */
>     list_for_each(cur, head) {
>         tsk = (struct task_struct *)(list_entry(cur, struct task_struct, tasks));
>         _stp_printf("PID:%ld|PNAME: %s\n", tsk->pid, tsk->comm);
>     }
> %}
>
>
> probe addevent.process
>     =  addevent.process.fork, addevent.process.execve
> {
> }
>
> /*
>   we should capture both do_execve for 64-bit app
>   and compat_do_execve for 32-bit app
> */
> probe addevent.process.execve
>     =  kernel.function("*do_execve")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_PROCESS_EXECVE)
>         log_execve_extra($filename)
>     }
> }
>
> probe addevent.process.fork
>     = kernel.function("copy_process")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_PROCESS_FORK)
>         log_fork_extra($pid)
>     }
> }
>
> ------------------ tapsets/ioscheduler.stp ------------------
>
> /************************************************************
> * Probe the event when a request is retrieved from request  *
> * queue(dispatched to corresponding device), the event when *
> * a request is added by block layer into the request queue, *
> * and the event is removed by from the request queue        *
> ************************************************************/
> probe addevent.ioscheduler
>     = addevent.ioscheduler.elv_next_request,
>     addevent.ioscheduler.elv_add_request,
>     addevent.ioscheduler.elv_remove_request
> {
> }
>
> /* when a request is retrieved from request queue */
> probe addevent.ioscheduler.elv_next_request
>     =  kernel.function("elv_next_request")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_IOSCHED_NEXT_REQ)
>         log_ioscheduler_tracedata_extra_elv_next($q)
>     }
> }
>
>
> /* when a request is added to the request queue */
> probe addevent.ioscheduler.elv_add_request
>     = kernel.function("__elv_add_request")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_IOSCHED_ADD_REQ)
>         log_ioscheduler_tracedata_extra_others($q, $rq)
>     }
> }
>
> /* when a request is removed from the request queue */
> probe addevent.ioscheduler.elv_remove_request
>     = kernel.function("elv_remove_request")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_IOSCHED_REMOVE_REQ)
>         log_ioscheduler_tracedata_extra_others($q, $rq)
>     }
> }
>
>
> %{
> #include <linux/blkdev.h>
> #include <linux/elevator.h>
> %}
>
> /* log the data specific to __elv_add_request and elv_remove_request,
>   for kernel == 2.6.9. Need to confirm the actual kernel version range
>   i.e. kernel_v <= 2.6.x.  */
> %( kernel_v == "2.6.9" %?
> function log_ioscheduler_tracedata_extra_others(var_q:long, var_rq:long)
> %{
>     struct request *rq = (struct request *)((long)THIS->var_rq);
>     struct request_queue *q = (struct request_queue *)((long)THIS->var_q);
>
>     /* elevator name|major|minor| */
>     _stp_printf("%s|%ld|%ld|", q->elevator.elevator_name, rq->rq_disk->major,\
>                   rq->rq_disk->first_minor);
> %}
>
> /* log the data specific to __elv_add_request and elv_remove_request,
>   for kernel > 2.6.9. The exact version need to be determined. */
> %:
> function log_ioscheduler_tracedata_extra_others(var_q:long, var_rq:long)
> %{
>     struct request *rq = (struct request *)((long)THIS->var_rq);
>     struct request_queue *q = (struct request_queue *)((long)THIS->var_q);
>
>     /* elevator name|major|minor| */
>     _stp_printf("%s|%ld|%ld|", q->elevator->elevator_type->elevator_name,\
>                 rq->rq_disk->major, rq->rq_disk->first_minor);
> %}
> %)
>
> /* log the data specific to elv_next_request, for kernel == 2.6.9 */
> %( kernel_v == "2.6.9" %?
> function log_ioscheduler_tracedata_extra_elv_next(var:long)
> %{
>
>     struct request_queue *q;
>     struct request *rq;
>
>     q = (struct request_queue *)((long)THIS->var);
>
>     /* If there is a request in the request queue:
>         elevator name|major|minor|
>        if there is no request in the request queue:
>         elevator name|empty|
>     */
>     if(list_empty(&(q->queue_head))) {
>         _stp_printf("%s|empty|", q->elevator.elevator_name);
>     } else  {
>         rq = list_entry_rq(q->queue_head.next);
>         _stp_printf("%s|%ld|%ld", q->elevator.elevator_name, \
>                      rq->rq_disk->major, rq->rq_disk->first_minor);
>     }
> %}
>
> %:
> /* log the data specific to elv_next_request , for kernel > 2.6.9*/
> function log_ioscheduler_tracedata_extra_elv_next(var:long)
> %{
>
>     struct request_queue *q;
>     struct request *rq;
>
>     q = (struct request_queue *)((long)THIS->var);
>
>     /* If there is a request in the request queue:
>         elevator name|major|minor|
>        if there is no request in the request queue:
>         elevator name|empty|
>     */
>     if(list_empty(&(q->queue_head)))  {
>         _stp_printf("%s|empty|", q->elevator->elevator_type->elevator_name);
>     } else  {
>         rq = list_entry_rq(q->queue_head.next);
>         _stp_printf("%s|%ld|%ld", q->elevator->elevator_type->elevator_name,\
>                      rq->rq_disk->major, rq->rq_disk->first_minor);
>     }
> %}
> %)
>
> ---------------------- tapsets/tskdispatch.stp ------------------------
>
> /**********************************************************
> * Dispatching when the cpu is idle or when a new process  *
> * is chosen to run.                                       *
> *                                                         *
> * The actual locations for these two kinds of events are  *
> * the labels go_idle and switch_tasks inside the function *
> * schedule. But currently SystemTap doesn't support       *
> * specifying probe points by label.                       *
> *                                                         *
> * Bugzilla shows that #1564 has been fixed                *
> **********************************************************/
> probe addevent.tskdispatch
>     = addevent.tskdispatch.ctxswitch,
>     addevent.tskdispatch.cpuidle
> {
> }
>
> /*******************************************************
> * Pls refer to #908 in bugzilla.                       *
> *                                                      *
> * Currently systemTap can't access arguments of inline *
> * functions. So we choose to probe __switch_to instead *
> * of context_switch()                                  *
> *******************************************************/
>
> probe addevent.tskdispatch.ctxswitch
>     = kernel.function("__switch_to")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_TASK_CTXSWITCH)
> /* multi-arches processing */
> %( arch == "ppc64" %?
>         log_ctxswitch_extra($prev, $new)
> %:
>     %( arch == "i686" %?
>         log_ctxswitch_extra($prev_p, $next_p)
>     %: UNSUPPORTED
>     %)
> %)
>     }  //end if
>
> }
>
> /* Only applicable to SMP systems */
> probe addevent.tskdispatch.cpuidle
>     = kernel.inline("idle_balance")
> {
>     /* we didn't call filter_by_pid() here,
>      so that we can get all the idle events
>      despite how the cpu enters idle */
>     log_tracedata_common(HOOKID_TASK_CPUIDLE)
>     log_cpuidle_extra()
> }
>
> function log_ctxswitch_extra(prev:long, next:long)
> %{
>     struct task_struct *prev_tsk, *next_tsk;
>
>     prev_tsk = (struct task_struct *)((long)THIS->prev);
>     next_tsk = (struct task_struct *)((long)THIS->next);
>
>     _stp_printf("%ld|%ld|%ld|", prev_tsk->pid, next_tsk->pid, prev_tsk->state);
> %}
>
> function log_cpuidle_extra()
> %{
>     struct task_struct *cur = current;
>     _stp_printf("%ld|", cur->pid);
> %}
>
> ---------------- tapsets/backtrace.stp ------
>
> /******************************************************
> * Print the backtrace when:                           *
> *  1. cpu is idle                                     *
> *  2. when an io request is dispatched to scsi layer  *
> *  ...                                                *
> *                                                     *
> *   To reduce overhead of symbol lookup,We only print *
> * the address sequence of backtrace, for example:     *
> *   c016844f c030fe6b                                 *
> *                                                     *
> *   Some post-processing is needed to get the symbol  *
> * info associated with these addresses.               *
> *                                                     *
> *   We could use _stp_symbol_print to get such infor. *
> *   For example:                                      *
> *     _stp_symbol_print(0xc016844f);                  *
> *                                                     *
> *   The output should look like:                      *
> *     0xc016844f : sys_read+0x2/0x62 []               *
> ******************************************************/
> probe addevent.backtrace
>     = addevent.backtrace.cpuidle,
>     addevent.backtrace.scsiioentry
> {
> }
>
> /*
>  Only applicable to SMP systems
>  Refer to addevent.tskdispatch.cpuidle in tskdispatch.stp
> */
> probe addevent.backtrace.cpuidle
>     = kernel.inline("idle_balance")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_BT_CPUIDLE)
>         log_backtrace_cpuidle()
>     }
> }
>
> /* This failed to work on PPC64, refer to the comments
>  in scsi.stp  */
> probe addevent.backtrace.scsiioentry
>     = module("scsi_mod").function("scsi_prep_fn")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_BT_IOREQ_TO_SCSI)
>         log_backtrace_scsiioentry()
>     }
> }
>
> function log_backtrace_cpuidle()
> %{
>     /* print the backtrace without symbols information */
>     _stp_stack_sprint(_stp_stdout,CONTEXT->regs,0);
>     _stp_printf("|");
> %}
>
> function log_backtrace_scsiioentry()
> %{
>     /* print the backtrace without symbols information */
>     _stp_stack_sprint(_stp_stdout,CONTEXT->regs,0);
>     _stp_printf("|");
> %}
>
> ----------- tapsets/scsi.stp -------------
>
> /***********************************************************
> * Trace the following activities of scsi layer:            *
> *  1. I/O Entry                                            *
> *  2. Issuing I/O to LLD                                   *
> *  3. I/O done by LLD                                      *
> *  4. I/O Complete                                         *
> *                                                          *
> *   Due to a bug for elfutils on ppc64, it failed to find  *
> * any module debuginfo. Hien listed this as a bug in:      *
> *
> * http://sourceware.org/ml/systemtap/2005-q4/msg00222.html *
> *                                                          *
> * Roland is fixing this.                                   *
> ***********************************************************/
> %{
> #include <linux/types.h>
> #include <scsi/scsi_cmnd.h>
> #include <scsi/scsi_device.h>
> #include <scsi/scsi_host.h>
> #include <linux/timer.h>
> #include <linux/blkdev.h>
> %}
>
> probe addevent.scsi
>     = addevent.scsi.ioentry,
>     addevent.scsi.iodispatching,
>     addevent.scsi.iodone,
>     addevent.scsi.iocompleted
> {
> }
>
> /* mid-layer prepare a IO request */
> probe addevent.scsi.ioentry
>     = module("scsi_mod").function("scsi_prep_fn")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_SCSI_IOENTRY)
>         log_scsi_ioentry_extra($q, $req)
>     }
> }
>
> /* Dispatch a command to the low-level driver. */
> probe addevent.scsi.iodispatching
>     = module("scsi_mod").function("scsi_dispatch_cmd")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_SCSI_IO_TO_LLD)
>         /* sdev_state|scsi_info|data_direction|request_buffer|request_bufflen| */
>         log_scsi_dispatch_extra_1($cmd)
>         /* cmd_identifier|  */
>         log_scsi_dispatch_extra_2($cmd)
>     }
> }
>
> /* I/O is done by low-level driver*/
> probe addevent.scsi.iodone
>     = module("scsi_mod").function("scsi_done")
> {
>     /* scsi timer check. We should record the hook only
>      * when the timer is inactive. But there's a gap between
>      * the checking and the actual calling of scsi_delete_timer.
>      */
>     if(filter_by_pid() == 1 ) {
>         if( scsi_timer_pending($cmd) == 1) {
>             log_tracedata_common(HOOKID_SCSI_IODONE_BY_LLD)
>             log_scsi_iodone_extra($cmd)
>         }
>     }
> }
>
> /* mid-layer processes the completed IO */
> probe addevent.scsi.iocompleted
>     = module("scsi_mod").function("scsi_io_completion")
> {
>     if(filter_by_pid() == 1 ) {
>         log_tracedata_common(HOOKID_SCSI_IOCOMP_BY_MIDLEVEL)
>         log_scsi_iocompleted_extra($cmd, $good_bytes)
>     }
> }
>
> function scsi_timer_pending:long(var:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
>     THIS->__retvalue = timer_pending(&cmd->eh_timeout);
> %}
>
>
> /* log the info about scsi io entry */
> function log_scsi_ioentry_extra(var_q:long, var_rq:long)
> %{
>     struct request_queue *q = (struct request_queue *)((long)THIS->var_q);
>     struct request *rq = (struct request *)((long)THIS->var_rq);
>     struct scsi_device *sdev = (struct scsi_device *)(q->queuedata);
>
>     /* major|minor|scsi_device_state| */
>     _stp_printf("%d|%d|%d|", rq->rq_disk->major, rq->rq_disk->first_minor, sdev->sdev_state);
> %}
>
> /* log the info about scsi_dispatching_cmd */
> function log_scsi_dispatch_extra_1(var:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
>     long long scsi_info;
>
>     scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
>         ((cmd->device->channel & 0xFF) << 16) |
>         ((cmd->device->lun & 0xFF) <<  8) |
>         (cmd->device->id & 0xFF);
>
>     /* sdev_state|scsi_info|data_direction|cmd_identifier|request_buffer|request_bufflen
>      *
>      * sdev_state could be: SDEV_DEL, SDEV_BLOCK or something else.
>      * Instead of skipping SDEV_DEL & SDEV_BLOCK, I choose to log them
>      */
>     _stp_printf("%d|%lld|%d|%0x|%d", cmd->device->sdev_state, scsi_info, cmd->sc_data_direction,
>         cmd->request_buffer, cmd->request_bufflen);
>
> %}
>
> /* log the info about scsi_dispatching_cmd
>  *
>  * cmd_identifier|
>  *
>  * Now we can use cmd->serial_number as cmd identifier(still valid till
>  * 2.6.14.2.  But according to the comments of struct scsi_cmnd, it's
>  * a better to use cmd->pid since cmd->serial_number will be killed one
>  * day in the future
>  *
>  * But when scsi_dispatch_cmd is called, cmd->serial_number is still not
>  * initialized.
>  * For kernel >= 2.6.14, it will be set later by calling scsi_cmd_get_serial.
>  * So I choose to record cmd->device->host->cmd_pid. But there is a gap between
>  * the time when cmd->device->host->cmd_pid is retrieved at the beginning of
>  * scsi_dispatch_cmd and the actual calling of scsi_cmd_get_serial.
>  *
>  * For kernel <=2.6.9, it will be set by a global counter.
>  *
>  * NOTE: The kernel version need further investigation.
>  */
>
> %( kernel_v >= "2.6.14" %?
> function log_scsi_dispatch_extra_2(var:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
>     long long scsi_info;
>
>     _stp_printf("%d|", cmd->device->host->cmd_pid);
> %}
> %:
>
> function log_scsi_dispatch_extra_2(var:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
>     long long scsi_info;
>
>     /* systemTap failed to access global variable. So I temporarily use 0.
>     _stp_printf("%d|", scsi_pid);
>     */
>     _stp_printf("%d|", 0);
>
> %}
> %)
>
> /* log the info about scsi_done */
> function log_scsi_iodone_extra(var:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var);
>     long long scsi_info;
>
>     scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
>         ((cmd->device->channel & 0xFF) << 16) |
>         ((cmd->device->lun & 0xFF) <<  8) |
>         (cmd->device->id & 0xFF);
>
>     /* scsi_info|data_direction|cmd_identifier| */
>     _stp_printf("%lld|%d|%d", scsi_info, cmd->sc_data_direction, cmd->pid);
> %}
>
> /* log the info about scsi_dispatching_cmd */
> function log_scsi_iocompleted_extra(var_cmd:long, var_goodbytes:long)
> %{
>     struct scsi_cmnd *cmd = (struct scsi_cmnd *)((long)THIS->var_cmd);
>     long long scsi_info;
>     unsigned long goodbytes = (unsigned long)(THIS->var_goodbytes);
>
>     scsi_info = ((cmd->device->host->host_no & 0xFF) << 24) |
>         ((cmd->device->channel & 0xFF) << 16) |
>         ((cmd->device->lun & 0xFF) <<  8) |
>         (cmd->device->id & 0xFF);
>
>     /* scsi_info|data_direction|cmd_identifier|goodbytes */
>     _stp_printf("%lld|%d|%d|%u", scsi_info, cmd->sc_data_direction, cmd->pid, goodbytes);
> %}
>
>
> begin 666 stap-trace.tgz
> M'XL(`/3#GT,``^P]:W?CMK'[U?H5J+/>2*ZLM^3$6V_KM96L3KRV*VN3[&E[
> M>"@2DEA3)$U2MG6[^]_O# `^1/$E6W;2E#S)FB(&P&!F,)@9O!Q7M@Y<6U;H
> MP<'"<%QYK,-;H]:LO]K:TX#GL-ME?^&)_F7OS4:[<WC8:?8.6Z\:S4[CL/V*
> M=+>'0O*S@#;;A+RR3=--@\M*_R]]G$3^#_LG9Q_[VZ@#&=SK=!+YWVZV./];
> MK5:WW4'^'[9[KTAC&Y5G/?_C_"=D2"?4)JY)''ENZ;3FN!:9F/!E1HF&7^;4
> M<*E*Z!W\)3/3O'%JI1+D6Q@,A@D/Y#=U,EX>00**%#D8U.KPUZ&N$RH84LOD
> MXG+4AYR:0^XU78?\VG0*&,CX#N6%ZB&RH1)%7CB4I3A+QZ5S*(,@NF-*[JB]
> M)(YNWNO+2NFWIN1_YY/<_SG7),6<6[+B(O<>6T=F_^^V/?W?Z'8.H?\?]IKM
> MHO^_Q%/?)R/LBHYB:Y9+X$TF*IV;3 7HYE1BLJ'*KHR2,#<-3R#(?KUDV2;T
> MPAMJ&U2O31:&XFJF4=Z%?BKM[U9*_RGMI)4@S4V;EC]<7OXT.).N/U^?GIR?
> M2_V+T?!SI;0#>%W:1ZROBR>MJ(12_+R Z]="0<0]R?U?:.\MV(&;V'^'#8!K
> M=KO=POY[D2>;_YKI*#.J+G1J/W(02-?_S5:[Y?&_#7H?]'^KT0.3L-#_+_#4
> M][?VE/;)MAXLZ]2TEF :SEQ25BID`#:H;<@XP,@Z>;]P-(,Z#ODH*S-\`VC;
> MJE5)J]'HQI2U3;S8< DCW]26YSA>3FP*IJDY<>]EF[XE2W,!)JM!;*IJ#MBV
> MXX4+=K2+IFP=AM2YJ6J3I5<6?%\8*N7&-C1P[A!SPG[\>/&)_$@-:D-KKQ9C
> M75/(N:90`TQA&:K'+P[T2C"YO;(PUP^(R[7 A?Q@0N&,9&\)U2#=1HO9@=^D
> MY=4C"JT2P"UH8UEVL2$V,2W,7@'LET27W:"$6FYZ/2?M`Q*K1./NR,RTT%N
> M!@!QF8,!)@IX$).%7B4`Z9?URV#TX?+3B)Q<?":_G R')Q>CSV\)9'%G)H"A
> M&^+[0!J4#R2U9<-=(N6B>'WL#T\_0 $G[P?G@]%G).8/@]%%__J:_' Y)"?D
> MZF0X&IQ^.C\9DJM/PZO+ZSX0\)IROV:UK!3.3Y@``6=5ZLJ:[J0R8=NT_PR"
> M[0!I=)7,Y#L*`JY0[0X((Q,%NFH.P0W*DG73F#)2<T=0L/0MT2;$,-TJN;<U
> M%YW*&*F.XA5(>16TA )*H/L]&5%T7LB5#@-;E5POL+1VNU$E[TW'1<B/)Z31
> M:C:;!\UVXY!\NC[9.KU.%B!(]A$A/RYD8ZI3#>A RE/QXV^*4=/&\QH8L)7,
> MLK;VU$NE)RE\:-85L_D#1_U^!OU$!FFX75"'.1 VA3Y)43(FMCGW4QA1X'5!
> MR]!M+=E%JP)9K)BV31W+-%0-A$*E=R NE6JT"B8VX6ID5>7Z;ZR;R@UHIR5H
> M)\T0,N,!L@JK/+>AA@IEB,[-.UX&PW0MWW98X/E)B#!670M95*6=X]CO-:K?
> M209]<"6!4)64=A(!(<&'2P'C[?4@P3W[BN*0GX.<)JGM6<,;&[CN(4:AN+.H
> M3<H338=!1AHO)4M3RQ5R?$R:I$(@,=:5]/R^P>7UZ8?^F731_W4D#?M_KPCX
> M$&:AO%"Q#?\*',JO;P'\*Q(CGAI<T.+D*@<M0JQ!4JQ10I(B4-LBQ<G965Y*
> MF&@:.$"'*GEM^\1(D S>91+Z2S8](C(81Y)UL&W19-C_>/ES_VED*>W]I_2-
> M9BCZ0J7D+[IF+![J8_T&FEN;O5M+H3J]DUW3QK0]3E2HEE$.:R*.115M`F,D
> MB->:*#"-M4Z-:HDP2X 3#FG0JO5JW]?(!?44JC'1;,X?67$7,!(+6,_\`T-F
> M2J$8K49K(DVZ(W_A)3W46,!DKQRD0!V[K))=LO?7DL<JDI."=[(MW1[AD%\E
> M^&[S'Q4DY0X8< O%]>5HW[XEQZ0<_5HIEUF6T8?!]<$[7DCE;32W)*0PI@@O
> M:;T@5@X+.'G,(H8\IU_F\K]-^\M<,^!?I,>.!-ZO9-DPQ$S*NWO.ESU=9?_O
> M5LGMP3N?T]Z+A*54B0UI]JT$0][-P3M69N0;L K08_54WI+GDI)WGI",< 1\
> MP ">)PR&$)LQLRO!"0&'2JTQ$3@J>)W&Z^!-<I<6#?U\-._W*NGL7[<+8E1!
> M:O<E^7GJ#Y% NX"-B9Q8XQ%R^"W ;\*C"F8`"@R8.P'V/@M)^T.0L3[J'($N
> M3V9G:0<,."U4F&%N6!JX$N[R"]A?*!@P"('G[4KL8_E-&;C)LDDS*JN5"A^+
> MHM+#2TC3$RB77X&YZ"5A":QG\(H,UUY"#UBMJ89LP4Q)DKI=I01#'^HET >;
> MR":IQF@@(&(A?\\H?[D%,%UY/;- ;EMS<@'=*V:!-WVRX_]C6;EA`(^>`LZ8
> M_SULM%M^_+_=ZV+\O]4NXO\O\A3Q_TWQ*N+_1?P_BE<1_R_B_U&\_ECQ?Q;Y
> M!YN.,<0W"5B$\"B#$*19(XJUP/ZJJ3I-@8[D:]5$!-(@FAF.0J[.'SB*HXGP
> M/\]7J^733M'Z'O'P?"/$3ET`/4Q0CV@.8V]PEO.QJ8.38]XLK.HOD&CH2\(L
> M8U]ARZIJXSCF8.,,+& 24)>[3_2!+<0["M6G-)J][SJ=";RT&Q/:&S]S^Z[-
> M.246])P#Z*@*X(L3->C*4"I"XU/JBB6:K,U\<#,F)@Q6CJEH,BIHT=VIXS>;
> MKFFPI^$)-%:8@L(5H\P/X?AP=\3#TUDH,X:<71/Y?HB2.6=]9+V2<N/!XP[X
> M)8GY'M<^C-W!V&3!\"0T,0H7N&4W27A[]058D2-<1PLNNJS^N?'0JC<>>BWR
> MCW]%\_$_CU0]:[,`ODBOS'<%G@6H!U0-X<FKD-L!_5LSF=OIS5F5R"7V)=F"
> MH5E!EP5Y>_WQ2BP1=DK!JF:_0->Y\?2&5Q^S&4*?<85R"O)>MM"LA6;H8'Z6
> M=_&S-)9U&?KP4V<KWH^DTZM/@[/SOC=+X6,@"0S*H4D:9A--P!S@/?'>M&]
> MTY"KJ]->!YQFCP[8.[$>:!&0!QJ.9&6+O6-F;>*)#\T&\W4!U>_B9PE^[%;"
> MZS[QHV532YH86R#"X'+8_[LTNI2N3Z\'ZZ0(8>:38R6R%$<UC$(#S:RXH4P8
> M?KPW.UQ#S)EA$82$P5E4;B2'=W7Q185<U=/+BU'_U]'!.YM.G6H#(R$K@9 O
> MNUYP/P''U>;\#O#\K3W3XGF))SO^$U&0CZ@C*_[3//3C/X?M=INM_VQWB_C/
> M2SQ%_&=3O(KX3Q'_B>)5Q'^*^$\4KS]6_ <?:-29L 0P`,!B,\RM" 5W0$+%
> MJC&#WA,1+D!_$K6=0Y29Z6 ND]B+G#KD:9+,M8:_#DHW%28JCK?%%;>6WIOD
> M1C-4IGN9_^,0)FB>5M7E,05#>VI*K(FXWL8!V55FDBL[-VB!.YK*N[)OW&->
> M;V*[!N.52Y2%;4/9X+9>,R=U)%M$-:EC?(LA"<LR;3>$,Y]77R*=N6]FF1HB
> M-EYR=))H]U1:O5],_P^4IHQ=_=[ABO2;9K?7@4X/M5-@WD1[@%Z_GO<)4AEQ
> M/T,FYTK 8,6%=Q\X$U96QL8X^5[,X-$R?Z4[@1O]S?>-[W"H&0LZ;9T/S/3Q
> M1<7Q105L"A 466']2;:G"^;)H\CR& 3+Z(D?C S7)KFGV-],ARE23F))\B37
> M1+%U,52)&:$8Q003Z\$5`.4,K?0DGH,:2N9XP-F$!;-^"YX:8QB=7/\DG8Y^
> MO093X/1#!8,I\X7N:@>R#3W7(:%@)U]0A=_98BK+4GH=MA:2U^"CS!>NE%];
> M-KVKDM>@`\&7/P()#>?6>M_ULC)+%LL._+ JD/N(?+JX_G1U=3D<]<_@=P77
> M.NQ\):1>IZ".M$E)1(/2XV(QH9ZX'I,GOH7+@\$0T5042P4W[T>Y@ M>JJ4=
> ML(VY&D%Q!,,80[#>7G^F3[G*!4"5.A8.U*!Y_'$%SQZPQ=B"T8UL?OJA,T9;
> MWB)!V<I:C"A*>R2]6+V(Q%]?NX@:7Q+O^XQ10, JV6>L@E=</>1]#JTZ6LFV
> MNN8(H3$,XY60+Y=8;A-=J2B6*8JEBAXB4(FF\A:%?@:IX *[-#8^%:5?$B5@
> M< .\Q1 7C2D);""5U2PJ^JT]W^+!)SO^@R=Q:*JDTHGS//&?9J_3X_&?3J=W
> MB.?_M!K=P^+\EQ=YIKH):ITP?0Z#!%/DJ&?C#E2HDLC787_T:7@!!I@(5G-C
> M7[%I$(T6.:Z&EZ?@#4O@#?_D%^-][/_:/_VY[Q6CF;[A;(>+B.[Q\8N)['BI
> MKN4(MGU4>16HNOQ)9#&\[\0:!7XEX;%%E!*$Y$/9@\D;/VMT*D-D9]/685)#
> M$@!&*,T_8M;S\[-J%/CL\J(OO?_,TB)93B\_7F'21\3EY_ZY9W*-Z50S< "/
> MXS#H\.9:"N<R)+5*:_SDK(.T=BRK(:%32N0@I'9AV(]G(R3V4C@)R8>E!*9!
> MVG>1),X22/B^%,,J;'8C_'V%7YC:+,7R"9/:T13!+$SKI#",'#>[T>0HT[",
> MWK,>79*M_X56>*SR?Y6E_UN-3KLI]'^KR\[_:36:G>+\GQ=YGK8?>,U7WM93
> M3 `4$P!Y:%],`+P<[8L)@,WH]4>;`, NLQY1')O PU[G8,Q5'&FW^"L/0?FQ
> M<&]%$K% B^""E1HY,9CVFLLW:(+CDCT>A50UH>0Y&7%!D1^(JX)TW'N'03)G
> MA6/!E_V%PN@L)T;I&%(8<?-KSL&ZQS,]C!INE^(3#;SI3"D-V&=4W%2'IK#-
> MMR)V:S,-S08(03KHX/T)%"!F)(+UEMZZ4983X"<4`R^1;=Y.E= 'G$<`9)8,
> ML^$'Z,&:97EU<O@RK4UK?'?>0:O3/Z]DZ/;'"^1ZX%?0:B7,+[[5V&*HZOIW
> MF[H+V^!Q_?CB:OZ"M82P;?+N6/1 #3*1%;?*59,?^@\13] 9O;>DTP^K,4GL
> MM$)72H& I'9+H@]4N:-I$*XV3TAG;90XE),,Q@$L9OFG0N#IJVD`IJXN<,]>
> M&HRC367V(0/(63@6-=0TJ"EUL5'F1)67J:5EP@'B`(?K8%7-3BUJZ> :NS00
> MS5(RFL9E-@U(5O^-&#^DL=6"CHVJ47-3&P]$4C&FGM$JQ=53(3 *@X2T9!B:
> MLR%M&RN6-#34[^34HFTW'TTX7+;L<#@<)N:R<Y,#DK+#AK(!D2/JO:RYV:!L
> M+VR6H&!K=)>MSDR77D.=:#H%;94D#4()V!(+MR5T/Z%P.*#H$CD@ISD@%3QU
> M*5>9'#)_F0`)8UL.0$,V3$>GU$HEY;ULB:G--+ %(I=>*<Z23)PDE@B@23H4
> M, [$14ZO:#[6DI2@@,!^-@<+UP0;*T$3>$AG0P(!0+7D&%'2QA,.9-#[E,$@
> MIPX+:(13IY&GOL\&9;8I?_3^[-F&X#PJ-%UCYQP_MS>@Y1J*LDV+[($ASY"?
> MUI_2.T$^<0PD#6?"]XY*\1/LB6=3@V3M[_>'P\LA.#9\.OU):PFBYU<S8&&-
> M1L^U\+<K)!BOPK[]+:U7@4-V#TH!7!6X3$#.\12P&,LVLU#/P,T&Y/TT$RZ0
> M_4S0D.62!]8W?C.!5U5&=MEYP2,6<7;!GF&<"<F44AXB>"9A)FR@>S,E9F7
> MR4-<H?CSM)]IRFS B"6=,\.:09V9;]6NS@F>6TBC5G;>#)ZQG1,^9'/GS!$R
> MO7/U2M\"S]%W`D,\7=#6[/%L%1HQR_-FF.;/$#'2\V;8N ;/9,\+'[+<LUD0
> M-N SH3V[(\<`YAOJV;"37,"!79-=HK#QLP$CIGZ.=N7.$+:T\HR[.4;=J'&V
> M->6\1MLG^@;;,W V&#!R#5J;V2//90ML,F[G-O%R#Y@;6%DY.GRN?KE19U@3
> M\$<Y(2%)?A9?A"_<R>N,A$[VXZN%^:I>;":)'MJ77!8N$%5FLDWV71/7Q8(!
> M_E9\&2]PFN(?'T]^O1X-!Q<_GO<O_L66R[KV,4_C!_CIBK4D9?ZE2OP-R\Q9
> MDM@V")R]"PK!Q;-0&3DFD-FA5OD-E%@EN__<W4U*^MONVL;G/0?7J +P_];R
> MU.SU/]#!GK+XYU7F^L].KROV_W:ZK6ZGA_=_=;J=8OW/2SS%\I]-\2J6_Q3+
> M?Z)X%<M_BN4_4;Q^M\M_"#<[^18KOF.'6WG P=!>TQ6CCWU'H"/L<,"[%5./
> M&55H\*6;>IYQMZ%M!Z6#`>=;B(\R]7RSDN^>DB2@P9VLX]4PS$Z-U(G[H/AU
> M(.R0:[2/L-'<'D>="_8XWB[)*7E4VFF2`Y1=C%-HQH))KF=,V_(]*Z6TTP`@
> M07M/RP#0%$M>`5RA_(H/P/:CY=^&Q9P(MNW*G6HJ^=,Q/Z\,]V#Q4YUQ+X0I
> M;O -G>7M[>2 %MWC;0&XE$MCVU'YKF3("8V_GVG@ZGBP`'@O\T/6>'EX&3"#
> MP_/_^)W "MD5X))K<N=A%T_K$W M#^X!Z>N#!!#L.&JLB-\;S+ 6.PE%N=1!
> MR/TZ@3_0=MY>,/!PTR$TOT'>O"'K!.$`W+':B4H($+2!YUKO<-;A*TA'+%PS
> M=&AV?=^YT:S@PB&'3)E6=/F@S1JJN0[5)QBF2*K5J_/9'))L^U]0]@D>0(;]
> MWSWL-OSU__ /VO^X#:RP_U_@*>S_3?$J[/_"_H_B5=C_A?T?Q>OW;/\KILWO
> M0C3H/9AU?,I2]6VYV)@OC_*+0._*A2P[C)OL'W=N@>7B7Z;R=D<DPO?F,3^_
> M`%[C8Z]E[DP@0-,[C" 67>@&-R%LP9"+XHH0R9B"<8F'+*S>^X*HQIU:$!Q8
> M\/2[,_E1K1ZB?,\"UT[\?D'YAIGA!EA>,]/%'N69MWPCA#!Q:V3@?HL7R<"H
> MPCVS*C]D0]S)P\\#]0\7$ML+3%T76R=$.JL6?1GF\\U@@-+QJ(TM"^"J$\EL
> M?Z^!*8X,.TG#2V'7SK!3M='!`4\3WR&9?3HF;\K"X0&K'L]C\F[P@3Z/YO:Z
> MBU!E>T)"8H4"@9\$1&F'=0!V^#;8%X1M)N"GCR F\%W"[V6&#+MWAWD.J6=W
> M!%?G\&SK4%7"L5^_1.=J<':$TGAU<?*Q?T3VG'\:.%7A'^3!WG *B%W:PT4U
> MLK3,;]DQ6?M8P]Y27?_,.[Q_32CXN^!ZB1% D2UP3,3N']44RH&1S-L+9%F0
> M`ZDJYE-7H8)M0B%)BSE\.8),[&VB^W[13ST1:'5;NS=SMJ+Y7N-R$'9#4OPB
> MOC!18P\PPL%2$E#;PA?WVGO8AG3?:U1>',TD^S_'_F]Q3O3C?8QT_Z]YV&@&
> M_E^WW4+_K]5I%/[?2SR%_[<I7H7_5_A_4;P*_Z_P_Z)X_6[]O\<_V -9U)U[
> M$>A0X.0%+B"^TUR-,NT57,=S%&D&FPD8U"])GQG4&](3IP<&CK/ &K$0$ X\
> M12=GYC:O634-G)G(F=/+W.&93TT4*C??#49^YL<^//,9GTR2\<!1[I'H$[8B
> M!W?4L\5H551TP?T?$\T0F[M19_/[.D!?0'9<(%TC'S30:V(+.NM[LB-*U[R;
> MG#9"&FN:N:YU5*\[,&(H%#MIS;2G];E>YPO)P)*LX\!\<-NISYUIH]%JM6HS
> M=ZX_G4!#4V?[^QT\E99/I&GI:C'(_)1N%',)/-YCZJS>`8]=@=G/DC(WU*0T
> M<!Q 02:ESD!AQ=TLS]:"QR4$E]'OQ6PQ@B)7]]JC=2_N':F&#M(5WT.'I,6E
> M0F^*^:R(?J+ZON-<4P_X#5UXF@-J<9D,+H-K>M?<OC!:+WKO3/AT+W_E))8O
> MD/$\J]LJ>0WHAQ9/>D=CL_%P/D>Q%.,7J,D#'=JE$]6&$=.N)3<X1.]\S?8R
> M@(2I6VJ[.+^LPN>''4"0K96G7S@=0(M\89E5#4P`1.6+=Z<RGZ-?^0F^,K]1
> M/:"DCS*_Y+E9?@VXB]K@3=)P.;4VT: DDIZUY67U>(!:&LU"H>6CA$\F.V3(
> M26^ ] _ 94,=ZXA$F5'EIH8'I @;*11>PA,TV>5OI1W0K?ZQZ3PC+E@WV @J
> MS@EG%T5_BTIY*EM@O+KW>.@VR\I6;&!%;-0U>.GB9'-<CXN?Q0@,6@6[(-\R
> M4L/<]2S1P/EREI5O,Q';=CB)&9R8(,\A0F?!,7<5+XOH14A KQ,)OG\-&!A6
> M%"):Y]U<Q54*JHU$-@:*)Q<O-5,2.=C^IZUTGN@)?A$=XF,8(@%HDJEIJE"C
> M2YVUFZS6.<(7@:S&M44XT1]KR#X4'(I&AA+B;CR/6WZP*@5OH+R#=W3&4 '_
> MR%LD$UHUSN_[&[.[J;!C:&80.5U=/;ZN3P$-Z58<O(SO]NUZXZ+7O6]PJ;MT
> M6XF]''Z]B+C,=CAW:,@F^Z@<HU3VTBK^!>DH+5Y0.GPU? A<:-C@`B]_%D+]
> MPO_;W>Q2]"I3W ?O`O4M.);,,.G_V[O:WK9M(+RO]J\0#'2)$]>SDS3IG";
> MMJ9 L*(=UJZ?[hidden email]:9";,FPI#09\N-WSQU)42^Q[,9KNHX'M'4EBCH>J7LA
> M[\42/A GU?-6XM[K78C949*1-J"<^0^ZVN2U*(1[? P%2?X^"R/O1Z]W_>)%
> MVWOVS-O9:WNW]/D5VH\^#4.$/ULM^_N5+2=I:+?RGE(K.!;E6P5CW4A-\3(2
> MLR#G%@M0<&_F_UG/YO!HX+UY?O+N[/G)RX[\^O7EZ]]^QP9$'$W]A+/;PB&)
> M10#9QY+R'T+B,IC-<%<_3Z/(.B!3VLKTI1;,5(F1T@*=3&25]J[I'V3ZMNF3
> M(=W)YE2UB4=G><*@F$.#;^5)HMH7" .*K[:FFT1)KZ1FJ*NOHL\Z23VRD0F&
> M_CP83L["='I.0HEL)2SG[%%:UMAB(IY)RX!_HB/$X_?WNF2PLD0?CJ *L'62
> MK\/(\U#X+F#.D?1'-T-(_T3"]PU".*N*`Y2*K<!/[W==TK\D*Z'5>/!AO-&[
> M8V0[IG.^B#] CM61DBK)M5J&Z*13]1ZZ*P,/HP0=!2&9_T2#O_UQEWM&1565
> M9N#X2-&C8^_(Q7ZB-A1)3].ZBR+"F(,>Y8U=]/8FRBU'I5I5< $\2P2R%"F)
> M!K0TJ:;2HR#'9/!W]^-Q`H5D'OB\SZ4J,2*!<BBJ%CHKTZY&,RL/L$"Q9T><
> MT*%$+Z(4#442=M/WCQUQ_?2KUV]/!ESH1O6A-WHYOYQ.+1>$5_3U!!>\6]9E
> M/]-<+HECE4NB+VDF:IG_SM=A_D5^4^0PN3D3,?=HT/QFT)?,ZJJ,2[91HTJY
> MJ/FD+@*<O75EL2<(X)W3M<D-?_J];B49Q/+E03>J.#,UZ2DF^:B]F$_"9+I#
> M4;-4=B?LB\)^20E?(3BUU!RWEA*,FONO28\KVR)0=:FYI87#+&&K9$USCN[O
> MG/=&&L;!1>B/Y8YY-_K.W6IO9AV:5NWO=>%D=%BP@FX?I:LMHHYG4^[_$^[J
> MH #U_A_$,50I\'\G_W]_=_?@0-?_W=_?Y_J__1U7__>K@//_6!4OY__A_#^*
> M>#G_#^?_4<3KF_7_\%XJNX&=Q>44@PU2^E94J&+18B@=>N!(Z_1Y17G$8.I?
> MT<PG5X=+1JA"%?:1]?^6/M+1+>FFMS/\E= ?KCYX2^\:GW(VIV9C'.4R'GF;
> M/R9<//&G7)5?Y>"N$T%U>5?/U%1E]5DV:.)1$'@2T=QL\.=%F! )`C\DA:>L
> M<;\/[0J+YD^J"RXF5]WDZHS&H7^F_-M$G.HBB$-Q5Y]E5\ROY!,2,+(:3U;(
> M+.V(L2,$SVR_W!R._8]!*,* 7C@OSQ^NLBVP&82S-%%!W.N+(UXYU#8?3TOC
> MX;VZO]Z<_#F0:8O]&;SWHSE/'K9 F%'0%"' -S=+IXDWI:6 *A!DM%V(*)"-
> M3IE">PZ]S1:_14).A+9,D_9AHZ&]Y]>@%(')T;Q,(UY5^DN;H6A#!\-+S^,D
> M2* ;X'[5-\9,Y/0C-28[<,)[]D-O'GWF0LR<I0E!"G):9U^1UDH:J@(:5T&4
> MQG@XMZ<X]V?@CUQ4>HOHZ$53NLL?'>]A,C,(QNCCW!\-91+\&RY%,9Q@F=Y@
> MB_DQS,CZDA%K8:IO<XYGJ-RMUWG,4@GQ+ERD&L$M?!S/&Z X"S%]1*3"<.A(
> M@LU0=9+.8O5S-+\T/E1F]SV.)FEBJ3SK8,@U_)7/I8F95?%9$G4CYK-GB1=-
> MD%84W//0W.+0>;[#3(5_F%_X%O$]VUD:^CM/D9P!3''BA[R/\F7,O$&/2]QZ
> M+5M?AJDW&PI!ZG9[^P/UO/$^W%#<R'!:<):,#'S<3EAL'WFQ8MK2R39=[7AJ
> M3]0C-IBT31<0(!RT#U96>N?6QF'5Y=L-#K8!TZA_H?6^5+V01Y%GD'J"EAL$
> M]VF>O\\8+%0LR:0QFLU6Q\CJ9EV(60A]`3[WPZ-VDM/;TIP4Q3C?*LARZX4?
> MCC;>]S9R@9%G(_K.U(OX<\B6.=PS].H]E!L2U&B(I:_*Y>*DJ+N\\(ZR17BX
> MAC!'!0OE'PL@?2BE+V?JS$64D%4=3=')(N[8A7B44SZ()"(,D2.%)Q4)1.+<
> M07RIS.Q0B4XR[&:IRL"QD.^V.R+JE*A$)TI:CL5LV9*3T.7,8"4WO.;]Z;JD
> MX#B#/'L0Z<&_U.GD6F1)0W5VM%BN7*QD+_P7),NW)5C@ER$3L;U]N(S($(^Y
> M+Q<:;'/(&[UCKZ>=#Q=TN*7.AN4A[L->/.)HZ"T>3%[,K#Z&!\&]]*H5WV2_
> MJ'[A?4_"[J&WWQ\<[C[_(6-R-KG'J4\&-?F?=I\\4?6?#_9ZN[O]'WK]_=[!
> M@3O_^1I \O07LG)#:!0#I+0CE8J/,#)OJS0T*10RTYL=L$F5XU,"<6_!TTI?
> M&L'>G@]9OXOIB6X3Z0-4,(2]'=6&;B9)%K#9H')1!#&[RVL-$GY6\K!_'=!T
> MP?F-=YNM;!I=WAI@]1$Y]43%"[ 'R'GZ!)6I/X3CU@3Q^5FN`7;H:G)RD(O-
> M%O^OU6[JW'AWY^3(/-2QR521W$)VZDQ"/"NK=NRIVI;"'9$ZKLJK/5?,4A"T
> M:-?Z8QXE$<*.6,?654=5S^\0?T??<:L&C^L@L=#868"&*;NY'!XJA_H"1'0"
> M%TX_P_OW.I2IA20&+=Z8::G,#AVL(*70>KL2"B?I)-!JCR_@J8H!Z.07=9B7
> M\;D;]R#RN$A+.H%;J F !):*F$\&,#J45WG.D1"VC;DC+NOFJ?W"4QB$"23*
> M/6,>.2B]:!J9UY0>JZ!/$&5#J262/?!%!.)D)I[EBM2U)_"IWFQ'60\O)I,*
> M@5,TD3\KG\Q9RI$QXTD5PDE\J3NNQ==JNPC=\^'HDG]9Z^RHWQL(GK K)KYU
> MHS\0/(<A"&(17[]+)BV+A>5N`<H!4-=#[F05N@P*D+D5HS;W:\=L#68!&P!N
> MULIM>I9S(IA^!8HZ%TC3D *L:W=0$^"7:[TW6"%4+O<D?5$U`5ZYYON#Y<*)
> M*K+.<)AD+:.C1D4"2S]^.#;RA$MZ+4C XL"!`P<.'#APX,"!`P<.'#APX,"!
> 9`P<.'#APX,"!`P<.'*P!_@%>`SSL`/ `````
> `
> end
>
>
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
>  Mathieu:
> Just to throw some ideas :
>
> Which protection against concurrency between multiple data sources
> do you use
> (SMP, process vs interrupt) ?
>
> I have looked at the CVS SystemTAP code : the _stp_printk function
> is all but
> atomic.

I didn't use protection mechanism in my tapsets.

The systemtap will use a per-cpu buffer to store the data to be
printed. When using relayfs mode, process running on a CPU will print
the data into the buffer related to this CPU itself. For each probe
handler, systemtap will use atomic operations(atomic_read, atomic_inc
etc) to avoid reentrancy.

Anyway, I think systemtap itself should handle the concurrency issue.
Am I right?


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
In reply to this post by Li Guanglei
>
> ----------- tapsets/scsi.stp -------------
> [...]
>
> /* mid-layer prepare a IO request */
> probe addevent.scsi.ioentry
>    = module("scsi_mod").function("scsi_prep_fn")
> [...]

Jose mentioned that such probe definitions will work only if scsi_mod
is compiled as a module. But many people compiled the scsi_mod into
the kernel, then the module("scsi_mod").function(...) can' work for
this case.

Is there a way to let scsi.stp works for both cases?


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
In reply to this post by Li Guanglei

----- Original Message -----
From: "Guanglei Li" <[hidden email]>
To: "Mathieu Desnoyers" <[hidden email]>
Cc: <[hidden email]>; <[hidden email]>
Sent: Sunday, December 18, 2005 8:59 PM
Subject: Re: [TAPSETS] Linux Kernel Event Trace Tool


>>  Mathieu:
>> Just to throw some ideas :
>>
>> Which protection against concurrency between multiple data sources
>> do you use
>> (SMP, process vs interrupt) ?
>>
>> I have looked at the CVS SystemTAP code : the _stp_printk function
>> is all but
>> atomic.
>
> [...]
> Anyway, I think systemtap itself should handle the concurrency
> issue. Am I right?
>
>
after thinking Mathieu's idea again, the touble may happen if a probe
handler makes mutiple calls to _stp_printf and during its execution,
it is migrated to another CPU.
_stp_printf will get the current CPU id every time it is called.

If calling _stp_printf only one time in the probe handler, such issues
should be avoid. Am I right?

and after looked into the generated c code (stap -p3), I saw that
_stp_print_flush() will be called at the end of every probe handler.
the comments of _stp_print_flush() says this is a must. But each probe
handler of my tapsets will print only a single row of data. calling
_stp_print_flush() for every single line of data will be expensive.
Any way to avoid this? such as not calling _stp_printf at end of probe
handlers. The print buffer of each cpu is 8K which could be able to
holds lots of lines of data.


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
* Guanglei Li ([hidden email]) wrote:

>
> ----- Original Message -----
> From: "Guanglei Li" <[hidden email]>
> To: "Mathieu Desnoyers" <[hidden email]>
> Cc: <[hidden email]>; <[hidden email]>
> Sent: Sunday, December 18, 2005 8:59 PM
> Subject: Re: [TAPSETS] Linux Kernel Event Trace Tool
>
>
> >> Mathieu:
> >>Just to throw some ideas :
> >>
> >>Which protection against concurrency between multiple data sources
> >>do you use
> >>(SMP, process vs interrupt) ?
> >>
> >>I have looked at the CVS SystemTAP code : the _stp_printk function
> >>is all but
> >>atomic.
> >
> >[...]
> >Anyway, I think systemtap itself should handle the concurrency
> >issue. Am I right?
> >
> >
> after thinking Mathieu's idea again, the touble may happen if a probe
> handler makes mutiple calls to _stp_printf and during its execution,
> it is migrated to another CPU.
> _stp_printf will get the current CPU id every time it is called.
>

This will happend on systems where preemption is enabled if you do not disable
preemption. You could save the cpu_id() once for the current logging, and then
keep the same until the end. The side-effect is that, sometimes, a CPU will
write in another cpu's buffer.

> If calling _stp_printf only one time in the probe handler, such issues
> should be avoid. Am I right?
>

If preemption is not disabled or interrupts disabled, you will have issues with
_stp_printf.

RelayFS does specify that it's up to the client to deal with locking. From the
_stp_printf function :

runtime/print.c:

#ifndef STP_RELAYFS
....
#else

#define _stp_printf(args...) _stp_sprintf(_stp_stdout,args)

#endif

runtime/string.c:_stp_sprintf()

Think that :

this function can be migrated between cpus (no preempt_disable)
this function can be reentered by a different thread (no preempt_disable)
this function can be reentered by a softirq handler (no bh_disable).
this function can be reentered by an IRQ handler, if they are probed (no irq
disable)
this function can be reentered by an NMI, if they are instrumented.

void _stp_sprintf (String str, const char *fmt, ...)
{
  int num;
  va_list args;
  if (str == _stp_stdout) {
    int cpu = smp_processor_id();
    char *buf = &_stp_pbuf[cpu][STP_PRINT_BUF_START] + _stp_pbuf_len[cpu];
    int size = STP_PRINT_BUF_LEN -_stp_pbuf_len[cpu] + 1;
                        /* This does not work with multiple writers :
                         * _stp_pbuf_len might have changed. */
    va_start(args, fmt);
    num = vsnprintf(buf, size, fmt, args);
            /* You write to a buffer position.. while doing that, you might be
             * interrupted by the sources listed above. If you didn't increment
             * the _stp_pbuf_len[cpu] before, then you will get corruption. */
             
    va_end(args);
    if (unlikely(num >= size)) {
      /* overflowed the buffer */
      if (_stp_pbuf_len[cpu] == 0) {   /* value could have changed... */
        _stp_pbuf_len[cpu] = STP_PRINT_BUF_LEN;
        _stp_print_flush();
      } else {
        *buf ='\0';
        _stp_print_flush();
        va_start(args, fmt);
        _stp_vsprintf(_stp_stdout, fmt, args);
        va_end(args);
      }
    } else {
      _stp_pbuf_len[cpu] += num;
            /* The position is only incremented after the write. Furthermore,
             * it's not an atomic operation : if it's interrupted between the
             * read and the write, you will lose an event. */
    }

  } else {
    va_start(args, fmt);
    num = vscnprintf(str->buf + str->len, STP_STRING_SIZE - str->len, fmt, args);
        /* once again, not reentrant for str. */
    va_end(args);
    if (likely(num > 0))
      str->len += num;
  }
}


This is just to show that _stp_sprintf is not reentrant as said. Or maybe did I
miss an interrupt disable taken before that ? (it couldn't ensure NMI
reentrancy, but it's better than nothing)

What LTTng actually does is to provide a secure way to log to RelayFS buffers
from _any_ context, without doing anything more than preemption disabling and
atomic cmpxchg buffer space reservation.


> and after looked into the generated c code (stap -p3), I saw that
> _stp_print_flush() will be called at the end of every probe handler.
> the comments of _stp_print_flush() says this is a must. But each probe
> handler of my tapsets will print only a single row of data. calling
> _stp_print_flush() for every single line of data will be expensive.
> Any way to avoid this? such as not calling _stp_printf at end of probe
> handlers. The print buffer of each cpu is 8K which could be able to
> holds lots of lines of data.
>
>

What I see of the _stp_sprintf code shows me that it's flushed when the buffer
is full. I didn't look much further though, so the implementers should answer
better than me to this question.

Mathieu


OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Frank Ch. Eigler
Hi -

On Sun, Dec 18, 2005 at 12:34:34PM -0500, Mathieu Desnoyers wrote:

> [...]  If preemption is not disabled or interrupts disabled, you
> will have issues with _stp_printf.  [...]

Looking in stp_printf for reentrancy prevention is a mistake.  The
reentrancy treatment is supposed to be provided by the probe point
providers: kprobes or the other entry interfacing functions.  (Not all
of them do the right thing yet.)  The runtime functions callable from
probes should assume atomic calls: no reentrancy, no interrupts, and
so on.

- FChE
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
* Frank Ch. Eigler ([hidden email]) wrote:

> Hi -
>
> On Sun, Dec 18, 2005 at 12:34:34PM -0500, Mathieu Desnoyers wrote:
>
> > [...]  If preemption is not disabled or interrupts disabled, you
> > will have issues with _stp_printf.  [...]
>
> Looking in stp_printf for reentrancy prevention is a mistake.  The
> reentrancy treatment is supposed to be provided by the probe point
> providers: kprobes or the other entry interfacing functions.  (Not all
> of them do the right thing yet.)  The runtime functions callable from
> probes should assume atomic calls: no reentrancy, no interrupts, and
> so on.
>
> - FChE
>

So it leaves the locking to the probe provider. You know that it limits the set
of instrumentable kernel functions to the ones that can be protected by
disabling interrupts.

Page faults and NMI handlers are two examples where this locking scheme fails.
All traps, exceptions and fault handlers cannot then be safely instrumented, or
at the cost of doing each one a special case (for each architecture).

If someone's goal is to trace the kernel with SystemTAP probes, these
constraints could be unacceptable.

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Frank Ch. Eigler

Mathieu Desnoyers <[hidden email]> writes:

> > [...] The runtime functions callable from
> > probes should assume atomic calls: no reentrancy, no interrupts, and
> > so on.

> So it leaves the locking to the probe provider. You know that it
> limits the set of instrumentable kernel functions to the ones that
> can be protected by disabling interrupts.

Not exactly.  Those functions could still in theory be instrumented,
but an actual run-time reentry into the probing system would be
detected, the probe short-circuited, and a "missed probe" would be
recorded.

> Page faults and NMI handlers are two examples where this locking
> scheme fails.  All traps, exceptions and fault handlers cannot then
> be safely instrumented,

I am probably missing something.  Could you sketch out a scenario that
causes a safety problem?

> or at the cost of doing each one a special case (for each
> architecture). [...]

Depending on what's involved, writing per-architecture support is not
necessarily beyond our means.

- FChE
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
* Frank Ch. Eigler ([hidden email]) wrote:

>
> Mathieu Desnoyers <[hidden email]> writes:
>
> Not exactly.  Those functions could still in theory be instrumented,
> but an actual run-time reentry into the probing system would be
> detected, the probe short-circuited, and a "missed probe" would be
> recorded.
>
> > Page faults and NMI handlers are two examples where this locking
> > scheme fails.  All traps, exceptions and fault handlers cannot then
> > be safely instrumented,
>
> I am probably missing something.  Could you sketch out a scenario that
> causes a safety problem?
>

Sure,


Let's think someone is probing a device which uses an NMI watchdog, and is
interested to see if each event is really delivered to the CPU. In that
scenario, if the tracer discards one event, then the user would have to restart
the tracing.

Now, let's think that the NMI is not the only asynchronous source of
non-maskable interrupts : spurious interrupts could be another example or page
faults (internal source though).

If the person is interested into tracing the whole system (process events,
softirqs, interrupts and NMI) to see the root of a problem, then a "event lost"
will happen as soon as an NMI tries to reenter into the logging code. That will
leave an incomplete trace.

The cleanest way I found to log NMIs without losing event is to make the
logging code reentrant by itself by using atomic operations.

Mathieu



OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
> If the person is interested into tracing the whole system (process
> events,
> softirqs, interrupts and NMI) to see the root of a problem, then a
> "event lost"
> will happen as soon as an NMI tries to reenter into the logging
> code. That will
> leave an incomplete trace.
>
>  Mathieu Desnoyers <[hidden email]>

> FChE
> Not exactly.  Those functions could still in theory be instrumented,
> but an actual run-time reentry into the probing system would be
> detected, the probe short-circuited, and a "missed probe" would be
> recorded.

So how about using semaphore in probe handler or spin lock in
_stp_sprintf. So that no event will be lost. The current
implementation will just abandon the processing of a probe handler if
there's already another probe handler running, even on a different
CPU.

> Mathieu Desnoyers
>This will happend on systems where preemption is enabled if you do
>not disable
>preemption. You could save the cpu_id() once for the current logging,
>and then
>keep the same until the end. The side-effect is that, sometimes, a
>CPU will
>write in another cpu's buffer.

But I have no way to pass cpuid into _stp_printf. So for my tapsets,
is there a good way to solve multiple calls to _stp_printf besides
print all stuffs using a single call of _stp_printf?




Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Frank Ch. Eigler
In reply to this post by Mathieu Desnoyers-2

Mathieu Desnoyers <[hidden email]> writes:

> [...]
> > I am probably missing something.  Could you sketch out a scenario that
> > causes a safety problem?

> Let's think someone is probing a device which uses an NMI watchdog,
> and is interested to see if each event is really delivered to the
> CPU. In that scenario, if the tracer discards one event, then the
> user would have to restart the tracing.  [...]

OK I understand, but that's not a safety problem.

> [...]  The cleanest way I found to log NMIs without losing event is
> to make the logging code reentrant by itself by using atomic
> operations.

This approach may not work so easily for systemtap, since scripts have
a lot more local state (context) than a single tracing function call.
Plus they sometimes hold locks, during which reentrancy is bad.


"Guanglei Li" <[hidden email]> wrote:

> So how about using semaphore in probe handler or spin lock in
> _stp_sprintf. So that no event will be lost.

Whom would such a spinlock protect against?  There is intended to be
no locking that synchronizes separate CPUs running probes, except when
these scripts access shared ("global"-declared) systemtap variables.

> The current implementation will just abandon the processing of a
> probe handler if there's already another probe handler running, even
> on a different CPU.

Why do you think so?  Reentrancy is assessed on a per-cpu basis.
Concurrency across CPUs is not explicitly blocked, except as above.

> > This will happend on systems where preemption is enabled if you do
> > not disable preemption. [...]

(We do need to disable preemption during probe handler execution.)

- FChE
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Li Guanglei
>> So how about using semaphore in probe handler or spin lock in
>> _stp_sprintf. So that no event will be lost.
>
> Whom would such a spinlock protect against?  There is intended to be
> no locking that synchronizes separate CPUs running probes, except
> when
> these scripts access shared ("global"-declared) systemtap variables.
> - FChE

protect against _stp_pbuf[cpu] & _stp_pbuf_len[cpu]. But as you
mentioned, it may not be so easy for systemtap.


>> The current implementation will just abandon the processing of a
>> probe handler if there's already another probe handler running,
>> even
>> on a different CPU.
>
> Why do you think so?  Reentrancy is assessed on a per-cpu basis.
> Concurrency across CPUs is not explicitly blocked, except as above.
> - FChE
sorry, I missed the following statement:
struct context* c = per_cpu_ptr (contexts, smp_processor_id());
so the context is a per cpu variable.


Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
In reply to this post by Li Guanglei
* Guanglei Li ([hidden email]) wrote:

> >If the person is interested into tracing the whole system (process
> >events,
> >softirqs, interrupts and NMI) to see the root of a problem, then a
> >"event lost"
> >will happen as soon as an NMI tries to reenter into the logging
> >code. That will
> >leave an incomplete trace.
> >
> > Mathieu Desnoyers <[hidden email]>
>
> >FChE
> >Not exactly.  Those functions could still in theory be instrumented,
> >but an actual run-time reentry into the probing system would be
> >detected, the probe short-circuited, and a "missed probe" would be
> >recorded.
>
> So how about using semaphore in probe handler or spin lock in
> _stp_sprintf. So that no event will be lost. The current
> implementation will just abandon the processing of a probe handler if
> there's already another probe handler running, even on a different
> CPU.
>

Semaphore and spin locks needs to explicitely disable interrupts to insure
no deadlock occurs when a resource is shared with interrupt handlers.

NMIs, by nature, cannot be disabled.


Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Mathieu Desnoyers-2
In reply to this post by Frank Ch. Eigler
* Frank Ch. Eigler ([hidden email]) wrote:

>
> Mathieu Desnoyers <[hidden email]> writes:
>
> > [...]  The cleanest way I found to log NMIs without losing event is
> > to make the logging code reentrant by itself by using atomic
> > operations.
>
> This approach may not work so easily for systemtap, since scripts have
> a lot more local state (context) than a single tracing function call.
> Plus they sometimes hold locks, during which reentrancy is bad.
>

I understand. My suggestion is to use a reentrant mechanism when the goal is to
log information to a buffer without preprocessing. I understand that SystemTAP
preprocessing might not be easily doable atomically.

Mathieu

OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
Reply | Threaded
Open this post in threaded view
|

Re: [TAPSETS] Linux Kernel Event Trace Tool

Ananth N Mavinakayanahalli-2
In reply to this post by Frank Ch. Eigler
On Tue, Dec 20, 2005 at 08:45:21AM -0500, Frank Ch. Eigler wrote:
>
> Mathieu Desnoyers <[hidden email]> writes:
 
...

> > > This will happend on systems where preemption is enabled if you do
> > > not disable preemption. [...]
>
> (We do need to disable preemption during probe handler execution.)

If you are referring to kprobe handler execution, preemption is already
disabled in kernel as soon as we enter the kprobe infrastructure and
reenabled only after single-stepping. Systemtap doesn't have to disable
it again.

Ananth