Josh Stone reported a failure as follows:
-----
... 'kernel.function("__switch_to").return'. This one is a problem with kretprobes only, as all of my other probes in __switch_to behaved just fine, even in the middle of the function. Running this gave "Kernel BUG at kprobes:449" (the full dump is included below). The line mentioned is in trampoline_probe_handler:

    BUG_ON(!orig_ret_address || (orig_ret_address == trampoline_address));
-----

The problem is probably that kretprobe_instance objects are hashed by the current task pointer. Upon entry to __switch_to(), the object is placed on the list for the "prev" task, but upon return it's sought on the list for the "next" task.

If this is indeed the problem, then:

1. Return probes on __switch_to should be blacklisted in the SystemTap translator and kprobes unless and until a fix is found.

2. A fix in kprobes would presumably require kprobes to notice (either at registration time or at function entry) that we're probing __switch_to(). Upon function entry, we'd have to invoke architecture-specific (and potentially version-dependent) code to grab the next-task pointer out of the arg list and hash on that.

By the way, context_switch() should NOT be a problem, because it's inline and kprobes doesn't support return probes on inline functions.

--
           Summary: return probe on __switch_to triggers BUG_ON
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kprobes
        AssignedTo: systemtap at sources dot redhat dot com
        ReportedBy: jkenisto at us dot ibm dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=2068

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
------- Additional Comments From jkenisto at us dot ibm dot com  2005-12-19 23:41 -------
My previous analysis is incorrect. First of all, return probes on __switch_to() seem to work fine in the kprobes and SystemTap tests that I've run. Second, as Roland pointed out even before I created this PR, the stack switch happens before __switch_to() is called. So the value of current is the same (i.e., the same as next_p) on both entry and return, and the same hash list is used both times.

Maybe Josh can provide a repeat-by script that demonstrates the problem. Otherwise, I'll change this to WORKSFORME.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joshua dot i dot stone at intel dot com
             Status|NEW                         |ASSIGNED
           Priority|P1                          |P2
------- Additional Comments From joshua dot i dot stone at intel dot com  2005-12-20 19:38 -------
I can repeat this without fail with this simple command:

    # stap -e 'probe kernel.function("__switch_to").return{}'

I am running the 2.6.9-24.ELsmp kernel, on x86_64. I tried this also on the i686 kernel, and did not trigger the BUG_ON, so it may be specific to x86_64.
------- Additional Comments From jkenisto at us dot ibm dot com  2005-12-20 23:43 -------
Some ppc64 data, only partially analyzed...

I tracked __switch_to calls and returns on a ppc64 system. Here __switch_to does change the value of current (it matches prev on entry and new on return), but the BUG_ON never gets triggered.

I also consistently see more calls than returns; and it's apparently not entirely (or even mostly) due to running out of kretprobe_instance objects or the entry probe getting installed before the return probe. Here are results from various runs:

run#  maxactive  nmissed  ncalls  nret  ncalls-(nret+nmissed)
1             5      112     209    92                      5
2            50        0     293   267                     26
3            50        0     406   364                     42
4            50        0     759   717                     42
5            50        0    1581  1535                     46
6            50      173    7308  7055                     80

Following the activity on a particular CPU, I occasionally see multiple consecutive calls with no intervening returns.
------- Additional Comments From jkenisto at us dot ibm dot com  2005-12-21 00:35 -------
Comment #1 ("works for me") refers only to i686. Had my blinders on. Like Josh, I can see this fail (insmod -> hung system) on x86_64. I'm using a hand-written C module; no need to involve SystemTap.
--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|systemtap at sources dot    |jkenisto at us dot ibm dot
                   |redhat dot com              |com