RE: Review patches of user space kprobe

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
Parsanna,

I read the patches and have some comments. Generally, I think user space
kprobe is more useful than kernel kprobe, and it will attract far more
people who work on applications.

I wrote down inline comments for patch 1/3 below.

Yanmin

>>This patch adds two new interfaces to insert and remove userspace
probes.

>>This new interface uses two userspace probe objects.
>>Objects:
>>-------
>>Each user space probes is uniquely identified by the
>>combination of inode and offset. New user space probes defines
>>two objects
>>1. struct uprobe - per probe
>>2. struct uprobe_module - per application that has probes on
>>   it. This is allocated internally per application.
>>
>>
>> struct uprobe {
>> /*path of the application */
>> char name[50];
>> /*includes kprobe structure */
>> struct kprobe kp;
>> /*list of probe per module */
>> struct hlist_node ulist;
>> /*inode of the application */
>> struct inode *inode;
How about using a pointer to uprobe_module to locate the inode? Although
the storage space is same, it's better to keep all the same info of
uprobes in uprobe_module. Uprobes just need keep its own specific info.


>> /*offset within the page */
This comment is incorrect. Functions, such like
register_userspace_probe, use it as the offset from the beginning of the
file, not a page of the file. Right? Although your example uses it as an
offset of page. Frank's reply said this item could be deleted. I don't
think he is correct because uprobe->kp.addr is used when calling
register_kprobe.


>> unsigned long offset;
>> /*page containing probes */
>> struct page *page;
>> /* virtual memory area containing the probes */
>> struct vm_area_struct *vma;
Item page and vma are used temporarily. They are not used when the probe
is hit. To shrink struct uprobe, they could be deleted. A new super
struct could be defined to include struct uprobe, or always to search
them when they are needed. Application might be inserted huge uprobes.
It's better to keep the struct smaller.


>> };
>>
>> struct uprobe_module {
>> struct hlist_head ulist_head;
>> /* hlist head containing list of probes within this
module */
>> struct list_head mlist;
>> /* list of uprobes_modules */
>> struct inode *inode;
Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.


>> /* inode of the application on which probes are
implanted */
>> struct nameidata nd;
>> /* to hold path/dentry etc. */
Struct nameidata looks a little big. A pointer to dentry might be
enough.



>> };
>>
>>Interfaces :
>>------------
>>Two new interfaces are defined to insert and remove user
>>space probes
>>register_user space_probe(struct uprobe *);
>>unregister_user space_probe(struct uprobe *);
>>
>>register_user space_probe() accepts pointer to struct uprobe.
>>User has to allocate uprobes structure and initialize following
>>elements
>> name - contains the path of the application;
>> offset - offset of the probe within the page;
>> kp.addr - virtual address within the executable.
>> kp.pre_handler - handler to be executed when probe is fired.
>> kp.post_handler - handler to be executed after single stepping
>>  the original instruction.
>> kp.fault_handler- handler to be executed if fault occurs while
>>  executing the original instruction or the
>>  handlers.
>>
>>page and vma are used internally by register_user space_probe().
>>
>>unregister_user space_probe(struct uprobe *);
>>
>>register_user space_probe() first walks through the given path of the
>>application and gets the inode of the application where the probes are

>>to be inserted. Struct uprobe_module is allocated for the each
>>application on which probes are to be inserted. If the probes are to
>>be inserted on the application which has probes earlier on them, then
>>the same struct uprobe_module is used. Also uprobe structure is added
>>to ulist_head within the uprobe_module. Then walk through the list of
>>process private mappings and get the virtual memory area containing
the
>>offset. Get the reference to the hashed page containing the offset. If

>>the page is found, it should be checked if it is uptodate and lock the

>>page in the memory so that the page is not swapped out. Now map the
>>page; insert the probe and then unmap the page. Release the page lock.

>>Thus probes are inserted in the pages that are already read into the
>>memory at the time of registration.
>>
>>unregister_user space_probe() get the vma and the page containing the
>>offset of the user space probes. Maps the page containing the probes
>>before replacing the int3/breakpoint instruction with the original
>>instruction and then unmap the page. Remove the uprobe structure from
>>ulist_head and check if the uprobe is the last one within this module
>>to be unregistered. If the uprobe is the last one within the
application

>>to be unregistered, remove the uprobe_module from the
>>uprobe_module_list. If there are no references to the uprobe_module
>>struct free it.
>>
>>Usage:
>>Usage is similar to kprobe.
>> /* Allocate a uprobe structure */
>> struct uprobe p;
>>
>> /* Define pre handler */
>> int handler_pre(struct kprobe *p, struct pt_regs *regs)
>> {
>> <.............collect useful data..............>
>> }
>>
>> void handler_post(struct kprobe *p, struct pt_regs *regs,
>> unsigned long
flags)
>> {
>> <.............collect useful data..............>
>> }
>>
>> int handler_fault(struct kprobe *p, struct pt_regs *regs, int
trapnr)
>> {
>> <.............collect useful data..............>
>> }
>>
>> While instering the probe, specify the patchname of the
application

>> on which probes are to be inserted.
>>
>> char pname[] ="/tmp/MOD/appln";
>>
>> /* pre_handler */
>> p.kp.pre_handler=handler_pre;
>> /* pre_handler */
>> p.kp.post_handler=handler_post;
>> /* pre_handler */
>> p.kp.fault_handler=handler_fault;
>> /* Secify the offset within the page*/
>> p.offset = (kprobe_opcode_t *)0x4d4;
>> /* Secify the address within the application/executable */
>> /* $nm appln |grep func1 */
>> p.kp.addr = (kprobe_opcode_t *)0x80484d4;
>> /* copy the string name to p.name */
>> strcpy(p.name, pname);
>> /* Now registe the userspace probe */
>> register_userspace_probe(&p);
>>
>>
>> /* To unregister the registered probed, just call..*/
>> unregister_userspace_probe(&p);
>>
>>Signed-of-by: Prasanna S Panchamukhi <[hidden email]>
>>
>>
>>---
>>
>> linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c |   14 +
>> linux-2.6.13-prasanna/fs/namei.c                 |   12
>> linux-2.6.13-prasanna/include/linux/kprobes.h    |   25 +
>> linux-2.6.13-prasanna/include/linux/namei.h      |    1
>> linux-2.6.13-prasanna/kernel/kprobes.c           |  303
++++++++++++++++++++++-
>> 5 files changed, 344 insertions(+), 11 deletions(-)
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-newinterface
kernel/kprobes.c
>>---
linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-newinterface
2005-09-14 11:00:19.987408280 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c 2005-09-14
11:00:53.943246208 +0530

>>@@ -37,6 +37,7 @@
>> #include <linux/init.h>
>> #include <linux/module.h>
>> #include <linux/moduleloader.h>
>>+#include <linux/namei.h>
>> #include <asm/cacheflush.h>
>> #include <asm/errno.h>
>> #include <asm/kdebug.h>
>>@@ -46,6 +47,7 @@
>>
>> static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
>> static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
>>+static struct list_head uprobe_module_list;
>>
>> unsigned int kprobe_cpu = NR_CPUS;
>> static DEFINE_SPINLOCK(kprobe_lock);
>>@@ -417,7 +419,11 @@ static int register_aggr_kprobe(struct k
>> /* kprobe removal house-keeping routines */
>> static inline void cleanup_kprobe(struct kprobe *p, unsigned long
flags)

>> {
>>- arch_disarm_kprobe(p);
>>+ if (!kernel_text_address((unsigned long)p->addr)) {
>>+ struct uprobe *up = container_of(p, struct uprobe, kp);
>>+ arch_disarm_uprobe(up);
>>+ } else
>>+ arch_disarm_kprobe(p);
>> hlist_del(&p->hlist);
>> spin_unlock_irqrestore(&kprobe_lock, flags);
>> arch_remove_kprobe(p);
>>@@ -434,30 +440,71 @@ static inline void cleanup_aggr_kprobe(s
>> spin_unlock_irqrestore(&kprobe_lock, flags);
>> }
>>
>>+/*
>>+ * This routine check if the probe already exits at the given offset
and inode.

>>+ * Returns the maching pointer, if found.
>>+ */
>>+static struct kprobe *get_kprobe_user(struct kprobe *kp)
>>+{
>>+ struct uprobe *up = container_of(kp, struct uprobe, kp);
>>+ struct hlist_head *head;
>>+ struct hlist_node *node;
>>+ struct kprobe *p;
>>+
>>+ head = &kprobe_table[hash_long((unsigned long)up->inode *
up->offset,
>>+       KPROBE_HASH_BITS)];
>>+ hlist_for_each_entry(p, node, head, hlist) {
>>+ struct uprobe *user = container_of(p, struct uprobe,
kp);
>>+ if (user->inode == up->inode && user->offset ==
up->offset)

>>+ return p;
>>+ }
>>+ return NULL;
>>+}
>>+
>>+static kprobe_opcode_t *insert_kprobe_user(struct kprobe *p)
>>+{
>>+ kprobe_opcode_t *addr;
>>+ struct uprobe *up = container_of(p, struct uprobe, kp);
>>+ addr = (kprobe_opcode_t *)
>>+ ((unsigned long)(up->inode) * (unsigned
long)(up->offset));

>>+ if ((up->page) && (PageUptodate(up->page))) {
>>+ arch_copy_kprobe(p);
>>+ arch_arm_uprobe(up);
>>+ }
>>+ return addr;
>>+}
>>+
>> int register_kprobe(struct kprobe *p)
>> {
>> int ret = 0;
>> unsigned long flags = 0;
>> struct kprobe *old_p;
>>+ kprobe_opcode_t *addr;
>>
>> if ((ret = arch_prepare_kprobe(p)) != 0) {
>> goto rm_kprobe;
>> }
>> spin_lock_irqsave(&kprobe_lock, flags);
>>- old_p = get_kprobe(p->addr);
>>+ if (!kernel_text_address((unsigned long)p->addr))
>>+ old_p = get_kprobe_user(p);
>>+ else
>>+ old_p = get_kprobe(p->addr);
>> p->nmissed = 0;
>> if (old_p) {
>> ret = register_aggr_kprobe(old_p, p);
>> goto out;
>> }
>>
>>- arch_copy_kprobe(p);
>> INIT_HLIST_NODE(&p->hlist);
>>- hlist_add_head(&p->hlist,
>>-       &kprobe_table[hash_ptr(p->addr,
KPROBE_HASH_BITS)]);
>>-
>>-   arch_arm_kprobe(p);
>>-
>>+ if (!kernel_text_address((unsigned long)p->addr))
>>+ addr = insert_kprobe_user(p);
After calling insert_kprobe_user, p->addr is still equal to the return
value of kmap_atomic. So when the uprobe is hit, how to find it from the
hash table? And I didn't see kprobe_handler is changed to call
get_kprobe_user if uprobe is hit.



>>+ else {
>>+ addr = p->addr;
>>+ arch_copy_kprobe(p);
>>+   arch_arm_kprobe(p);
>>+ }
>>+   hlist_add_head(&p->hlist,
>>+       &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)]);
>> out:
>> spin_unlock_irqrestore(&kprobe_lock, flags);
>> rm_kprobe:
>>@@ -472,7 +519,10 @@ void unregister_kprobe(struct kprobe *p)
>> struct kprobe *old_p;
>>
>> spin_lock_irqsave(&kprobe_lock, flags);
>>- old_p = get_kprobe(p->addr);
>>+ if (!kernel_text_address((unsigned long)p->addr))
>>+ old_p = get_kprobe_user(p);
>>+ else
>>+ old_p = get_kprobe(p->addr);
>> if (old_p) {
>> if (old_p->pre_handler == aggr_pre_handler)
>> cleanup_aggr_kprobe(old_p, p, flags);
>>@@ -501,6 +551,236 @@ void unregister_jprobe(struct jprobe *jp
>> unregister_kprobe(&jp->kp);
>> }
>>
>>+/*
>>+ * Wait for the page to be unlocked if someone else had locked it
>>+ * and map the page.
>>+ */
>>+static int map_uprobe_page(struct uprobe *up, int mapped)
The second parameter seems useless.



>>+{
>>+ /*nothing to to done, page is already mapped */
>>+ if (mapped)
>>+ return 0;
>>+ if (!up->page)
>>+ return 1;
>>+
>>+ wait_on_page_locked(up->page);
>>+ /* we could probably retry readpage here. */
I agree with above comment.



>>+ if (!PageUptodate(up->page))
>>+ return 1;
>>+ up->kp.addr = (kprobe_opcode_t *) kmap_atomic(up->page,
KM_USER0);

>>+ return 0;
>>+}
>>+
>>+/*
>>+ * Get the offset with the mapped page then register kprobe.
>>+ */
>>+static int insert_probe_page(struct uprobe *up)
>>+{
>>+ up->kp.addr = (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>+ (unsigned long)(up->offset &
~PAGE_MASK));
Does it guarantee later register_kprobe could distinguish it's a user
probe? Perhaps it's better to assign a fixed address in user space, such
like 0.


>>+ up->kp.opcode = 0;
It's not necessary to initiate kp.opcode as zero here.



>>+ return register_kprobe(&up->kp);
>>+
>>+}
>>+
>>+static void unmap_uprobe_page(struct uprobe *up)
>>+{
>>+ kunmap_atomic(up->kp.addr, KM_USER0);
>>+}
>>+
>>+/*
>>+ * find_get_vma walks through the list of process private mappings
and
>>+ * returns the pointer to vma containing the offset if found.
>>+ */
>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>+   struct address_space
*mapping)

>>+{
>>+ struct vm_area_struct *vma = NULL;
>>+ struct prio_tree_iter iter;
>>+ struct prio_tree_root *head = &mapping->i_mmap;
>>+ struct mm_struct *mm;
>>+ unsigned long start, end;
>>+
>>+ spin_lock(&mapping->i_mmap_lock);
>>+ vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>+ mm = vma->vm_mm;
>>+ spin_lock(&mm->page_table_lock);
>>+ start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>+ end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>+ spin_unlock(&mm->page_table_lock);
>>+ if ((start + offset) < end) {
>>+ spin_unlock(&mapping->i_mmap_lock);
>>+ return vma;
>>+ }
>>+ }
>>+ spin_unlock(&mapping->i_mmap_lock);
>>+ return NULL;
>>+}
>>+
>>+/*
>>+ * Gets exclusive write access to the given inode to ensure that the
file
>>+ * on which probes are currently applied does not change. Use the
function,
>>+ * deny_write_access_to_inode() we added in fs/namei.c.
>>+ */
>>+static inline int ex_write_lock(struct inode *inode)
>>+{
>>+ return deny_write_access_to_inode(inode);
>>+}
>>+
>>+/*
>>+ * Called when removing user space probes to release the write lock
on the

>>+ * inode.
>>+ */
>>+static inline int ex_write_unlock(struct inode *inode)
>>+{
>>+ atomic_inc(&inode->i_writecount);
>>+ return 0;
>>+}
>>+
>>+/*
>>+ * Walk the uprobe_module_list and return the uprobe module with
matching
>>+ * inode.
>>+ */
>>+static struct uprobe_module *get_module_by_inode(struct inode *inode)
>>+{
>>+ struct uprobe_module *um;
>>+ list_for_each_entry(um, &uprobe_module_list, mlist) {
There is no any lock to protect uprobe_module_list. It's not safe under
multi-task environment.



>>+ if (um->inode == inode)
>>+ return um;
>>+ }
>>+ return NULL;
>>+}
>>+
>>+/*
>>+ * Walk the path and get the inode. Check for matching inode with the
module

>>+ * list.
>>+ */
>>+static struct uprobe_module *get_module_by_name(struct uprobe *p)
>>+{
>>+ struct nameidata nd;
>>+ struct inode *inode;
>>+ char *temp = p->name;
>>+
>>+ if (path_lookup(temp, LOOKUP_FOLLOW, &nd)) {
>>+ path_release(&nd);
>>+ printk("Failed to lookup the path\n");
>>+ return NULL;
>>+ }
>>+ inode = nd.dentry->d_inode;
>>+ path_release(&nd);
>>+ return get_module_by_inode(inode);
It's better to move path_release after calling get_module_by_inode in
case of potential race condition.


>>+}
>>+
>>+/*
>>+ * get inode operations.
>>+ *
>>+ * Walk the path name and get the inode. This function leaves with
the
>>+ * dentry held and taking with the inode writelock held to ensure
that the
>>+ * file on which probes are currently active does not change from
under us.

>>+ */
>>+static struct uprobe_module *get_inode_ops(struct uprobe *up)
>>+{
>>+ struct uprobe_module *um = NULL;
>>+ int error;
>>+ char *temp = up->name;
>>+
>>+ INIT_HLIST_NODE(&up->ulist);
>>+ um = get_module_by_name(up);
>>+ if (um) {
>>+ up->inode = um->inode;
>>+ hlist_add_head(&up->ulist, &um->ulist_head);
>>+ goto out;
>>+ }
>>+
>>+ um = kcalloc(1, sizeof(struct uprobe_module), GFP_KERNEL);
If kcalloc returns error, it should be checked here.


>>+ if (path_lookup(temp, LOOKUP_FOLLOW, &um->nd))
>>+ goto err;
>>+
>>+ up->inode = um->nd.dentry->d_inode;
>>+
>>+ if ((error = ex_write_lock(up->inode))) {
>>+ path_release(&um->nd);
>>+ goto err;
>>+ }
>>+ INIT_HLIST_HEAD(&um->ulist_head);
>>+ hlist_add_head(&up->ulist, &um->ulist_head);
>>+ list_add(&um->mlist, &uprobe_module_list);
>>+ um->inode = up->inode;
>>+ goto out;
>>+
>>+err:
>>+ kfree(um);
It looks like you forgot "return NULL" here.



>>+out:
>>+ return um;
>>+}
>>+/*
>>+ * physical insertion/removal of probes in the actual pages of the
module.
>>+ * Register user space probes before actually instering probes in the
page for

>>+ * a given pair of inode and offset.
>>+ */
>>+int register_userspace_probe(struct uprobe *up)
>>+{
>>+ struct address_space *mapping;
>>+ struct uprobe_module *um;
>>+ int error = 0;
>>+
>>+ if (!(um = get_inode_ops(up))) {
>>+ printk("get_inode_opertion: %s returned error %d\n",
up->name,
>>+       error);
>>+ return -ENOSYS;
>>+ }
>>+
>>+ mapping = up->inode->i_mapping;
>>+ up->vma = find_get_vma(up->offset, mapping);
>>+ up->page = find_get_page(mapping, (up->offset >>
PAGE_CACHE_SHIFT));
>>+
>>+ if (!map_uprobe_page(up, 0)) {
If map_uprobe_page(up,0) != 0, register_userspace_probe still returns 0
meaning success register. It's incorrect and might confuse users.
How about change it to:
if (!(error=map_uprobe_page(up, 0))) {



>>+ error = insert_probe_page(up);
>>+ unmap_uprobe_page(up);
>>+ }
>>+
>>+ if (up->page)
>>+ page_cache_release(up->page);
>>+
>>+ return error;
>>+}
>>+
>>+/*
>>+ * physical insertion/removal of probes in the actual pages of the
module.
>>+ * Register user space probes before actually instering probes in the
page for

>>+ * a given pair of inode and offset.
>>+ *
>>+ */
>>+void unregister_userspace_probe(struct uprobe *up)
>>+{
>>+ kprobe_opcode_t *addr;
>>+ struct address_space *mapping = up->inode->i_mapping;
>>+ struct uprobe_module *um;
>>+
>>+ up->vma = find_get_vma(up->offset, mapping);
>>+ up->page = find_get_page(mapping, up->offset >>
PAGE_CACHE_SHIFT);

>>+
>>+ if (map_uprobe_page(up, 0)) {
>>+ printk("Probe unregister: failed\n");
>>+ goto out;
>>+ }
>>+ addr = (kprobe_opcode_t *) (up->kp.addr +
>>+ (unsigned long)(up->offset & ~PAGE_MASK));
>>+ up->kp.addr = addr;
>>+ unregister_kprobe(&up->kp);
>>+ unmap_uprobe_page(up);
>>+ um = get_module_by_inode(up->inode);
>>+ hlist_del(&up->ulist);
>>+ if (hlist_empty(&um->ulist_head)) {
>>+ list_del(&um->mlist);
>>+ ex_write_unlock(up->inode);
>>+ path_release(&um->nd); /* release path */
>>+ }
>>+out:
>>+ if (up->page)
>>+ page_cache_release(up->page);
>>+}
>>+
>> #ifdef ARCH_SUPPORTS_KRETPROBES
>>
>> int register_kretprobe(struct kretprobe *rp)
>>@@ -574,6 +854,8 @@ static int __init init_kprobes(void)
>> INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
>> }
>>
>>+ /* initialize uprobe_module_list */
>>+ INIT_LIST_HEAD(&uprobe_module_list);
>> err = arch_init_kprobes();
>> if (!err)
>> err = register_die_notifier(&kprobe_exceptions_nb);
>>@@ -590,4 +872,5 @@ EXPORT_SYMBOL_GPL(unregister_jprobe);
>> EXPORT_SYMBOL_GPL(jprobe_return);
>> EXPORT_SYMBOL_GPL(register_kretprobe);
>> EXPORT_SYMBOL_GPL(unregister_kretprobe);
>>-
>>+EXPORT_SYMBOL_GPL(register_userspace_probe);
>>+EXPORT_SYMBOL_GPL(unregister_userspace_probe);
>>diff -puN
include/linux/kprobes.h~kprobes_userspace_probes-newinterface
include/linux/kprobes.h
>>---
linux-2.6.13/include/linux/kprobes.h~kprobes_userspace_probes-newinterfa
ce 2005-09-14 11:00:20.025402504 +0530
>>+++ linux-2.6.13-prasanna/include/linux/kprobes.h 2005-09-14
11:00:20.047399160 +0530

>>@@ -33,7 +33,10 @@
>> #include <linux/list.h>
>> #include <linux/notifier.h>
>> #include <linux/smp.h>
>>-
>>+#include <linux/mm.h>
>>+#include <linux/dcache.h>
>>+#include <linux/namei.h>
>>+#include <linux/pagemap.h>
>> #include <asm/kprobes.h>
>>
>> /* kprobe_status settings */
>>@@ -103,6 +106,24 @@ struct jprobe {
>> kprobe_opcode_t *entry; /* probe handling code to jump to */
>> };
>>
>>+
>>+struct uprobe {
>>+ char name[50];
>>+ struct kprobe kp;
>>+ struct hlist_node ulist; /*list of probe per module */
>>+ struct inode *inode;
>>+ unsigned long offset;
>>+ struct page *page;
>>+ struct vm_area_struct *vma;
>>+};
>>+
>>+struct uprobe_module {
>>+ struct hlist_head ulist_head;
>>+ struct list_head mlist;
>>+ struct inode *inode;
>>+ struct nameidata nd; /* to hold path/dentry etc. */
>>+};
>>+
>> #ifdef ARCH_SUPPORTS_KRETPROBES
>> extern void arch_prepare_kretprobe(struct kretprobe *rp, struct
pt_regs *regs);
>> #else /* ARCH_SUPPORTS_KRETPROBES */
>>@@ -159,6 +180,8 @@ extern int arch_init_kprobes(void);
>> extern void show_registers(struct pt_regs *regs);
>> extern kprobe_opcode_t *get_insn_slot(void);
>> extern void free_insn_slot(kprobe_opcode_t *slot);
>>+extern void arch_arm_uprobe(struct uprobe *up);
>>+extern void arch_disarm_uprobe(struct uprobe *up);
>>
>> /* Get the kprobe at this addr (if any).  Must have called
lock_kprobes */
>> struct kprobe *get_kprobe(void *addr);
>>diff -puN
arch/i386/kernel/kprobes.c~kprobes_userspace_probes-newinterface
arch/i386/kernel/kprobes.c
>>---
linux-2.6.13/arch/i386/kernel/kprobes.c~kprobes_userspace_probes-newinte
rface 2005-09-14 11:00:20.029401896 +0530
>>+++ linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c 2005-09-14
11:00:20.048399008 +0530
>>@@ -87,6 +87,20 @@ void arch_disarm_kprobe(struct kprobe *p
>>   (unsigned long) p->addr +
sizeof(kprobe_opcode_t));
>> }
>>
>>+void arch_arm_uprobe(struct uprobe *up)
>>+{
>>+ *up->kp.addr = BREAKPOINT_INSTRUCTION;
>>+ flush_icache_user_range(up->vma, up->page,
>>+ (unsigned long) up->kp.addr,
sizeof(kprobe_opcode_t));
>>+}
>>+
>>+void arch_disarm_uprobe(struct uprobe *up)
>>+{
>>+ *up->kp.addr = up->kp.opcode;
>>+ flush_icache_user_range(up->vma, up->page,
>>+ (unsigned long) up->kp.addr,
sizeof(kprobe_opcode_t));
>>+}
>>+
>> void arch_remove_kprobe(struct kprobe *p)
>> {
>> }
>>diff -puN include/linux/namei.h~kprobes_userspace_probes-newinterface
include/linux/namei.h
>>---
linux-2.6.13/include/linux/namei.h~kprobes_userspace_probes-newinterface
2005-09-14 11:00:20.033401288 +0530
>>+++ linux-2.6.13-prasanna/include/linux/namei.h 2005-09-14
11:00:20.048399008 +0530

>>@@ -73,6 +73,7 @@ extern int follow_up(struct vfsmount **,
>>
>> extern struct dentry *lock_rename(struct dentry *, struct dentry *);
>> extern void unlock_rename(struct dentry *, struct dentry *);
>>+extern int deny_write_access_to_inode(struct inode *inode);
>>
>> static inline void nd_set_link(struct nameidata *nd, char *path)
>> {
>>diff -puN fs/namei.c~kprobes_userspace_probes-newinterface fs/namei.c
>>--- linux-2.6.13/fs/namei.c~kprobes_userspace_probes-newinterface
2005-09-14 11:00:20.037400680 +0530
>>+++ linux-2.6.13-prasanna/fs/namei.c 2005-09-14 11:00:20.052398400
+0530

>>@@ -286,6 +286,18 @@ int get_write_access(struct inode * inod
>> return 0;
>> }
>>
>>+int deny_write_access_to_inode(struct inode * inode)
>>+{
>>+ spin_lock(&inode->i_lock);
>>+ if (atomic_read(&inode->i_writecount) > 0) {
>>+ spin_unlock(&inode->i_lock);
>>+ return -ETXTBSY;
>>+ }
>>+ atomic_dec(&inode->i_writecount);
>>+ spin_unlock(&inode->i_lock);
>>+ return 0;
>>+}
>>+
>> int deny_write_access(struct file * file)
>> {
>> struct inode *inode = file->f_dentry->d_inode;
>>
>>
Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
General question:
1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 cases related to the question.
        a) Many applications execute codes produced themselves, such like JIT (Just-In-Time) of JVM.
        b) Some executables include TEXTREL section. When they are loaded into memory and linked dynamically, the text section might be changed, and kernel will do a Copy-On-Write to create a new anonymous page and map the new page to the process address space. So after the process starts, we couldn't insert uprobe on its copied pages.
Should a new interface be added to support it? The parameters could be process id and offset in the process address space. Of course, it could be an enhancement and implemented later.

3) Can function register_userspace_probe do not call register_kprobe? I think it's not necessary. It's just my feeling. It's up to you to make decision. :)

2) Function get_inode_ops should take care of errors and its caller, register_userspace_probe should check if the return value of get_inode_ops is IS_ERR. If so, the error code should be returned instead of a hard-coded -ENOSYS.

Other more comments against patch 1/3 are inline below.

Yanmin

>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Zhang, Yanmin
>>Sent: 2005年12月21日 16:30
>>To: [hidden email]
>>>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>>>+   struct address_space
>>*mapping)
>>>>+{
>>>>+ struct vm_area_struct *vma = NULL;
>>>>+ struct prio_tree_iter iter;
>>>>+ struct prio_tree_root *head = &mapping->i_mmap;
>>>>+ struct mm_struct *mm;
>>>>+ unsigned long start, end;
>>>>+
>>>>+ spin_lock(&mapping->i_mmap_lock);
>>>>+ vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>>>+ mm = vma->vm_mm;
>>>>+ spin_lock(&mm->page_table_lock);
>>>>+ start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>>>+ end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>>>+ spin_unlock(&mm->page_table_lock);
>>>>+ if ((start + offset) < end) {
>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>+ return vma;
It's not safe to return vma without lock. There is a race condition. If vma is released by another thread, kernel might be crazy when this thread tries to access it later.

If the page is mapped to many vma in different processes, function find_get_vma just returns one vma. It's not enough.

I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach loop.


>>>>+ }
>>>>+ }
>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>+ return NULL;
>>>>+}
Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
One more comment for patch 1/3.

register_aggr_kprobe needs to be aware of user space probe.

Yanmin

>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Zhang, Yanmin
>>Sent: 2005年12月22日 11:37
>>To: [hidden email]
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: RE: Review patches of user space kprobe
>>
>>General question:
>>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 cases related to the question.
>> a) Many applications execute codes produced themselves, such like JIT (Just-In-Time) of JVM.
>> b) Some executables include TEXTREL section. When they are loaded into memory and linked dynamically, the text section might be
>>changed, and kernel will do a Copy-On-Write to create a new anonymous page and map the new page to the process address space. So after
>>the process starts, we couldn't insert uprobe on its copied pages.
>>Should a new interface be added to support it? The parameters could be process id and offset in the process address space. Of course,
>>it could be an enhancement and implemented later.
>>
>>3) Can function register_userspace_probe do not call register_kprobe? I think it's not necessary. It's just my feeling. It's up to you
>>to make decision. :)
>>
>>2) Function get_inode_ops should take care of errors and its caller, register_userspace_probe should check if the return value of
>>get_inode_ops is IS_ERR. If so, the error code should be returned instead of a hard-coded -ENOSYS.
>>
>>Other more comments against patch 1/3 are inline below.
>>
>>Yanmin
>>
>>>>-----Original Message-----
>>>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Zhang, Yanmin
>>>>Sent: 2005年12月21日 16:30
>>>>To: [hidden email]
>>>>>>+static struct vm_area_struct *find_get_vma(unsigned long offset,
>>>>>>+   struct address_space
>>>>*mapping)
>>>>>>+{
>>>>>>+ struct vm_area_struct *vma = NULL;
>>>>>>+ struct prio_tree_iter iter;
>>>>>>+ struct prio_tree_root *head = &mapping->i_mmap;
>>>>>>+ struct mm_struct *mm;
>>>>>>+ unsigned long start, end;
>>>>>>+
>>>>>>+ spin_lock(&mapping->i_mmap_lock);
>>>>>>+ vma_prio_tree_foreach(vma, &iter, head, offset, offset) {
>>>>>>+ mm = vma->vm_mm;
>>>>>>+ spin_lock(&mm->page_table_lock);
>>>>>>+ start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT);
>>>>>>+ end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT);
>>>>>>+ spin_unlock(&mm->page_table_lock);
>>>>>>+ if ((start + offset) < end) {
>>>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>>>+ return vma;
>>It's not safe to return vma without lock. There is a race condition. If vma is released by another thread, kernel might be crazy when
>>this thread tries to access it later.
>>
>>If the page is mapped to many vma in different processes, function find_get_vma just returns one vma. It's not enough.
>>
>>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach loop.
>>
>>
>>>>>>+ }
>>>>>>+ }
>>>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>>>+ return NULL;
>>>>>>+}
Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
Below inline are the comments for patch 2/3.

Yanmin

>>
>>Signed-of-by: Prasanna S Panchamukhi <[hidden email]>
>>
>>
>>---
>>
>> linux-2.6.13-prasanna/include/linux/kprobes.h |    2
>> linux-2.6.13-prasanna/kernel/kprobes.c        |  113
++++++++++++++++++++++++++
>> 2 files changed, 115 insertions(+)
>>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
kernel/kprobes.c
>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
2005-09-14 11:01:18.495513696 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c 2005-09-14
11:01:18.550505336 +0530
>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>> }
>>
>> /*
>>+ * Check if the given offset lies within given page range.
>>+ */
>>+static int find_page_probe(unsigned long offset, unsigned long
page_start)
>>+{
>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>+ if (offset >= page_start && offset < page_end)
>>+ return 1;
>>+ return 0;
>>+}
>>+
>>+/*
>>+ * This function handles the readpages of all modules that have
active probes
>>+ * in them. Here, we first call the original readpages() of this
>>+ * inode / address_space to actually read the pages into memory.
Then, we will
>>+ * insert all the probes that are specified in this pages before
returning.
>>+ */
>>+static int up_readpages(struct file *file, struct address_space
*mapping,
>>+ struct list_head *pages, unsigned nr_pages)
>>+{
>>+ int retval = 0;
>>+ struct page *page;
>>+ struct uprobe_module *m;
>>+ struct uprobe *up = NULL;
>>+ struct hlist_node *node;
>>+
>>+ m = get_module_by_inode(file->f_dentry->d_inode);
There is a race condition between unregister_userspace_probe and here.
If a thread jumps to the beginning of function up_readpages while
another thread calls unregister_userspace_probe to delete the um, the
first thread might return error.


>>+ if (!m) {
>>+ printk("up_readpages: major problem. we don't  \
>>+ have mod for this
!!!\n");
>>+ return -EINVAL;
>>+ }
>>+
>>+ /* call original readpages() */
>>+ retval = m->ori_a_ops->readpages(file, mapping, pages,
nr_pages);
>>+ if (retval >= 0) {
>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>+ /*
>>+ * TODO: Walk through readpages page list and
get
>>+ * pages with probes instead of find_get_page().
>>+ */
>>+ if ((page = find_get_page(mapping,
>>+ up->offset >> PAGE_CACHE_SHIFT)) !=
NULL) {
>>+ if (find_page_probe
>>+    (up->offset >> PAGE_CACHE_SHIFT,
>>+     page->index << PAGE_CACHE_SHIFT)) {
>>+ up->page = page;
>>+ if (!map_uprobe_page(up, 0)) {
>>+ lock_page(up->page);
The first patch doesn't do lock_page before calling insert_probe_page.
Why does this patch do so? It's inconsistent.


>>+ insert_probe_page(up);
>>+ unmap_uprobe_page(up);
>>+ unlock_page(up->page);
>>+ }
>>+ }
>>+ page_cache_release(up->page);
>>+ }
>>+ }
>>+ }
>>+ return retval;
>>+}
>>+
>>+/*
>>+ * This function handles the readpage of all modules that have active
probes
>>+ * in them. Here, we first call the original readpage() of this
>>+ * inode / address_space to actually read the page into memory. Then,
we will
>>+ * insert all the probes that are specified in this page before
returning.

>>+ */
>>+int up_readpage(struct file *file, struct page *page)
>>+{
>>+ int retval = 0;
>>+ struct uprobe_module *m;
>>+ struct uprobe *up = NULL;
>>+ int kprobe_page_mapped = 0;
>>+ struct hlist_node *node;
>>+
>>+ m = get_module_by_inode(file->f_dentry->d_inode);
The same race condition like above function.


>>+ if (!m) {
>>+ printk("up_readpage: major problem. we don't have mod
for this !!!\n");
>>+ return -EINVAL;
>>+ }
>>+
>>+ /* call original readpage() */
>>+ retval = m->ori_a_ops->readpage(file, page);
>>+ if (retval >= 0) {
>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>+ if (find_page_probe (up->offset >>
PAGE_CACHE_SHIFT,
>>+ page->index <<
PAGE_CACHE_SHIFT)) {
>>+ up->page = page;
>>+ if (!map_uprobe_page(up,
kprobe_page_mapped)) {
>>+ lock_page(up->page);
Same inconsistent issue.


>>+ kprobe_page_mapped = 1;
>>+ retval = insert_probe_page(up);
>>+ }
>>+ }
>>+ }
>>+ if (kprobe_page_mapped) {
The logic here is incorrect. If there are many uprobes at the same page,
up just points to the last one. How about others?



>>+ unmap_uprobe_page(up);
>>+ unlock_page(up->page);
>>+ }
>>+ }
>>+ return retval;
>>+}
>>+

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Vara Prasad
Hi Yanmin,

I appreciate your comments. Prasanna the author of the patch is on
vacation until the month end, he will address your comments once he is
back in Jan. Please feel free to finish the review of the 3rd patch as
well in the mean time.

Thanks,
Vara Prasad

Zhang, Yanmin wrote:

>Below inline are the comments for patch 2/3.
>
>Yanmin
>
>  
>
>>>Signed-of-by: Prasanna S Panchamukhi <[hidden email]>
>>>
>>>
>>>---
>>>
>>>linux-2.6.13-prasanna/include/linux/kprobes.h |    2
>>>linux-2.6.13-prasanna/kernel/kprobes.c        |  113
>>>      
>>>
>++++++++++++++++++++++++++
>  
>
>>>2 files changed, 115 insertions(+)
>>>
>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>      
>>>
>kernel/kprobes.c
>  
>
>>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>      
>>>
>2005-09-14 11:01:18.495513696 +0530
>  
>
>>>+++ linux-2.6.13-prasanna/kernel/kprobes.c 2005-09-14
>>>      
>>>
>11:01:18.550505336 +0530
>  
>
>>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>>>}
>>>
>>>/*
>>>+ * Check if the given offset lies within given page range.
>>>+ */
>>>+static int find_page_probe(unsigned long offset, unsigned long
>>>      
>>>
>page_start)
>  
>
>>>+{
>>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>>+ if (offset >= page_start && offset < page_end)
>>>+ return 1;
>>>+ return 0;
>>>+}
>>>+
>>>+/*
>>>+ * This function handles the readpages of all modules that have
>>>      
>>>
>active probes
>  
>
>>>+ * in them. Here, we first call the original readpages() of this
>>>+ * inode / address_space to actually read the pages into memory.
>>>      
>>>
>Then, we will
>  
>
>>>+ * insert all the probes that are specified in this pages before
>>>      
>>>
>returning.
>  
>
>>>+ */
>>>+static int up_readpages(struct file *file, struct address_space
>>>      
>>>
>*mapping,
>  
>
>>>+ struct list_head *pages, unsigned nr_pages)
>>>+{
>>>+ int retval = 0;
>>>+ struct page *page;
>>>+ struct uprobe_module *m;
>>>+ struct uprobe *up = NULL;
>>>+ struct hlist_node *node;
>>>+
>>>+ m = get_module_by_inode(file->f_dentry->d_inode);
>>>      
>>>
>There is a race condition between unregister_userspace_probe and here.
>If a thread jumps to the beginning of function up_readpages while
>another thread calls unregister_userspace_probe to delete the um, the
>first thread might return error.
>
>
>  
>
>>>+ if (!m) {
>>>+ printk("up_readpages: major problem. we don't  \
>>>+ have mod for this
>>>      
>>>
>!!!\n");
>  
>
>>>+ return -EINVAL;
>>>+ }
>>>+
>>>+ /* call original readpages() */
>>>+ retval = m->ori_a_ops->readpages(file, mapping, pages,
>>>      
>>>
>nr_pages);
>  
>
>>>+ if (retval >= 0) {
>>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>+ /*
>>>+ * TODO: Walk through readpages page list and
>>>      
>>>
>get
>  
>
>>>+ * pages with probes instead of find_get_page().
>>>+ */
>>>+ if ((page = find_get_page(mapping,
>>>+ up->offset >> PAGE_CACHE_SHIFT)) !=
>>>      
>>>
>NULL) {
>  
>
>>>+ if (find_page_probe
>>>+    (up->offset >> PAGE_CACHE_SHIFT,
>>>+     page->index << PAGE_CACHE_SHIFT)) {
>>>+ up->page = page;
>>>+ if (!map_uprobe_page(up, 0)) {
>>>+ lock_page(up->page);
>>>      
>>>
>The first patch doesn't do lock_page before calling insert_probe_page.
>Why does this patch do so? It's inconsistent.
>
>
>  
>
>>>+ insert_probe_page(up);
>>>+ unmap_uprobe_page(up);
>>>+ unlock_page(up->page);
>>>+ }
>>>+ }
>>>+ page_cache_release(up->page);
>>>+ }
>>>+ }
>>>+ }
>>>+ return retval;
>>>+}
>>>+
>>>+/*
>>>+ * This function handles the readpage of all modules that have active
>>>      
>>>
>probes
>  
>
>>>+ * in them. Here, we first call the original readpage() of this
>>>+ * inode / address_space to actually read the page into memory. Then,
>>>      
>>>
>we will
>  
>
>>>+ * insert all the probes that are specified in this page before
>>>      
>>>
>returning.
>  
>
>>>+ */
>>>+int up_readpage(struct file *file, struct page *page)
>>>+{
>>>+ int retval = 0;
>>>+ struct uprobe_module *m;
>>>+ struct uprobe *up = NULL;
>>>+ int kprobe_page_mapped = 0;
>>>+ struct hlist_node *node;
>>>+
>>>+ m = get_module_by_inode(file->f_dentry->d_inode);
>>>      
>>>
>The same race condition like above function.
>
>
>  
>
>>>+ if (!m) {
>>>+ printk("up_readpage: major problem. we don't have mod
>>>      
>>>
>for this !!!\n");
>  
>
>>>+ return -EINVAL;
>>>+ }
>>>+
>>>+ /* call original readpage() */
>>>+ retval = m->ori_a_ops->readpage(file, page);
>>>+ if (retval >= 0) {
>>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>+ if (find_page_probe (up->offset >>
>>>      
>>>
>PAGE_CACHE_SHIFT,
>  
>
>>>+ page->index <<
>>>      
>>>
>PAGE_CACHE_SHIFT)) {
>  
>
>>>+ up->page = page;
>>>+ if (!map_uprobe_page(up,
>>>      
>>>
>kprobe_page_mapped)) {
>  
>
>>>+ lock_page(up->page);
>>>      
>>>
>Same inconsistent issue.
>
>
>  
>
>>>+ kprobe_page_mapped = 1;
>>>+ retval = insert_probe_page(up);
>>>+ }
>>>+ }
>>>+ }
>>>+ if (kprobe_page_mapped) {
>>>      
>>>
>The logic here is incorrect. If there are many uprobes at the same page,
>up just points to the last one. How about others?
>
>
>
>  
>
>>>+ unmap_uprobe_page(up);
>>>+ unlock_page(up->page);
>>>+ }
>>>+ }
>>>+ return retval;
>>>+}
>>>+
>>>      
>>>
>
>
>  
>


Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
Thanks for your info. I go through the patches 1 by 1. Perhaps later patches already fix the problems in prior patches, so my comments might be incorrect. :)

Yanmin

>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Vara Prasad
>>Sent: 2005年12月22日 13:41
>>To: Zhang, Yanmin
>>Cc: [hidden email]; [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>Hi Yanmin,
>>
>>I appreciate your comments. Prasanna the author of the patch is on
>>vacation until the month end, he will address your comments once he is
>>back in Jan. Please feel free to finish the review of the 3rd patch as
>>well in the mean time.
>>
>>Thanks,
>>Vara Prasad
>>
>>Zhang, Yanmin wrote:
>>
>>>Below inline are the comments for patch 2/3.
>>>
>>>Yanmin
>>>
>>>
>>>
>>>>>Signed-of-by: Prasanna S Panchamukhi <[hidden email]>
>>>>>
>>>>>
>>>>>---
>>>>>
>>>>>linux-2.6.13-prasanna/include/linux/kprobes.h |    2
>>>>>linux-2.6.13-prasanna/kernel/kprobes.c        |  113
>>>>>
>>>>>
>>>++++++++++++++++++++++++++
>>>
>>>
>>>>>2 files changed, 115 insertions(+)
>>>>>
>>>>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>>>
>>>>>
>>>kernel/kprobes.c
>>>
>>>
>>>>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-readpages
>>>>>
>>>>>
>>>2005-09-14 11:01:18.495513696 +0530
>>>
>>>
>>>>>+++ linux-2.6.13-prasanna/kernel/kprobes.c 2005-09-14
>>>>>
>>>>>
>>>11:01:18.550505336 +0530
>>>
>>>
>>>>>@@ -652,6 +652,109 @@ static struct uprobe_module *get_module_
>>>>>}
>>>>>
>>>>>/*
>>>>>+ * Check if the given offset lies within given page range.
>>>>>+ */
>>>>>+static int find_page_probe(unsigned long offset, unsigned long
>>>>>
>>>>>
>>>page_start)
>>>
>>>
>>>>>+{
>>>>>+      unsigned long page_end = page_start + PAGE_SIZE;
>>>>>+ if (offset >= page_start && offset < page_end)
>>>>>+ return 1;
>>>>>+ return 0;
>>>>>+}
>>>>>+
>>>>>+/*
>>>>>+ * This function handles the readpages of all modules that have
>>>>>
>>>>>
>>>active probes
>>>
>>>
>>>>>+ * in them. Here, we first call the original readpages() of this
>>>>>+ * inode / address_space to actually read the pages into memory.
>>>>>
>>>>>
>>>Then, we will
>>>
>>>
>>>>>+ * insert all the probes that are specified in this pages before
>>>>>
>>>>>
>>>returning.
>>>
>>>
>>>>>+ */
>>>>>+static int up_readpages(struct file *file, struct address_space
>>>>>
>>>>>
>>>*mapping,
>>>
>>>
>>>>>+ struct list_head *pages, unsigned nr_pages)
>>>>>+{
>>>>>+ int retval = 0;
>>>>>+ struct page *page;
>>>>>+ struct uprobe_module *m;
>>>>>+ struct uprobe *up = NULL;
>>>>>+ struct hlist_node *node;
>>>>>+
>>>>>+ m = get_module_by_inode(file->f_dentry->d_inode);
>>>>>
>>>>>
>>>There is a race condition between unregister_userspace_probe and here.
>>>If a thread jumps to the beginning of function up_readpages while
>>>another thread calls unregister_userspace_probe to delete the um, the
>>>first thread might return error.
>>>
>>>
>>>
>>>
>>>>>+ if (!m) {
>>>>>+ printk("up_readpages: major problem. we don't  \
>>>>>+ have mod for this
>>>>>
>>>>>
>>>!!!\n");
>>>
>>>
>>>>>+ return -EINVAL;
>>>>>+ }
>>>>>+
>>>>>+ /* call original readpages() */
>>>>>+ retval = m->ori_a_ops->readpages(file, mapping, pages,
>>>>>
>>>>>
>>>nr_pages);
>>>
>>>
>>>>>+ if (retval >= 0) {
>>>>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>>>+ /*
>>>>>+ * TODO: Walk through readpages page list and
>>>>>
>>>>>
>>>get
>>>
>>>
>>>>>+ * pages with probes instead of find_get_page().
>>>>>+ */
>>>>>+ if ((page = find_get_page(mapping,
>>>>>+ up->offset >> PAGE_CACHE_SHIFT)) !=
>>>>>
>>>>>
>>>NULL) {
>>>
>>>
>>>>>+ if (find_page_probe
>>>>>+    (up->offset >> PAGE_CACHE_SHIFT,
>>>>>+     page->index << PAGE_CACHE_SHIFT)) {
>>>>>+ up->page = page;
>>>>>+ if (!map_uprobe_page(up, 0)) {
>>>>>+ lock_page(up->page);
>>>>>
>>>>>
>>>The first patch doesn't do lock_page before calling insert_probe_page.
>>>Why does this patch do so? It's inconsistent.
>>>
>>>
>>>
>>>
>>>>>+ insert_probe_page(up);
>>>>>+ unmap_uprobe_page(up);
>>>>>+ unlock_page(up->page);
>>>>>+ }
>>>>>+ }
>>>>>+ page_cache_release(up->page);
>>>>>+ }
>>>>>+ }
>>>>>+ }
>>>>>+ return retval;
>>>>>+}
>>>>>+
>>>>>+/*
>>>>>+ * This function handles the readpage of all modules that have active
>>>>>
>>>>>
>>>probes
>>>
>>>
>>>>>+ * in them. Here, we first call the original readpage() of this
>>>>>+ * inode / address_space to actually read the page into memory. Then,
>>>>>
>>>>>
>>>we will
>>>
>>>
>>>>>+ * insert all the probes that are specified in this page before
>>>>>
>>>>>
>>>returning.
>>>
>>>
>>>>>+ */
>>>>>+int up_readpage(struct file *file, struct page *page)
>>>>>+{
>>>>>+ int retval = 0;
>>>>>+ struct uprobe_module *m;
>>>>>+ struct uprobe *up = NULL;
>>>>>+ int kprobe_page_mapped = 0;
>>>>>+ struct hlist_node *node;
>>>>>+
>>>>>+ m = get_module_by_inode(file->f_dentry->d_inode);
>>>>>
>>>>>
>>>The same race condition like above function.
>>>
>>>
>>>
>>>
>>>>>+ if (!m) {
>>>>>+ printk("up_readpage: major problem. we don't have mod
>>>>>
>>>>>
>>>for this !!!\n");
>>>
>>>
>>>>>+ return -EINVAL;
>>>>>+ }
>>>>>+
>>>>>+ /* call original readpage() */
>>>>>+ retval = m->ori_a_ops->readpage(file, page);
>>>>>+ if (retval >= 0) {
>>>>>+ hlist_for_each_entry(up, node, &m->ulist_head, ulist) {
>>>>>+ if (find_page_probe (up->offset >>
>>>>>
>>>>>
>>>PAGE_CACHE_SHIFT,
>>>
>>>
>>>>>+ page->index <<
>>>>>
>>>>>
>>>PAGE_CACHE_SHIFT)) {
>>>
>>>
>>>>>+ up->page = page;
>>>>>+ if (!map_uprobe_page(up,
>>>>>
>>>>>
>>>kprobe_page_mapped)) {
>>>
>>>
>>>>>+ lock_page(up->page);
>>>>>
>>>>>
>>>Same inconsistent issue.
>>>
>>>
>>>
>>>
>>>>>+ kprobe_page_mapped = 1;
>>>>>+ retval = insert_probe_page(up);
>>>>>+ }
>>>>>+ }
>>>>>+ }
>>>>>+ if (kprobe_page_mapped) {
>>>>>
>>>>>
>>>The logic here is incorrect. If there are many uprobes at the same page,
>>>up just points to the last one. How about others?
>>>
>>>
>>>
>>>
>>>
>>>>>+ unmap_uprobe_page(up);
>>>>>+ unlock_page(up->page);
>>>>>+ }
>>>>>+ }
>>>>>+ return retval;
>>>>>+}
>>>>>+
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
Below inline are the comments for patch 3/3.

Yanmin

>>Signed-of-by: Prasanna S Panchamukhi <[hidden email]>
>>diff -puN kernel/kprobes.c~kprobes_userspace_probes-handlers
kernel/kprobes.c
>>--- linux-2.6.13/kernel/kprobes.c~kprobes_userspace_probes-handlers
2005-09-14 11:49:28.780123648 +0530
>>+++ linux-2.6.13-prasanna/kernel/kprobes.c 2005-09-14
11:49:28.835115288 +0530
>>@@ -52,7 +52,8 @@ static struct list_head uprobe_module_li
>>+/*
>>+ * Walk through the kprobe hlist and get the matching userspace probe
>>+ * structure with the given inode and offset.
>>+ */
>>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
long offset)

>>+{
>>+ struct hlist_head *head;
>>+ struct hlist_node *node;
>>+ struct kprobe *p;
>>+
>>+ head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>+       KPROBE_HASH_BITS)];
>>+ hlist_for_each_entry(p, node, head, hlist) {
>>+ if (p->pre_handler == aggr_pre_handler)
>>+ return p;
>>+ else {
>>+ struct uprobe *user = container_of(p,
>>+ struct uprobe,
kp);
Kprobe and uprobe share the same hash table. Does p here always point to
uprobe?

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
Yanmin,

Thanks for your comments, Please see my comments inline below.

>>> /*inode of the application */
>>> struct inode *inode;
>How about using a pointer to uprobe_module to locate the inode? Although
>the storage space is same, it's better to keep all the same info of
>uprobes in uprobe_module. Uprobes just need keep its own specific info.

inode and offset combination uniquely identifies individual user space
probe, hence when a probe is hit, combination inode and offset is used to
identify each probe. Keeping inode pointer in struct uprobe clearly
reflects this concept.


>> /*offset within the page */
>This comment is incorrect. Functions, such like
>register_userspace_probe, use it as the offset from the beginning of the
>file, not a page of the file. Right? Although your example uses it as an
>offset of page. Frank's reply said this item could be deleted. I don't
>think he is correct because uprobe->kp.addr is used when calling
>register_kprobe.

Next patch release will take care of this, offset is hidden from the user and user need
not initialize it.


>> unsigned long offset;=20
>> /*page containing probes */
>> struct page *page;
>> /* virtual memory area containing the probes */
>> struct vm_area_struct *vma;
>Item page and vma are used temporarily. They are not used when the probe
>is hit. To shrink struct uprobe, they could be deleted. A new super
>struct could be defined to include struct uprobe, or always to search
>them when they are needed. Application might be inserted huge uprobes.
>It's better to keep the struct smaller.

If vma's are temporarily defined, then readpage hooks should walk
through the vma list to get the vma. To reduce the time spent in
readpage/readpages hooks in fastpath, we store it while registeration.



>> };
>>
>> struct uprobe_module {
>> struct hlist_head ulist_head;
>> /* hlist head containing list of probes within this
module */=20
>> struct list_head mlist;
>> /* list of uprobes_modules */
>> struct inode *inode;
>Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.

This will be taken care in the next patch release.


>> /* inode of the application on which probes are
implanted */
>> struct nameidata nd;
>> /* to hold path/dentry etc. */
>Struct nameidata looks a little big. A pointer to dentry might be
>enough.

nameidata is allocated per module/application and not for every probe.
At present the reference count to the mount point is held until
the probes is unregistered. I will relook at this issue.
 
>>+ if (!kernel_text_address((unsigned long)p->addr))
>>+ addr =3D insert_kprobe_user(p);
>After calling insert_kprobe_user, p->addr is still equal to the return
>value of kmap_atomic. So when the uprobe is hit, how to find it from the
>hash table?

Here it does not return just the mapped address, but inode * offset as the address.
Each probe is uniquely identified by the combination of inode and offset, kprobe_handler
does the same thing.

>And I didn't see kprobe_handler is changed to call
>get_kprobe_user if uprobe is hit.

kprobe_handler() calls get_uprobe_at() when the probe is hit.

>>+ */
>>+static int map_uprobe_page(struct uprobe *up, int mapped)
>The second parameter seems useless.

Its being used in the readpage patch.



>>+{
>>+{
>>+ up->kp.addr =3D (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>+ (unsigned long)(up->offset &
~PAGE_MASK));
Does it guarantee later register_kprobe could distinguish it's a user
probe? Perhaps it's better to assign a fixed address in user space, such
like 0.

Yes,register_kprobe() distinguishes user space probes. I could not
understand what you mean by "assign a fixed address in user space"

>>+ up->kp.opcode =3D 0;
>It's not necessary to initiate kp.opcode as zero here.

This will be taken care in the next patch release.


>>+ list_for_each_entry(um, &uprobe_module_list, mlist) {
>There is no any lock to protect uprobe_module_list. It's not safe under
>multi-task environment.

This will be fixed in the next release .


>>+ path_release(&nd);
>>+ printk("Failed to lookup the path\n");
>>+ return NULL;
>>+ }
>>+ inode =3D nd.dentry->d_inode;
>>+ path_release(&nd);
>>+ return get_module_by_inode(inode);

>It's better to move path_release after calling get_module_by_inode in
>case of potential race condition.

Later in the patch, we increment the writecount on the inode. get_module_by_inode
just walks the uprobe_module list.

>>+
>>+ um =3D kcalloc(1, sizeof(struct uprobe_module), GFP_KERNEL);
>If kcalloc returns error, it should be checked here.

This will be taken care in the next patch release.

>>+ goto out;
>>+
>>+err:
>>+ kfree(um);
>It looks like you forgot "return NULL" here.

It is taken care off now.


>>+ mapping =3D up->inode->i_mapping;
>>+ up->vma =3D find_get_vma(up->offset, mapping);
>>+ up->page =3D find_get_page(mapping, (up->offset >>
PAGE_CACHE_SHIFT));
>>+
>>+ if (!map_uprobe_page(up, 0)) {
>If map_uprobe_page(up,0) !=3D 0, register_userspace_probe still returns 0
>meaning success register. It's incorrect and might confuse users.
>How about change it to:
>if (!(error=3Dmap_uprobe_page(up, 0))) {

Proper return of error value in case of failure is taken care
in next patch release.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin

>General question:
>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 =
>cases related to the question.
> a) Many applications execute codes produced themselves, such like JIT =
>(Just-In-Time) of JVM.

At present we do not support it, need to look into such a case.

> b) Some executables include TEXTREL section. When they are loaded into =
>memory and linked dynamically, the text section might be changed, and =
>kernel will do a Copy-On-Write to create a new anonymous page and map =
>the new page to the process address space. So after the process starts, =
>we couldn't insert uprobe on its copied pages.
>Should a new interface be added to support it? The parameters could be =
>process id and offset in the process address space. Of course, it could =
>be an enhancement and implemented later.

User space probes support insertion of probes on dynamically linked libraries
and even probes can be inserted on the text pages that are not even loaded
into the memory.

>3) Can function register_userspace_probe do not call register_kprobe? I =
>think it's not necessary. It's just my feeling. It's up to you to make =
>decision. :)

register_kprobe already does most of what userspace probe registeration needs.

>2) Function get_inode_ops should take care of errors and its caller, =
>register_userspace_probe should check if the return value of =
>get_inode_ops is IS_ERR. If so, the error code should be returned =
>instead of a hard-coded -ENOSYS.

Next patch release will take care of these things.

>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>+ return vma;
>It's not safe to return vma without lock. There is a race condition. If =
>vma is released by another thread, kernel might be crazy when this =
>thread tries to access it later.

>If the page is mapped to many vma in different processes, function =
>find_get_vma just returns one vma. It's not enough.
>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach =
>loop.

could you please elaborate this.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
>
> register_aggr_kprobe needs to be aware of user space probe.
>
this is already taken care off.

--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>+ struct page *page;
> >>+ struct uprobe_module *m;
> >>+ struct uprobe *up = NULL;
> >>+ struct hlist_node *node;
> >>+
> >>+ m = get_module_by_inode(file->f_dentry->d_inode);
> There is a race condition between unregister_userspace_probe and here.
> If a thread jumps to the beginning of function up_readpages while
> another thread calls unregister_userspace_probe to delete the um, the
> first thread might return error.

Some locking/semaphore can be used to avoid this.

> >>+ lock_page(up->page);
> The first patch doesn't do lock_page before calling insert_probe_page.
> Why does this patch do so? It's inconsistent.
>

Next patch release will take care of this.

> >>+ int kprobe_page_mapped = 0;
> >>+ struct hlist_node *node;
> >>+
> >>+ m = get_module_by_inode(file->f_dentry->d_inode);
> The same race condition like above function.
>

Some locking/semaphore can be used to avoid this.

> PAGE_CACHE_SHIFT,
> >>+ page->index <<
> PAGE_CACHE_SHIFT)) {
> >>+ up->page = page;
> >>+ if (!map_uprobe_page(up,
> kprobe_page_mapped)) {
> >>+ lock_page(up->page);
> Same inconsistent issue.
>
>
> >>+ kprobe_page_mapped = 1;
> >>+ retval = insert_probe_page(up);
> >>+ }
> >>+ }
> >>+ }
> >>+ if (kprobe_page_mapped) {
> The logic here is incorrect. If there are many uprobes at the same page,
> up just points to the last one. How about others?
>

readpage() routine reads one page at a time. we map the page one time and walk the
probes list for this inode, insert all the probes within this page and then unmap it.

Thanks
Prasanna
--

Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>+ */
> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
> long offset)
> >>+{
> >>+ struct hlist_head *head;
> >>+ struct hlist_node *node;
> >>+ struct kprobe *p;
> >>+
> >>+ head = &kprobe_table[hash_long((unsigned long)inode * offset,
> >>+       KPROBE_HASH_BITS)];
> >>+ hlist_for_each_entry(p, node, head, hlist) {
> >>+ if (p->pre_handler == aggr_pre_handler)
> >>+ return p;
> >>+ else {
> >>+ struct uprobe *user = container_of(p,
> >>+ struct uprobe,
> kp);
> Kprobe and uprobe share the same hash table. Does p here always point to
> uprobe?

Check can be made before accessig uprobe.
if (!kernel_text_address((unsigned long)p->addr))

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636