RE: Review patches of user space kprobe

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
The 4th patch is to avoid probes misses in smp environment. I just have
some initial comments because the patch is not complete.

See the inline comments below.

Yanmin

>>---
>>
>> linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c |   52
++++++++++++++++++++---
>> 1 files changed, 46 insertions(+), 6 deletions(-)
>>
>>diff -puN arch/i386/kernel/kprobes.c~kprobes_avoid_smp_missprobes
arch/i386/kernel/kprobes.c
>>---
linux-2.6.13/arch/i386/kernel/kprobes.c~kprobes_avoid_smp_missprobes
2005-09-15 13:56:24.105192248 +0530
>>+++ linux-2.6.13-prasanna/arch/i386/kernel/kprobes.c 2005-09-15
13:56:24.137187384 +0530
>>+/*
>>+ * This routine provides the functionality of single stepping out of
line by
>>+ * copying the original instruction in the user process free stack
space.
>>+ */
>>+static inline int uprobe_single_step(struct uprobe *p, struct pt_regs
*regs)

>>+{
>>+ unsigned long page_addr, stack_addr = regs->esp;
>>+ struct vm_area_struct *vma = NULL;
>>+ int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
>>+
>>+ if (!(vma = find_vma(current->mm, PAGE_ALIGN(stack_addr)))) {
>>+ printk("No vma found\n");
>>+ return -ENOENT;
>>+ }
>>+
>>+ if (vma->vm_flags & VM_GROWSDOWN) {
>>+ page_addr = PAGE_ALIGN(stack_addr);
It's wrong to use PAGE_ALIGN here. I think mostly page_addr is bigger
than stack_addr. Pls. use stack_addr&PAGE_MASK.


>>+ if (((stack_addr - sizeof(long long)) - page_addr) >
size) {
Pay attention to the type of stack_add and page_addr. They are unsigned
long. So it's better to change it to:
if ((stack_addr - sizeof(long long)) > (size + page_addr)) {

And why consider a sizeof(long long) here?


>>+ if (copy_to_user((unsigned long *)(page_addr +
size),
>>+ (unsigned long
*)&p->kp.ainsn.insn,
>>+ size))
>>+ return -EFAULT;
>>+ regs->eip = page_addr + size;
>>+ }
>>+ } else {
>>+ page_addr = PAGE_ALIGN(stack_addr) + PAGE_SIZE;
Similar issue.


>>+ if ((page_addr - (stack_addr + sizeof(long long))) >
size) {
Similar issue.


>>+ if (copy_to_user((unsigned long *)page_addr,
>>+ (unsigned long
*)&p->kp.ainsn.insn,

>>+\ size))
>>+ return -EFAULT;
>>+ regs->eip = page_addr;
>>+ }
>>+ }
>>+ singlestep_addr = (kprobe_opcode_t *)regs->eip;
>>+ return 0;
>>+}
>>
>> static inline void prepare_singlestep(struct kprobe *p, struct
pt_regs *regs)

>> {
>>@@ -140,10 +178,13 @@ static inline void prepare_singlestep(st
>> if (p->opcode == BREAKPOINT_INSTRUCTION)
>> regs->eip = (unsigned long)p->addr;
>> else {
>>- if (!kernel_text_address((unsigned long)p->addr)) {
>>- arch_disarm_uprobe(current_uprobe);
>>- regs->eip = (unsigned long)uprobe_addr;
>>- } else
>>+ if (!kernel_text_address((unsigned long)p->addr))
>>+ uprobe_single_step(current_uprobe, regs);
1) If uprobe_single_step returns -EFAULT, how does the thread go ahead?
Note the first byte of the original instruction is break now, so the
instruction from the second byte is wrong.
2) If prepare_singlestep succeeds, but later the real instruction of
single step might cause a fatal error, for example, to access illegal
address, then the process will be killed. Would the current
kprobe(uprobe) environment be restored clearly? Such like kprobe_cpu,
kprobe_status and so on.



Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
> The 4th patch is to avoid probes misses in smp environment. I just have
> some initial comments because the patch is not complete.
>

> >>+
> >>+ if (vma->vm_flags & VM_GROWSDOWN) {
> >>+ page_addr = PAGE_ALIGN(stack_addr);
> It's wrong to use PAGE_ALIGN here. I think mostly page_addr is bigger
> than stack_addr. Pls. use stack_addr&PAGE_MASK.

This will be fixed in the next release.

>
>
> >>+ if (((stack_addr - sizeof(long long)) - page_addr) >
> size) {
> Pay attention to the type of stack_add and page_addr. They are unsigned
> long. So it's better to change it to:
> if ((stack_addr - sizeof(long long)) > (size + page_addr)) {
>
> And why consider a sizeof(long long) here?
>

For the safety reasons, some space is left below the stack pointer, so that the probed
instruction may use this space while single stepping.

> >> {
> >>@@ -140,10 +178,13 @@ static inline void prepare_singlestep(st
> >> if (p->opcode == BREAKPOINT_INSTRUCTION)
> >> regs->eip = (unsigned long)p->addr;
> >> else {
> >>- if (!kernel_text_address((unsigned long)p->addr)) {
> >>- arch_disarm_uprobe(current_uprobe);
> >>- regs->eip = (unsigned long)uprobe_addr;
> >>- } else
> >>+ if (!kernel_text_address((unsigned long)p->addr))
> >>+ uprobe_single_step(current_uprobe, regs);
> 1) If uprobe_single_step returns -EFAULT, how does the thread go ahead?
> Note the first byte of the original instruction is break now, so the
> instruction from the second byte is wrong.

As of now it is not be handled but, it can be handled by allowing the page fault to
succeed or expanding the stack space or  replacing the original instruction if nothing
 can be done.

> 2) If prepare_singlestep succeeds, but later the real instruction of
> single step might cause a fatal error, for example, to access illegal
> address, then the process will be killed. Would the current
> kprobe(uprobe) environment be restored clearly? Such like kprobe_cpu,
> kprobe_status and so on.
>

If its an illegal instruction, kprobe_fault_handler() is given control,
so that it can take care of restoring it cleanly.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月5日 19:14
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>+ */
>>> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
>>> long offset)
>>> >>+{
>>> >>+ struct hlist_head *head;
>>> >>+ struct hlist_node *node;
>>> >>+ struct kprobe *p;
>>> >>+
>>> >>+ head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>> >>+       KPROBE_HASH_BITS)];
>>> >>+ hlist_for_each_entry(p, node, head, hlist) {
>>> >>+ if (p->pre_handler == aggr_pre_handler)
>>> >>+ return p;
>>> >>+ else {
>>> >>+ struct uprobe *user = container_of(p,
>>> >>+ struct uprobe,
>>> kp);
>>> Kprobe and uprobe share the same hash table. Does p here always point to
>>> uprobe?
>>
>>Check can be made before accessig uprobe.
>>if (!kernel_text_address((unsigned long)p->addr))
Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called incorrectly. The logic is *not clear*.

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:[hidden email]]
>>Sent: 2006年1月5日 18:31
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>Yanmin,
>>
>>Thanks for your comments, Please see my comments inline below.
>>
>>>>> /*inode of the application */
>>>>> struct inode *inode;
>>>How about using a pointer to uprobe_module to locate the inode? Although
>>>the storage space is same, it's better to keep all the same info of
>>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
>>
>>inode and offset combination uniquely identifies individual user space
>>probe,
What does "individual user space probe" mean here? Any uprobe will be shared by all processes who map the same page. Right?


 hence when a probe is hit, combination inode and offset is used to
>>identify each probe. Keeping inode pointer in struct uprobe clearly
>>reflects this concept.
What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right? If it's correct, why not to just include inode in uprobe_module?


>>
>>
>>>> /*offset within the page */
>>>This comment is incorrect. Functions, such like
>>>register_userspace_probe, use it as the offset from the beginning of the
>>>file, not a page of the file. Right? Although your example uses it as an
>>>offset of page. Frank's reply said this item could be deleted. I don't
>>>think he is correct because uprobe->kp.addr is used when calling
>>>register_kprobe.
>>
>>Next patch release will take care of this, offset is hidden from the user and user need
>>not initialize it.
>>
>>
>>>> unsigned long offset;=20
>>>> /*page containing probes */
>>>> struct page *page;
>>>> /* virtual memory area containing the probes */
>>>> struct vm_area_struct *vma;
>>>Item page and vma are used temporarily. They are not used when the probe
>>>is hit. To shrink struct uprobe, they could be deleted. A new super
>>>struct could be defined to include struct uprobe, or always to search
>>>them when they are needed. Application might be inserted huge uprobes.
>>>It's better to keep the struct smaller.
>>
>>If vma's are temporarily defined, then readpage hooks should walk
>>through the vma list to get the vma. To reduce the time spent in
>>readpage/readpages hooks in fastpath, we store it while registeration.
Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?


>>
>>
>>
>>>> };
>>>>
>>>> struct uprobe_module {
>>>> struct hlist_head ulist_head;
>>>> /* hlist head containing list of probes within this
>>module */=20
>>>> struct list_head mlist;
>>>> /* list of uprobes_modules */
>>>> struct inode *inode;
>>>Item inode could be deleted. Use uprobe_module->nd.dentry->d_inode.
>>
>>This will be taken care in the next patch release.
>>
>>
>>>> /* inode of the application on which probes are
>>implanted */
>>>> struct nameidata nd;
>>>> /* to hold path/dentry etc. */
>>>Struct nameidata looks a little big. A pointer to dentry might be
>>>enough.
>>
>>nameidata is allocated per module/application and not for every probe.
My above questions answer this one.


>>At present the reference count to the mount point is held until
>>the probes is unregistered. I will relook at this issue.
>>
>>>>+ if (!kernel_text_address((unsigned long)p->addr))
>>>>+ addr =3D insert_kprobe_user(p);
>>>After calling insert_kprobe_user, p->addr is still equal to the return
>>>value of kmap_atomic. So when the uprobe is hit, how to find it from the
>>>hash table?
>>
>>Here it does not return just the mapped address, but inode * offset as the address.
>>Each probe is uniquely identified by the combination of inode and offset, kprobe_handler
>>does the same thing.
>>
>>>And I didn't see kprobe_handler is changed to call
>>>get_kprobe_user if uprobe is hit.
>>
>>kprobe_handler() calls get_uprobe_at() when the probe is hit.
>>
>>>>+ */
>>>>+static int map_uprobe_page(struct uprobe *up, int mapped)
>>>The second parameter seems useless.
>>
>>Its being used in the readpage patch.
>>
>>
>>
>>>>+{
>>>>+{
>>>>+ up->kp.addr =3D (kprobe_opcode_t *) ((unsigned long)up->kp.addr +
>>>>+ (unsigned long)(up->offset &
>>~PAGE_MASK));
>>Does it guarantee later register_kprobe could distinguish it's a user
>>probe? Perhaps it's better to assign a fixed address in user space, such
>>like 0.
>>
>>Yes,register_kprobe() distinguishes user space probes. I could not
>>understand what you mean by "assign a fixed address in user space"
map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr. Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it, I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right?
A fix address means a value out of kernel space. Such like a value under PAGE_OFFSET on i386.


Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Zhang, Yanmin
>>Sent: 2006年1月6日 10:52
>>To: [hidden email]
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: RE: Review patches of user space kprobe
>>
>>>>-----Original Message-----
>>>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Prasanna S Panchamukhi
>>>>Sent: 2006年1月5日 19:14
>>>>To: Zhang, Yanmin
>>>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>>>Subject: Re: Review patches of user space kprobe
>>>>
>>>>> >>+ */
>>>>> >>+static struct kprobe *get_uprobe_at(struct inode *inode, unsigned
>>>>> long offset)
>>>>> >>+{
>>>>> >>+ struct hlist_head *head;
>>>>> >>+ struct hlist_node *node;
>>>>> >>+ struct kprobe *p;
>>>>> >>+
>>>>> >>+ head = &kprobe_table[hash_long((unsigned long)inode * offset,
>>>>> >>+       KPROBE_HASH_BITS)];
>>>>> >>+ hlist_for_each_entry(p, node, head, hlist) {
>>>>> >>+ if (p->pre_handler == aggr_pre_handler)
>>>>> >>+ return p;
>>>>> >>+ else {
>>>>> >>+ struct uprobe *user = container_of(p,
>>>>> >>+ struct uprobe,
>>>>> kp);
>>>>> Kprobe and uprobe share the same hash table. Does p here always point to
>>>>> uprobe?
>>>>
>>>>Check can be made before accessig uprobe.
>>>>if (!kernel_text_address((unsigned long)p->addr))
>>Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called
>>incorrectly. The logic is *not clear*.
I might misunderstand your reply. If put the check before using container_of in function get_uprobe_at, the issue could be resolved.

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:[hidden email]]
>>Sent: 2006年1月5日 18:33
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>
>>>General question:
>>>1) How to insert an uprobe at anonymous page (VMA)? I think there are 2 =
>>>cases related to the question.
>>> a) Many applications execute codes produced themselves, such like JIT =
>>>(Just-In-Time) of JVM.
>>
>>At present we do not support it, need to look into such a case.
>>
>>> b) Some executables include TEXTREL section. When they are loaded into =
>>>memory and linked dynamically, the text section might be changed, and =
>>>kernel will do a Copy-On-Write to create a new anonymous page and map =
>>>the new page to the process address space. So after the process starts, =
>>>we couldn't insert uprobe on its copied pages.
>>>Should a new interface be added to support it? The parameters could be =
>>>process id and offset in the process address space. Of course, it could =
>>>be an enhancement and implemented later.
>>
>>User space probes support insertion of probes on dynamically linked libraries
>>and even probes can be inserted on the text pages that are not even loaded
>>into the memory.
It doesn't resolve case b).


>>
>>>3) Can function register_userspace_probe do not call register_kprobe? I =
>>>think it's not necessary. It's just my feeling. It's up to you to make =
>>>decision. :)
>>
>>register_kprobe already does most of what userspace probe registeration needs.
Function register_kprobe is not big. Current register_userspace_probe calls register_kprobe, then register_kprobe calls back to uprobe-specific functions. It looks confusing. Why not to just bypass register_kprobe?


>>
>>>2) Function get_inode_ops should take care of errors and its caller, =
>>>register_userspace_probe should check if the return value of =
>>>get_inode_ops is IS_ERR. If so, the error code should be returned =
>>>instead of a hard-coded -ENOSYS.
>>
>>Next patch release will take care of these things.
>>
>>>>>>+ spin_unlock(&mapping->i_mmap_lock);
>>>>>>+ return vma;
>>>It's not safe to return vma without lock. There is a race condition. If =
>>>vma is released by another thread, kernel might be crazy when this =
>>>thread tries to access it later.
>>
>>>If the page is mapped to many vma in different processes, function =
>>>find_get_vma just returns one vma. It's not enough.
>>>I'd like to suggest to do the flush_icache in the vma_prio_tree_foreach =
>>>loop.
>>
>>could you please elaborate this.
Under smp environment, if a page (inode,offset) is mapped to the address spaces of many processes, then when a uprobe is registered on the page, uprobe->vma is just point to one of them. Then, insert_kprobe_user=>arch_arm_uprobe=>flush_icache_user_range, only this vma (or address space) is flushed.


Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:[hidden email]]
>>Sent: 2006年1月5日 19:10
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>>
>>>
>>> >>+ kprobe_page_mapped = 1;
>>> >>+ retval = insert_probe_page(up);
>>> >>+ }
>>> >>+ }
>>> >>+ }
>>> >>+ if (kprobe_page_mapped) {
>>> The logic here is incorrect. If there are many uprobes at the same page,
>>> up just points to the last one. How about others?
>>>
>>
>>readpage() routine reads one page at a time. we map the page one time and walk the
>>probes list for this inode, insert all the probes within this page and then unmap it.
Yes, I did see codes inserting all the probes on the page, but below codes:
+ if (kprobe_page_mapped) {
+ unmap_uprobe_page(up);
+ unlock_page(up->page);
+ }

are only executed after the loop of hlist_for_each_entry. Is it correct?


Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>> >>+ struct uprobe,
> >>> kp);
> >>> Kprobe and uprobe share the same hash table. Does p here always point to
> >>> uprobe?
> >>
> >>Check can be made before accessig uprobe.
> >>if (!kernel_text_address((unsigned long)p->addr))
> Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called incorrectly. The logic is *not clear*.

This was all done to handle aggrigate uprobes. get_uprobe() is called after
checking if the address is not kernel_text_address(). get_uprobe_at() returns
kprobe structures from the kprobe hash list. If the kprobe structure returned
is a aggrigate kprobe structure, the aggrigate hash list is walked to get the
individual kprobe structure that is included in the uprobe structure. The page
containing the probe address is located and locked in the memory, so that the
breakpoint instruction can be replaced with original instruction, if the single
stepping out-of-line could not be achieved. But as of now we are not handling
the case when single stepping out-of-line fails, which need to be handled.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
> >>
> >>inode and offset combination uniquely identifies individual user space
> >>probe,
> What does "individual user space probe" mean here?
>

Here individual user space probe mean, the place where the breakpoint
instruction is inserted.

>Any uprobe will be shared by all processes who map the same page. Right?

that's right.

>
>  hence when a probe is hit, combination inode and offset is used to
> >>identify each probe. Keeping inode pointer in struct uprobe clearly
> >>reflects this concept.
> What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right? If it's correct, why not to just include inode in uprobe_module?

This can be done, but in such a case for every probe hit, the uprobe_module list much be traversed just to get the inode.

> >>>> struct vm_area_struct *vma;
> >>>Item page and vma are used temporarily. They are not used when the probe
> >>>is hit. To shrink struct uprobe, they could be deleted. A new super
> >>>struct could be defined to include struct uprobe, or always to search
> >>>them when they are needed. Application might be inserted huge uprobes.
> >>>It's better to keep the struct smaller.
> >>
> >>If vma's are temporarily defined, then readpage hooks should walk
> >>through the vma list to get the vma. To reduce the time spent in
> >>readpage/readpages hooks in fastpath, we store it while registeration.
> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?

Do you mean the same page is mapped in different vma?
As of now we are not supporting such cases.

> >>understand what you mean by "assign a fixed address in user space"
> map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr. Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it, I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right?

This is done to make use of register_kprobe(), the address returned by
kmap_atomic is passed to register_kprobe() and even though the kernel data
address is stored at kp.addr, that is enough to distinguish between
kernel probes.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>User space probes support insertion of probes on dynamically linked libraries
> >>and even probes can be inserted on the text pages that are not even loaded
> >>into the memory.
> It doesn't resolve case b).
>

Need to look how to support such cases.

>
> >>
> >>>3) Can function register_userspace_probe do not call register_kprobe? I =
> >>>think it's not necessary. It's just my feeling. It's up to you to make =
> >>>decision. :)
> >>
> >>register_kprobe already does most of what userspace probe registeration needs.
> Function register_kprobe is not big. Current register_userspace_probe calls register_kprobe, then register_kprobe calls back to uprobe-specific functions. It looks confusing. Why not to just bypass register_kprobe?
>

Let me check how the code looks if we bypass register_kprobe().

> >>
> >>could you please elaborate this.
> Under smp environment, if a page (inode,offset) is mapped to the address spaces of many processes, then when a uprobe is registered on the page, uprobe->vma is just point to one of them. Then, insert_kprobe_user=>arch_arm_uprobe=>flush_icache_user_range, only this vma (or address space) is flushed.
>
At present only the vma containing the page is flused, if the pages are present
on other vmas, even the probes should be inserted on such pages and those vma
need to be flused.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: Prasanna S Panchamukhi [mailto:[hidden email]]
>>Sent: 2006年1月6日 16:57
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>>uprobes in uprobe_module. Uprobes just need keep its own specific info.
>>> >>
>>> >>inode and offset combination uniquely identifies individual user space
>>> >>probe,
>>> What does "individual user space probe" mean here?
>>>
>>
>>Here individual user space probe mean, the place where the breakpoint
>>instruction is inserted.
>>
>>>Any uprobe will be shared by all processes who map the same page. Right?
>>
>>that's right.
>>
>>>
>>>  hence when a probe is hit, combination inode and offset is used to
>>> >>identify each probe. Keeping inode pointer in struct uprobe clearly
>>> >>reflects this concept.
>>> What's the relationship between uprobe and uprobe_module? All uprobes linked by uprobe_module->ulist_head have the same inode, right?
>>If it's correct, why not to just include inode in uprobe_module?
>>
>>This can be done, but in such a case for every probe hit, the uprobe_module list much be traversed just to get the inode.
>>
>>> >>>> struct vm_area_struct *vma;
>>> >>>Item page and vma are used temporarily. They are not used when the probe
>>> >>>is hit. To shrink struct uprobe, they could be deleted. A new super
>>> >>>struct could be defined to include struct uprobe, or always to search
>>> >>>them when they are needed. Application might be inserted huge uprobes.
>>> >>>It's better to keep the struct smaller.
>>> >>
>>> >>If vma's are temporarily defined, then readpage hooks should walk
>>> >>through the vma list to get the vma. To reduce the time spent in
>>> >>readpage/readpages hooks in fastpath, we store it while registeration.
>>> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?
>>
>>Do you mean the same page is mapped in different vma?
Map the page in different vma of different processes (address spaces). I think it's common. For example, libc.so is mapped to many processes. Every process has its own vma to map the page.

>>As of now we are not supporting such cases.
>>
>>> >>understand what you mean by "assign a fixed address in user space"
>>> map_uprobe_page sets uprobe->kp.addr to address returned by kmap_atomic, then insert_probe_page will add offset to uprobe->kp.addr.
>>Later, register_kprobe uses uprobe->kp.addr to estimate if the address is in kernel text. Is it wrong? As for why you didn't hit it,
>>I think mostly now uprobe->kp.addr is pointing to the kernel data area, but it doesn't point to user space. Right?
>>
>>This is done to make use of register_kprobe(), the address returned by
>>kmap_atomic is passed to register_kprobe() and even though the kernel data
>>address is stored at kp.addr, that is enough to distinguish between
>>kernel probes.
Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>readpage() routine reads one page at a time. we map the page one time and walk the
> >>probes list for this inode, insert all the probes within this page and then unmap it.
> Yes, I did see codes inserting all the probes on the page, but below codes:
> + if (kprobe_page_mapped) {
> + unmap_uprobe_page(up);
> + unlock_page(up->page);
> + }
>
> are only executed after the loop of hlist_for_each_entry. Is it correct?
>

that's correct. The page is mapped only once for the first match in the loop,
then all the probes are inserted into that page in the hlist_for_each loop and
then the page is unmapped only once after the end of the loop.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 17:12
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>readpage() routine reads one page at a time. we map the page one time and walk the
>>> >>probes list for this inode, insert all the probes within this page and then unmap it.
>>> Yes, I did see codes inserting all the probes on the page, but below codes:
>>> + if (kprobe_page_mapped) {
>>> + unmap_uprobe_page(up);
>>> + unlock_page(up->page);
>>> + }
>>>
>>> are only executed after the loop of hlist_for_each_entry. Is it correct?
>>>
>>
>>that's correct. The page is mapped only once for the first match in the loop,
>>then all the probes are inserted into that page in the hlist_for_each loop and
>>then the page is unmapped only once after the end of the loop.
So call lock_page for many times, and call unlock_page for one time?


Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
> >>> are only executed after the loop of hlist_for_each_entry. Is it correct?
> >>>
> >>
> >>that's correct. The page is mapped only once for the first match in the loop,
> >>then all the probes are inserted into that page in the hlist_for_each loop and
> >>then the page is unmapped only once after the end of the loop.
> So call lock_page for many times, and call unlock_page for one time?
>
no, lock_page and unlock_page is called only once, there is a small correction,
insert_probe_page() should be moved out of if condition so that all probes
are inserted on that page.

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>> >>If vma's are temporarily defined, then readpage hooks should walk
> >>> >>through the vma list to get the vma. To reduce the time spent in
> >>> >>readpage/readpages hooks in fastpath, we store it while registeration.
> >>> Incorrect. This vma only represents one process who maps the page. How about other processes who map the same page?
> >>
> >>Do you mean the same page is mapped in different vma?
> Map the page in different vma of different processes (address spaces). I think it's common. For example, libc.so is mapped to many processes. Every process has its own vma to map the page.
>

I think the dll's share the same vma's.
below is the output of different process libc mmaps.
./1/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./1/maps:0082f000-00832000 rw-p 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./1996/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./1996/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2000/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2000/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2011/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2011/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./.2014/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./.2014/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2031/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2031/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2032/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2032/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2033/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2033/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2034/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2034/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2035/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so
./2035/maps:0082f000-00832000 rwxp 00132000 03:02 395614     /lib/tls/libc-2.3.2.so
./2397/maps:006fd000-0082f000 r-xp 00000000 03:02 395614     /lib/tls/libc-2.3.2.so

> >>
> >>This is done to make use of register_kprobe(), the address returned by
> >>kmap_atomic is passed to register_kprobe() and even though the kernel data
> >>address is stored at kp.addr, that is enough to distinguish between
> >>kernel probes.
> Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.
yes, otherwise need to find out a way to pass the mapped address to register_kprobe(). All this can be avoided
if register_kprobe() is bypassed :).

Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Roland McGrath
> I think the dll's share the same vma's.

You cannot rely on mapping addresses being the same, they are not always
so.  There are always separate vm_area_struct's in separate processes even
when they appear to be identical.


Reply | Threaded
Open this post in threaded view
|

Re: Review patches of user space kprobe

Prasanna S Panchamukhi
In reply to this post by Zhang, Yanmin
> >>>>> >>+ head = &kprobe_table[hash_long((unsigned long)inode * offset,
> >>>>> >>+       KPROBE_HASH_BITS)];
> >>>>> >>+ hlist_for_each_entry(p, node, head, hlist) {
> >>>>> >>+ if (p->pre_handler == aggr_pre_handler)
> >>>>> >>+ return p;
> >>>>> >>+ else {
> >>>>> >>+ struct uprobe *user = container_of(p,
> >>>>> >>+ struct uprobe,
> >>>>> kp);
> >>>>> Kprobe and uprobe share the same hash table. Does p here always point to
> >>>>> uprobe?
> >>>>
> >>>>Check can be made before accessig uprobe.
> >>>>if (!kernel_text_address((unsigned long)p->addr))
> >>Incorrect. get_uprobe, the caller of get_uprobe_at, might be crazy. current_uprobe might be set as up and get_user_page(up) is called
> >>incorrectly. The logic is *not clear*.
> I might misunderstand your reply. If put the check before using container_of in function get_uprobe_at, the issue could be resolved.

even I meant the same thing of adding a check before we access uprobe structure using container_of.


Thanks
Prasanna
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [hidden email]
Ph: 91-80-25044636

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 17:32
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> >>> are only executed after the loop of hlist_for_each_entry. Is it correct?
>>> >>>
>>> >>
>>> >>that's correct. The page is mapped only once for the first match in the loop,
>>> >>then all the probes are inserted into that page in the hlist_for_each loop and
>>> >>then the page is unmapped only once after the end of the loop.
>>> So call lock_page for many times, and call unlock_page for one time?
>>>
>>no, lock_page and unlock_page is called only once, there is a small correction,
>>insert_probe_page() should be moved out of if condition so that all probes
>>are inserted on that page.
I'm still confused. Anyway, I will wait for your new codes. In addition, I still think the second parameter of map_uprobe_page is not good and could be deleted. The logic related to the second parameter is very simple and could be moved to the caller of map_uprobe_page.

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Roland McGrath
>>Sent: 2006年1月6日 18:31
>>To: [hidden email]
>>Cc: Zhang, Yanmin; [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>
>>> I think the dll's share the same vma's.
>>
>>You cannot rely on mapping addresses being the same, they are not always
>>so.  There are always separate vm_area_struct's in separate processes even
>>when they appear to be identical.
I agree with Roland. Although sometimes they appear to be identical, the vma themselves are different. And a vma might be released/reallocated before readpage/readpages are called. It's easy to introduce race condition when storing temp member in uprobe.

Reply | Threaded
Open this post in threaded view
|

RE: Review patches of user space kprobe

Zhang, Yanmin
In reply to this post by Zhang, Yanmin
>>-----Original Message-----
>>From: [hidden email] [mailto:[hidden email]] On Behalf Of Prasanna S Panchamukhi
>>Sent: 2006年1月6日 18:26
>>To: Zhang, Yanmin
>>Cc: [hidden email]; Keshavamurthy, Anil S; Mao, Bibo
>>Subject: Re: Review patches of user space kprobe
>>> >>This is done to make use of register_kprobe(), the address returned by
>>> >>kmap_atomic is passed to register_kprobe() and even though the kernel data
>>> >>address is stored at kp.addr, that is enough to distinguish between
>>> >>kernel probes.
>>> Is it true on all arch? I don't think it's safe to do so. Actually, register_kprobe just needs know it's a uprobe.
>>yes, otherwise need to find out a way to pass the mapped address to register_kprobe(). All this can be avoided
>>if register_kprobe() is bypassed :).
It's a good solution if register_kprobe is bypassed.