Is nexti confused by pushq?

David Griffiths
Hi, when I get to the following instructions:

  0x00007fffe192413e: rex.W pushq 0x28(%rsp)
  0x00007fffe1924143: rex.W popq (%rsp)
  0x00007fffe1924147: callq  0x00007fffe1045de0

and run "nexti" at the first, GDB doesn't stop at the second but instead
behaves as though I'd issued "continue". For some reason I can't reproduce
this with a small test case, though.

(gdb 8.1 on Ubuntu 16.04)

BTW I'm doing nexti programmatically and trying to avoid looking at the
next instruction to decide whether to do stepi or nexti.

Cheers,

David

--

David Griffiths, Senior Software Engineer

Undo <https://undo.io> | Resolve even the most challenging software defects
with software flight recorder technology

Software reliability report: optimizing the software supplier and customer
relationship
<https://info.undo.io/software-reliability-report-optimizing-supplier-and-customer-relationship>

Re: Is nexti confused by pushq?

dwk
I encounter this frequently, although I don't have a minimal case yet
either. I think it may have something to do with symbol information, as
I've only encountered the case when symbol information is not present (as
in the example you gave). stepi always works, but nexti sometimes turns into
a continue. I assumed this was because GDB was somehow unable to work out
where the "next" instruction was in the absence of symbols.

dwk

Re: Is nexti confused by pushq?

Andrew Burgess
* dwk <[hidden email]> [2019-02-25 10:54:19 -0500]:

> I encounter this frequently, although I don't have a minimal case yet
> either. I think it may have something to do with symbol information, as
> I've only encountered the case when symbol information is not present (as
> in the example you gave). stepi always works, but nexti sometimes turns into
> a continue. I assumed this was because GDB was somehow unable to work out
> where the "next" instruction was in the absence of symbols.

The problem here is that pushq changes the stack pointer, and this is
evidently interacting badly with the unwinder in some cases.

If we consider the difference between 'stepi' and 'nexti' we will see
what is going wrong.

A 'stepi' simply single steps the machine.  There's very little extra
logic, it's just a single step.

A 'nexti', however, steps to the next instruction in the current function,
stepping over function calls.  The way this works is that when the
'nexti' is issued, GDB caches the current frame-id, that is (roughly)
the function entry $pc and the frame base pointer (related to $sp at
entry to the function).  Once this is cached, GDB single steps, and
after each step it checks the current frame-id.  If the frame-id has
changed, then GDB believes we have entered a new (nested) function,
places a breakpoint at the return address, and then continues.  Once
we hit the breakpoint we should be back in the original frame, and we
have completed the 'nexti'.
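
The decision described above can be modelled in a few lines of Python.
This is only a toy sketch of the description in this message, not GDB's
actual implementation: the FrameId class, the function name and every
address are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameId:
    # Simplified stand-in for GDB's frame-id (hypothetical class).
    stack_addr: int  # frame base, related to $sp at function entry
    code_addr: int   # function entry $pc


def nexti_decision(cached_id, id_after_step):
    """Model what happens after the single step inside one 'nexti'."""
    if id_after_step == cached_id:
        return "stop"  # still in the same frame: the nexti is complete
    # The frame-id changed, so a call is assumed: a breakpoint goes at
    # the return address and the program is resumed.
    return "breakpoint-at-return-then-continue"


# Hypothetical values: a function entry and a frame base.
FUNC_ENTRY = 0x00007FFFE1924100
FRAME_BASE = 0x00007FFFFFFFE040

# A well-behaved unwinder reports the same frame-id after the pushq.
stable = nexti_decision(FrameId(FRAME_BASE, FUNC_ENTRY),
                        FrameId(FRAME_BASE, FUNC_ENTRY))

# Without unwind info the computed frame base may follow $sp, which
# the pushq has just lowered by 8 bytes.
confused = nexti_decision(FrameId(FRAME_BASE, FUNC_ENTRY),
                          FrameId(FRAME_BASE - 8, FUNC_ENTRY))
```

In this model the second case takes the "continue" path even though no
call was made, which matches the symptom reported in this thread.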

Now the problem comes if the frame-id changes when we single step over
the 'pushq'.  If this happens, GDB gets confused and then continues.

To check this, you should try walking over your problem code using
'stepi', and at each step run the 'bt' command.  What you'll see is
that as you step over the 'pushq' the backtrace will change in
some way; this indicates the change in frame-id that is causing
problems for you.
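
For anyone driving GDB programmatically, the stepi/bt check can be
scripted.  A minimal sketch follows; the helper names are invented, and
it assumes the commands are run through a callable that returns the
command output as a string, which inside GDB's Python would be
gdb.execute(cmd, to_string=True).  Frame #0 is ignored because its pc
changes on every step anyway.

```python
def outer_frames(bt_text):
    """Return the 'bt' output lines from frame #1 outward."""
    lines = bt_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#1"):
            return lines[i:]
    return []


def steps_that_change_backtrace(execute, steps):
    """stepi `steps` times; return the step numbers where the outer
    frames of the backtrace differ from those before the step.

    `execute` runs one GDB command and returns its output as a string.
    Inside GDB: execute = lambda cmd: gdb.execute(cmd, to_string=True)
    """
    changed = []
    before = outer_frames(execute("bt"))
    for n in range(1, steps + 1):
        execute("stepi")
        after = outer_frames(execute("bt"))
        if after != before:
            changed.append(n)
        before = after
    return changed
```

A step number showing up in the result for a plain push or pop, rather
than a call or return, would point at the frame-id instability described
above.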

Of course, this doesn't solve the problem for you, but at least you
know what's going wrong now :)

Thanks,
Andrew

Re: Is nexti confused by pushq?

Jan Kratochvil-2
On Tue, 26 Feb 2019 08:32:37 +0100, Andrew Burgess wrote:
> Of course, this doesn't solve the problem for you, but at least you
> know what's going wrong now :)

To be clear: the debuggee has wrong/insufficient debug info.  Its
.eh_frame/.debug_frame should annotate the push (and pop) instructions there.


Jan

Re: Is nexti confused by pushq?

David Griffiths
Ok, so in my case this is generated code with no debug info (Java JIT
generated) so does that mean I shouldn't attempt to use nexti? (I've got
other issues which probably preclude using nexti anyway but just curious)

Cheers,

David

Re: Is nexti confused by pushq?

Jan Kratochvil-2
On Tue, 26 Feb 2019 12:50:37 +0100, David Griffiths wrote:
> Ok, so in my case this is generated code with no debug info (Java JIT
> generated) so does that mean I shouldn't attempt to use nexti? (I've got
> other issues which probably preclude using nexti anyway but just curious)

The proper fix belongs in OpenJDK, so that it produces proper debug info for
the JITted module (but I do not know the details of GDB JIT modules).
Otherwise it is always some sort of workaround.


Jan

Re: Is nexti confused by pushq?

Dmitry Samersoff
In reply to this post by David Griffiths
David,

On 26.02.2019 14:50, David Griffiths wrote:
> Ok, so in my case this is generated code with no debug info (Java JIT
> generated) so does that mean I shouldn't attempt to use nexti? (I've got
> other issues which probably preclude using nexti anyway but just curious)

In my experience with code produced by the Java JIT (C2), it's better to
avoid using nexti.

If you do it programmatically, you can try to mimic nexti behavior in
some cases by analyzing the instructions ahead and setting a breakpoint
where appropriate.
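
A rough sketch of that idea in Python, for illustration only.  The
function names are invented.  The input is assumed to have the shape of
one entry returned by gdb.Architecture.disassemble(), a dict with
'addr', 'length' and 'asm' keys, and the 5-byte length of the callq in
the example is an assumption, since the original post does not show the
address that follows it.

```python
def is_call(asm):
    """True if the mnemonic in the disassembly text is a near call."""
    words = asm.split()
    # Skip prefixes such as 'rex.W' printed before the mnemonic.
    while words and words[0].startswith("rex"):
        words.pop(0)
    return bool(words) and words[0].startswith("call")


def plan_nexti(insn):
    """Return the GDB commands that emulate nexti for one instruction.

    Inside GDB, `insn` could come from:
        frame = gdb.selected_frame()
        insn = frame.architecture().disassemble(frame.pc())[0]
    """
    if is_call(insn["asm"]):
        # Step over the call: stop at the instruction after it.
        return_addr = insn["addr"] + insn["length"]
        return [f"tbreak *{return_addr:#x}", "continue"]
    return ["stepi"]


# The first and last instructions from the original post.
push_plan = plan_nexti(
    {"addr": 0x00007FFFE192413E, "length": 5,
     "asm": "rex.W pushq 0x28(%rsp)"})
call_plan = plan_nexti(
    {"addr": 0x00007FFFE1924147, "length": 5,  # length assumed
     "asm": "callq  0x00007fffe1045de0"})
```

Each returned command would then be passed to gdb.execute().  This does
not handle recursion, far calls, or a callee that never returns to the
instruction after the call, so it only covers some cases.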

-Dmitry

Re: Is nexti confused by pushq?

David Griffiths
Thanks Dmitry, I will avoid nexti. It's pretty weird stepping through JITed
code anyway. Sometimes even a breakpoint/continue is not enough, because it
dives off into deopt functions and re-emerges in the interpreter!

Re: Is nexti confused by pushq?

Tom Tromey-2
In reply to this post by David Griffiths
>>>>> "David" == David Griffiths <[hidden email]> writes:

David> Ok, so in my case this is generated code with no debug info (Java JIT
David> generated) so does that mean I shouldn't attempt to use nexti? (I've got
David> other issues which probably preclude using nexti anyway but just curious)

There are a few options to deal with this sort of problem.

As Jan said, the JIT could generate debug info using one of the
gdb-provided JIT interfaces.  That's kind of heavyweight but gives a lot
of control.

Another option is to write an unwinder in Python.  The crucial thing
here is to ensure that the frame ID is constant for the duration of a
frame.  In DWARF this is done by using the CFA as part of the identity;
for the JIT you'd want to do something similar.  I thought there was
already such an unwinder for OpenJDK at least... ?
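
To illustrate the "constant frame ID" point, here is a small model in
Python that runs outside GDB.  It is a sketch under stated assumptions:
the blob start address, the initial $sp and the table of $sp-to-CFA
distances are all hypothetical, and a real unwinder would have to get
that information from the JIT's own metadata.  As far as I know, the
object handed to PendingFrame.create_unwind_info() in a GDB Python
unwinder only needs 'sp' and 'pc' attributes, which is why the class
has that shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameId:
    sp: int  # the CFA: the stack address that identifies the frame
    pc: int  # start address of the code blob, not the current pc


def stable_frame_id(sp, pc, blob_start, sp_to_cfa):
    """Build a frame ID that stays put while $sp moves within a frame.

    `sp_to_cfa` is a hypothetical table mapping each pc to the distance
    in bytes from the live $sp to the CFA at that pc.
    """
    return FrameId(sp=sp + sp_to_cfa[pc], pc=blob_start)


# Hypothetical values for the three instructions in the original post.
BLOB_START = 0x00007FFFE1924100
SP = 0x00007FFFFFFFE000
SP_TO_CFA = {
    0x00007FFFE192413E: 0x40,  # before the pushq
    0x00007FFFE1924143: 0x48,  # after the pushq: $sp dropped by 8
    0x00007FFFE1924147: 0x40,  # after the popq: $sp back up by 8
}

ids = [
    stable_frame_id(SP, 0x00007FFFE192413E, BLOB_START, SP_TO_CFA),
    stable_frame_id(SP - 8, 0x00007FFFE1924143, BLOB_START, SP_TO_CFA),
    stable_frame_id(SP, 0x00007FFFE1924147, BLOB_START, SP_TO_CFA),
]
```

All three IDs compare equal even though $sp differs at the middle
instruction, which is the property nexti relies on.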

Tom

Re: Is nexti confused by pushq?

Dmitry Samersoff
Tom,

> for the JIT you'd want to do something similar.  I thought there was
> already such an unwinder for OpenJDK at least... ?

Yes, the JDK has different kinds of unwinders, but unfortunately porting
them to Python is problematic.

It reminds me of the old discussion about a native plugin interface for gdb.

-Dmitry

Re: Is nexti confused by pushq?

Tom Tromey-2
>> for the JIT you'd want to do something similar.  I thought there was
>> already such an unwinder for OpenJDK at least... ?

Dmitry> Yes, the JDK has different kinds of unwinders, but unfortunately porting
Dmitry> them to Python is problematic.

Dmitry> It reminds me of the old discussion about a native plugin interface for gdb.

I was referring to this:

http://mail.openjdk.java.net/pipermail/jdk9-dev/2016-May/004379.html

Tom