GDB "asserts" after "next"ing through a remote stub

Shahab Vahedi-2
I am having difficulties in using GDB [1] that is connected to a  remote
stub [2].  An older version of GDB (v8.0)  runs  smoothly  against  this
stub. The problem manifests itself as:
,--------------------------------------------------------------------.
| gdb> target remote :1234    # connects OK and gets the PC address  |
| gdb> next                   # it will go berserk                   |
|                                                                    |
| /gdb/inferior.c:279: internal-error: inferior*                     |
| find_inferior_pid(process_stratum_target*, int): Assertion         |
| 'pid != 0' failed.                                                 |
| A problem internal to GDB has been detected,                       |
| further debugging may prove unreliable.                            |
`--------------------------------------------------------------------'
I have added the logging  for  both  working  [3]  and  non-working  [4]
versions at the end of this e-mail.

The problem in current GDB [1] seems to be that "do_target_wait()" clears
the global variable "inferior_ptid" and later reads one of its fields
(m_ptid) without anything having set it again:
,--------------------------------------------------------------------.
| infrun.c:do_target_wait()                                          |
| {                                                                  |
|    ...                                                             |
|    // the introduced lambda that wasn't in GDB 8.0                 |
|    auto do_wait = [&] (inferior *inf)                              |
|    {                                                               |
|      // this will set "inferior_ptid" to "null_ptid"               |
|      switch_to_inferior_no_thread (inf);                           |
|                                                                    |
|      // eventually, will cause the assert to be triggered          |
|      ... do_target_wait_1 (inf, wait_ptid, &ecs->ws, options);     |
|    };                                                              |
|    ...                                                             |
|    // the lambda in motion                                         |
|    int inf_num = selected->num;                                    |
|    for (inferior *inf = selected; inf != NULL; inf = inf->next)    |
|      if (inferior_matches (inf))                                   |
|        if (do_wait (inf))                   // BOOM!               |
|    ...                                                             |
| }                                                                  |
`--------------------------------------------------------------------'

From a call stack point of view, deep down in "do_target_wait_1()":
,--------------------------------------------------------------------.
| do_target_wait_1()                                                 |
|   ...                                                              |
|     remote_target::wait_as()     // process "S05" packet           |
|       process_stop_reply()       // sets "ptid" to "inferior_ptid" |
|         remote_notice_new_inferior()                               |
|           find_thread_ptid()     // ASSERTS                        |
`--------------------------------------------------------------------'
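
For context, here is a minimal sketch of why an "S05" reply leaves the
thread unspecified. In the remote serial protocol an 'S' stop reply
carries only a two-digit hex signal number, while a 'T' reply may add
"name:value;" pairs, one of which can be "thread". The parser below is a
toy written for illustration, not GDB's code; "StopReply" and
"parse_stop_reply" are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy type, not GDB's own stop reply structure.
struct StopReply {
    bool valid;          // false if the packet is not an S/T stop reply
    int signal;          // signal number from the two hex digits
    bool has_thread;     // true only if a "thread:" field was present
    std::string thread;  // raw thread-id text, e.g. "p1.1"
};

// Convert one hex digit to its value, or -1 if it is not a hex digit.
static int hex_val(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse "SAA" or "TAAn1:r1;n2:r2;..." where AA is a hex signal number.
StopReply parse_stop_reply(const std::string& pkt) {
    StopReply r{false, 0, false, ""};
    if (pkt.size() < 3 || (pkt[0] != 'S' && pkt[0] != 'T'))
        return r;
    int hi = hex_val(pkt[1]);
    int lo = hex_val(pkt[2]);
    if (hi < 0 || lo < 0)
        return r;
    r.valid = true;
    r.signal = hi * 16 + lo;
    if (pkt[0] == 'T') {
        // Walk the ';'-separated "name:value" pairs and pick out "thread".
        std::size_t pos = 3;
        while (pos < pkt.size()) {
            std::size_t end = pkt.find(';', pos);
            if (end == std::string::npos) end = pkt.size();
            std::string field = pkt.substr(pos, end - pos);
            std::size_t colon = field.find(':');
            if (colon != std::string::npos && field.substr(0, colon) == "thread") {
                r.has_thread = true;
                r.thread = field.substr(colon + 1);
            }
            pos = end + 1;
        }
    }
    return r;
}
```

With this toy parser, "S05" yields signal 5 and no thread, whereas a
reply such as "T05thread:p1.1;" would name the thread explicitly, so the
receiver would not need to fall back to any global state.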

To put it all together:

1. do_wait lambda invokes   switch_to_inferior_no_thread()  that    sets
   "inferior_ptid" to "null_ptid".
2. do_wait invokes do_target_wait_1()
3. process_stop_reply() inside do_target_wait_1()  will  set  "ptid"  to
   "inferior_ptid", which by now is "null_ptid".
4. find_thread_ptid() down the road looks into this "ptid" and asserts.
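
The four steps above can be sketched as a small stand-alone model. This
is only an illustration under stated assumptions: the types are toy
stand-ins rather than GDB's real definitions, the functions merely
borrow the names used above, the pid 42000 is taken from the working
log, and the failure is reported with an exception instead of GDB's
internal-error machinery.

```cpp
#include <stdexcept>

// Toy stand-ins; none of these are GDB's real definitions.
struct Ptid { int pid; long lwp; };
static const Ptid null_ptid{0, 0};

// Global "current thread". 42000 is the pid shown in the working log.
static Ptid inferior_ptid{42000, 0};

struct Inferior { int pid; };
static Inferior the_inferior{42000};

// Step 1: selecting an inferior without a thread clears the global.
void switch_to_inferior_no_thread() {
    inferior_ptid = null_ptid;
}

// Step 4: the lookup refuses pid 0 (modelled here as an exception).
const Inferior* find_inferior_pid(int pid) {
    if (pid == 0)
        throw std::logic_error("Assertion 'pid != 0' failed");
    return pid == the_inferior.pid ? &the_inferior : nullptr;
}

// Step 3: a stop reply that names no thread falls back to inferior_ptid.
// 'reported' is null when the packet (such as "S05") carries no thread id.
const Inferior* process_stop_reply(const Ptid* reported) {
    Ptid ptid = reported ? *reported : inferior_ptid;
    return find_inferior_pid(ptid.pid);
}

// Step 2: the wait path, as the do_wait lambda would drive it.
const Inferior* do_wait(const Ptid* reported) {
    switch_to_inferior_no_thread();
    return process_stop_reply(reported);
}
```

In this model, calling do_wait() with no reported thread throws, while
passing an explicit ptid succeeds, which mirrors why a reply that names
its thread would sidestep the fallback to the cleared global.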

I do not know what is causing this. Should the remote stub send some
packet that causes "inferior_ptid" to be set again? Or is this a bug in
GDB?

I would appreciate any thoughts on this.  For the record, this issue,
along with the steps to reproduce it, is also filed on SNPS's GitHub [5].


Cheers,
--
Shahab


[1]
The GDB I use is the one from master branch with its HEAD at c7adb09f35a
(Fix typo in gdb/testsuite/ChangeLog).

[2]
This stub is NOT gdbserver, but a proprietary simulator for ARC.

[3] working log ("set debug remote 1" and "set debug infrun 1")
infrun: clear_proceed_status_thread (Remote target)
infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Remote target] at 0x100
Sending packet: $vCont?#49...Packet received:
Packet vCont (verbose-resume) is NOT supported
Sending packet: $Hc0#db...Packet received: OK
Sending packet: $s#73...infrun: infrun_async(1)
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   -1.0.0 [Thread 0],
infrun:   status->kind = ignore
infrun: TARGET_WAITKIND_IGNORE
infrun: prepare_to_wait
Packet received: S05
infrun: target_wait (-1.0.0, status) =
infrun:   42000.0.0 [Remote target],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
Sending packet: $g#67...Packet received: 00000080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000f8ffff03000000000000000000000000000000000801000008010000000000000000000001000000500100000000400000000000000000000000000000000000xxxxxxxx0000000000000000020000000000000010000000020100000602020003000000030000000200000003030000055b88010274c4220120000e0000000000000000000000000000000000000000
infrun: stop_pc = 0x108
Sending packet: $m100,40#8e...Packet received: 0a20800f00800000001e417000f00800f907cfff0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
infrun: stepped to a different line
infrun: stop_waiting
Sending packet: $qL1200000000000000000#50...Packet received:
infrun: infrun_async(0)

[4] non-working log ("set debug remote 1" and "set debug infrun 1")
infrun: clear_proceed_status_thread (Remote target)
infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Remote target] at 0x100
Sending packet: $vCont?#49...Packet received:
Packet vCont (verbose-resume) is NOT supported
Sending packet: $Hc0#db...Packet received: OK
Sending packet: $s#73...infrun: infrun_async(1)
infrun: prepare_to_wait
Packet received: S05
infrun: infrun_async(0)
infrun: infrun_async(1)
This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.
infrun: infrun_async(0)
infrun: infrun_async(1)

[5]
https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/239

Re: GDB "asserts" after "next"ing through a remote stub

Simon Marchi-4
On 2020-03-04 4:15 p.m., Shahab Vahedi wrote:
> I am having difficulties in using GDB [1] that is connected to a  remote
> stub [2].  An older version of GDB (v8.0)  runs  smoothly  against  this
> stub. The problem manifests itself as:

Hi Shahab,

I don't have much time to look into this, but it sounds related to this patch
series here:

  https://sourceware.org/legacy-ml/gdb-patches/2020-03/msg00015.html

Simon

Re: GDB "asserts" after "next"ing through a remote stub

Shahab Vahedi-2
On 3/9/20 10:10 PM, Simon Marchi wrote:
> Hi Shahab,
>
> I don't have much time to look into this, but it sounds related to
> this patch series here:
>
>   https://sourceware.org/legacy-ml/gdb-patches/2020-03/msg00015.html 
>
> Simon
>

Hi Simon,

Indeed, that patch [1] resolves the issue. Thank you for bringing it
to my attention.

--
Shahab

[1] gdb/remote: Restore support for 'S' stop reply packet
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=24ed6739b69