Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
Hi,

I'm running into the issue below. Any suggestions on how to fix this?

# DISPLAY=:1 gdb /opt/firefox/bin/firefox
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.11".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/firefox/bin/firefox...
(gdb) run -P
Starting program: /opt/firefox/bin/firefox -P
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]
[New LWP    6        ]
[New LWP    7        ]
[New LWP    8        ]
[New LWP    9        ]
[New LWP    10        ]
[New LWP    11        ]
[New LWP    12        ]
[New LWP    13        ]
[New LWP    14        ]
[New LWP    15        ]
[New LWP    16        ]
[New LWP    17        ]
[New LWP    18        ]
[New LWP    19        ]
[New LWP    20        ]
[New LWP    21        ]
[New LWP    22        ]
[New LWP    23        ]
[New LWP    24        ]
[New LWP    25        ]
[New LWP    26        ]
[LWP    20         exited]
[New LWP    20        ]
[LWP    21         exited]
[New LWP    21        ]
procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
(gdb)

---

Is this a Solaris GDB issue? Any suggestions on where to look in the GDB code?

Thanks!

Petr

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Rainer Orth-2
Hi Petr,

> I'm running into the issue below. Any suggestion how to this?
>
> # DISPLAY=:1 gdb /opt/firefox/bin/firefox
> GNU gdb (GDB) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.11".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /opt/firefox/bin/firefox...
> (gdb) run -P
> Starting program: /opt/firefox/bin/firefox -P
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
[...]

> [New LWP    26        ]
> [LWP    20         exited]
> [New LWP    20        ]
> [LWP    21         exited]
> [New LWP    21        ]
> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
> (gdb)
>
> ---
>
> Is this Solaris GDB issue? Any suggestion where to look in GDB code?

I'm seeing this relatively often when running the gdb testsuite (which
makes it unsuitable to run make check on the Solaris gdb buildbots).

I haven't yet gotten around to investigating closely, but the first places
to check are procfs.c (the process layer, via /proc) and sol-thread.c
(the thread layer, via libc_db).

There's lots of old cruft in there from pre-Solaris 9 times with its NxM
thread model, which both breaks a considerable number of test cases and
makes the code harder to follow due to the added complexity/generality
we don't need any longer.

        Rainer

--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
In reply to this post by Sourceware - gdb list mailing list
The issue seems to be that the LWP exits and the status->kind is set to
TARGET_WAITKIND_SPURIOUS:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214

But it's immediately added to the thread list again here:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200

But there is no longer such an LWP in /proc.
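
To make that sequence concrete, here is a minimal C++ sketch of it. It is a toy model under my own assumptions, not GDB code: ToyDebugger, wait_sees_lwp_exit, handle_event and is_stale are hypothetical names that only mimic the two steps described above (the target layer drops the exited LWP and reports a spurious event, then the core re-adds a thread for the LWP named in that event).

```cpp
#include <cassert>
#include <set>

// Toy model only: every name here is hypothetical, not GDB's API.
enum class WaitKind { Spurious, Stopped };

struct Event {
    int lwp;
    WaitKind kind;
};

struct ToyDebugger {
    std::set<int> thread_list;  // threads the core believes exist
    std::set<int> proc_lwps;    // LWPs that /proc still has

    // Target layer: the LWP exited, so forget it everywhere, then
    // report a spurious event that still names the dead LWP.
    Event wait_sees_lwp_exit(int lwp) {
        assert(proc_lwps.count(lwp) == 1);
        proc_lwps.erase(lwp);
        thread_list.erase(lwp);
        return Event{lwp, WaitKind::Spurious};
    }

    // Core layer: an event for an unknown thread re-adds that thread.
    void handle_event(const Event &ev) {
        thread_list.insert(ev.lwp);  // no-op if already present
    }

    // True when the core tracks a thread that /proc no longer has.
    bool is_stale(int lwp) const {
        return thread_list.count(lwp) == 1 && proc_lwps.count(lwp) == 0;
    }
};
```

In this model, once the spurious event for LWP 21 has been passed to handle_event, is_stale(21) becomes true, which is the state in which a later lookup of that LWP would be expected to fail.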

Any suggestions?

Thanks,

Petr

On 28.05.2020 17:29, Petr Sumbera via Gdb wrote:

> Hi,
>
> I'm running into the issue below. Any suggestion how to this?
>
> # DISPLAY=:1 gdb /opt/firefox/bin/firefox
> GNU gdb (GDB) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.11".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>      <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /opt/firefox/bin/firefox...
> (gdb) run -P
> Starting program: /opt/firefox/bin/firefox -P
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [New LWP    3        ]
> [New LWP    4        ]
> [New LWP    5        ]
> [New LWP    6        ]
> [New LWP    7        ]
> [New LWP    8        ]
> [New LWP    9        ]
> [New LWP    10        ]
> [New LWP    11        ]
> [New LWP    12        ]
> [New LWP    13        ]
> [New LWP    14        ]
> [New LWP    15        ]
> [New LWP    16        ]
> [New LWP    17        ]
> [New LWP    18        ]
> [New LWP    19        ]
> [New LWP    20        ]
> [New LWP    21        ]
> [New LWP    22        ]
> [New LWP    23        ]
> [New LWP    24        ]
> [New LWP    25        ]
> [New LWP    26        ]
> [LWP    20         exited]
> [New LWP    20        ]
> [LWP    21         exited]
> [New LWP    21        ]
> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
> (gdb)
>
> ---
>
> Is this Solaris GDB issue? Any suggestion where to look in GDB code?
>
> Thanks!
>
> Petr

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
In reply to this post by Rainer Orth-2
Hi Rainer,

Please see my other mail. It seems to me that handling of LWP exit
might be broken on Solaris, though I'm really lost and don't
know how to continue now.

Thanks!

Petr

On 28.05.2020 18:01, Rainer Orth wrote:

> Hi Petr,
>
>> I'm running into the issue below. Any suggestion how to this?
>>
>> # DISPLAY=:1 gdb /opt/firefox/bin/firefox
>> GNU gdb (GDB) 9.2
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.
>> Type "show copying" and "show warranty" for details.
>> This GDB was configured as "sparc-sun-solaris2.11".
>> Type "show configuration" for configuration details.
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>.
>> Find the GDB manual and other documentation resources online at:
>>      <http://www.gnu.org/software/gdb/documentation/>.
>>
>> For help, type "help".
>> Type "apropos word" to search for commands related to "word"...
>> Reading symbols from /opt/firefox/bin/firefox...
>> (gdb) run -P
>> Starting program: /opt/firefox/bin/firefox -P
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP    2        ]
> [...]
>> [New LWP    26        ]
>> [LWP    20         exited]
>> [New LWP    20        ]
>> [LWP    21         exited]
>> [New LWP    21        ]
>> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
>> procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list.
>> (gdb)
>>
>> ---
>>
>> Is this Solaris GDB issue? Any suggestion where to look in GDB code?
>
> I'm seeing this relatively often when running the gdb testsuite (which
> makes it unsuitable to run make check on the Solaris gdb buildbots).
>
> I haven't yet gotten around to investigate closely, but the first places
> to check are procfs.c (the process layer, via /proc) and sol-thread.c
> (the thread layer, via libc_db).
>
> There's lots of old cruft in there from pre-Solaris 9 times with its NxM
> thread model, which both breaks a considerable number of test cases and
> makes the code harder to follow due to the added complexity/generality
> we don't need any longer.
>
> Rainer
>

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
In reply to this post by Sourceware - gdb list mailing list
On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:

> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>
> But instantly it's added into the list again here:
>
> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>
> But there is no longer such LWP in /proc.
>
> Any suggestion?

Either:

- replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,

- replace
    status->kind = TARGET_WAITKIND_SPURIOUS;
    return retval;
  with
    goto wait_again;
  instead.

Thanks,
Pedro Alves


Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 01.06.2020 21:12, Pedro Alves wrote:

> On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:
>> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>>
>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>>
>> But instantly it's added into the list again here:
>>
>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>>
>> But there is no longer such LWP in /proc.
>>
>> Any suggestion?

Thanks for looking at it!

> Either:
>
> - replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,

With this I'm getting:

[LWP    21         exited]
[LWP    21         exited]
/builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

> - replace
>      status->kind = TARGET_WAITKIND_SPURIOUS;
>      return retval;
>    with
>      goto wait_again;
>    instead.

and with this:

[LWP    20         exited]
[LWP    20         exited]
/builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

--

Note that in both cases there are TWO exits for one LWP. But LWP numbers
differ.

Any other comment?

Thanks!

Petr

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 6/2/20 8:32 AM, Petr Sumbera via Gdb wrote:

> On 01.06.2020 21:12, Pedro Alves wrote:
>> On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:
>>> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>>>
>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>>>
>>> But instantly it's added into the list again here:
>>>
>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>>>
>>> But there is no longer such LWP in /proc.
>>>
>>> Any suggestion?
>
> Thanks for looking at it!
>
>> Either:
>>
>> - replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,
>
> With this I'm getting:
>
> [LWP    21         exited]
> [LWP    21         exited]
> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
>
>> - replace
>>      status->kind = TARGET_WAITKIND_SPURIOUS;
>>      return retval;
>>    with
>>      goto wait_again;
>>    instead.
>
> and with this:
>
> [LWP    20         exited]
> [LWP    20         exited]
> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
>
> --
>
> Note that in both cases there are TWO exits for one LWP. But LWP numbers differ.

You mean, it was 21 in one run, and 20 in another run?
Those were two different runs, and some timing difference
probably tweaked the order of which thread exits first or
something.  Doesn't seem unusual.

Sounds like the patch below would fix it.  

But why do we get two exits in a row for each LWP?  Oh, I guess
once for PR_SYSENTRY of the exit syscall, and another time for
PR_SYSEXIT.
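
As an illustration of why a second exit report trips the assertion, here is a small self-contained C++ sketch. It assumes toy stand-ins (ToyThread, ThreadTable, find_thread, delete_thread_strict, delete_thread_if_known are all hypothetical names); GDB's real thread_info and lookup functions differ. The strict delete asserts on a null pointer, and the guarded variant simply ignores an LWP that is already gone.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins; GDB's real thread_info and lookup code differ.
struct ToyThread {
    std::string name;
};

using ThreadTable = std::map<int, ToyThread>;

// Returns a pointer to the entry, or nullptr if the LWP is unknown.
ToyThread *find_thread(ThreadTable &table, int lwp) {
    auto it = table.find(lwp);
    return it == table.end() ? nullptr : &it->second;
}

// Mirrors the shape of the failing assertion: a null pointer is a bug.
void delete_thread_strict(ThreadTable &table, int lwp, ToyThread *thr) {
    assert(thr != nullptr);
    table.erase(lwp);
}

// Guarded variant: a second exit report for the same LWP is ignored.
// Returns true if a thread was actually removed.
bool delete_thread_if_known(ThreadTable &table, int lwp) {
    ToyThread *thr = find_thread(table, lwp);
    if (thr == nullptr)
        return false;
    delete_thread_strict(table, lwp, thr);
    return true;
}
```

Calling delete_thread_if_known twice for the same LWP removes the entry the first time and returns false the second time, instead of reaching the assertion.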

From 0be6c82e754dd676e9f1259ab0f9a7849d985ffd Mon Sep 17 00:00:00 2001
From: Pedro Alves <[hidden email]>
Date: Tue, 2 Jun 2020 15:44:54 +0100
Subject: [PATCH] fix-solaris

---
 gdb/procfs.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gdb/procfs.c b/gdb/procfs.c
index f6c6b0e71c1..e2042f3edc4 100644
--- a/gdb/procfs.c
+++ b/gdb/procfs.c
@@ -2331,9 +2331,10 @@ procfs_target::wait (ptid_t ptid, struct target_waitstatus *status,
     if (print_thread_events)
       printf_unfiltered (_("[%s exited]\n"),
  target_pid_to_str (retval).c_str ());
-    delete_thread (find_thread_ptid (this, retval));
-    status->kind = TARGET_WAITKIND_SPURIOUS;
-    return retval;
+    thread_info *thr = find_thread_ptid (this, retval);
+    if (thr != nullptr)
+      delete_thread (thr);
+    goto wait_again;
   }
  else if (0)
   {

base-commit: f6eee2d098049afd18f90b8f4bb6a5d1a49d900c
--
2.14.5


Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 02.06.2020 16:53, Pedro Alves wrote:

> On 6/2/20 8:32 AM, Petr Sumbera via Gdb wrote:
>> On 01.06.2020 21:12, Pedro Alves wrote:
>>> On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:
>>>> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>>>>
>>>> But instantly it's added into the list again here:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>>>>
>>>> But there is no longer such LWP in /proc.
>>>>
>>>> Any suggestion?
>>
>> Thanks for looking at it!
>>
>>> Either:
>>>
>>> - replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,
>>
>> With this I'm getting:
>>
>> [LWP    21         exited]
>> [LWP    21         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>>> - replace
>>>       status->kind = TARGET_WAITKIND_SPURIOUS;
>>>       return retval;
>>>     with
>>>       goto wait_again;
>>>     instead.
>>
>> and with this:
>>
>> [LWP    20         exited]
>> [LWP    20         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>> --
>>
>> Note that in both cases there are TWO exits for one LWP. But LWP numbers differ.
>
> You mean, it was 21 in one run, and 20 in another run?
> Those were two different runs, and some timing difference
> probably tweaked the order of which thread exits first or
> something.  Doesn't seem unusual.
>
> Sounds like the patch below would fix it.

Unfortunately no.

> But why do we get two exits in a row for each LWP?  Oh, I guess
> once for PR_SYSENTRY of the exit syscall, and another time for
> PR_SYSEXIT.

Only the PR_SYSENTRY branch is hit in my test case (the first occurrence of
'exited]'; I changed the strings to tell the two occurrences apart).

>  From 0be6c82e754dd676e9f1259ab0f9a7849d985ffd Mon Sep 17 00:00:00 2001
> From: Pedro Alves <[hidden email]>
> Date: Tue, 2 Jun 2020 15:44:54 +0100
> Subject: [PATCH] fix-solaris
>
> ---
>   gdb/procfs.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gdb/procfs.c b/gdb/procfs.c
> index f6c6b0e71c1..e2042f3edc4 100644
> --- a/gdb/procfs.c
> +++ b/gdb/procfs.c
> @@ -2331,9 +2331,10 @@ procfs_target::wait (ptid_t ptid, struct target_waitstatus *status,
>      if (print_thread_events)
>        printf_unfiltered (_("[%s exited]\n"),
>   target_pid_to_str (retval).c_str ());
> -    delete_thread (find_thread_ptid (this, retval));
> -    status->kind = TARGET_WAITKIND_SPURIOUS;
> -    return retval;
> +    thread_info *thr = find_thread_ptid (this, retval);
> +    if (thr != nullptr)
> +      delete_thread (thr);
> +    goto wait_again;
>    }
>   else if (0)
>    {
>
> base-commit: f6eee2d098049afd18f90b8f4bb6a5d1a49d900c
>

I have adapted your change to gdb 9.2 and applied it to the correct
occurrence (you added it to the second occurrence of 'exited'):

--- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
+++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
@@ -2207,9 +2207,10 @@
                     if (print_thread_events)
                       printf_unfiltered (_("[%s exited]\n"),
                                          target_pid_to_str (retval).c_str ());
-                   delete_thread (find_thread_ptid (retval));
-                   status->kind = TARGET_WAITKIND_SPURIOUS;
-                   return retval;
+                   thread_info *thr = find_thread_ptid (retval);
+                   if (thr)
+                     delete_thread (thr);
+                   goto wait_again;
                   }
                 else if (syscall_is_exit (pi, what))
                   {

But this time the exited message repeats forever:

[LWP    24         exited]
[LWP    24         exited]
[LWP    24         exited]
..

---

Petr

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 6/2/20 5:30 PM, Petr Sumbera wrote:

> I have modified your change to gdb 9.2 and to correct occurrence (you have added it to second occurrence of 'exited'):
>
> --- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
> +++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
> @@ -2207,9 +2207,10 @@
>                     if (print_thread_events)
>                       printf_unfiltered (_("[%s exited]\n"),
>                                          target_pid_to_str (retval).c_str ());
> -                   delete_thread (find_thread_ptid (retval));
> -                   status->kind = TARGET_WAITKIND_SPURIOUS;
> -                   return retval;
> +                   thread_info *thr = find_thread_ptid (retval);
> +                   if (thr)
> +                     delete_thread (thr);
> +                   goto wait_again;
>                   }
>                 else if (syscall_is_exit (pi, what))
>                   {
>
> But this time exited message repeats forever:
>
> [LWP    24         exited]
> [LWP    24         exited]
> [LWP    24         exited]

Sounds like the LWP is stuck with the status, or the status is
cached.  We probably need to resume the process to move it out
of the syscall, I guess.  There's this bit in the file, at
another spot where we call goto wait_again:

        /* How to keep going without returning to wfi: */
        target_continue_no_signal (ptid);
        goto wait_again;

wfi == wait_for_inferior, the name of a function that used
to be pretty core in infrun.c.  Nowadays handle_inferior_event
has taken over that role.

Try doing the same.  Like:

        delete_thread (find_thread_ptid (this, retval));
        target_continue_no_signal (ptid);
        goto wait_again;

You may need to split the delete_thread/find_thread bits, or
you may not.  I'm not sure.

The TARGET_WAITKIND_SPURIOUS handling in infrun.c also
just calls resume(GDB_SIGNAL_0), so I _think_ this will work as
well as before.  I have no idea how this was supposed to handle
the case of an LWP exiting while another one is single
stepping.  Looks like we lose the original single-stepping
request.  Maybe.  Not sure.  But doesn't look like we're
making things any worse.
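
To illustrate the idea, here is a rough C++ sketch of a wait loop that swallows an event internally. It is a toy model under stated assumptions, not procfs.c: ToyProcess, wait_loop, LWP_EXIT, BREAKPOINT, resume_first and max_iters are all hypothetical. The model assumes that waiting does not consume the current stop, so looping back without resuming keeps seeing the same event.

```cpp
#include <cassert>
#include <deque>

// Toy model of a stopped process; names are hypothetical, not GDB's.
struct ToyProcess {
    std::deque<int> pending;  // queued events, front = current stop

    // Waiting does not consume the stop: until the process is
    // resumed, every wait reports the same event.
    int wait() const {
        assert(!pending.empty());
        return pending.front();
    }

    // Resuming lets the process move past the current stop.
    void resume() {
        assert(!pending.empty());
        pending.pop_front();
    }
};

const int LWP_EXIT = 1;    // event to swallow inside the wait loop
const int BREAKPOINT = 2;  // event to hand back to the caller

// Loops until a reportable event arrives. `resume_first` selects
// whether the loop resumes before waiting again; `max_iters` bounds
// the loop so the stuck case terminates. Returns -1 if it gave up.
int wait_loop(ToyProcess &proc, bool resume_first, int max_iters) {
    for (int i = 0; i < max_iters; ++i) {
        int ev = proc.wait();
        if (ev != LWP_EXIT)
            return ev;
        if (resume_first)
            proc.resume();
        // Otherwise: the equivalent of a bare "goto wait_again".
    }
    return -1;
}
```

With resume_first set to false the loop sees LWP_EXIT on every iteration and gives up; with it set to true the loop moves past the exit and returns the next event.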

Thanks,
Pedro Alves


Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 02.06.2020 19:14, Pedro Alves wrote:

> On 6/2/20 5:30 PM, Petr Sumbera wrote:
>
>> I have modified your change to gdb 9.2 and to correct occurrence (you have added it to second occurrence of 'exited'):
>>
>> --- ../../gdb-9.2/gdb/procfs.c.orig     2020-06-02 17:10:32.057735432 +0000
>> +++ ../../gdb-9.2/gdb/procfs.c  2020-06-02 18:02:45.496117117 +0000
>> @@ -2207,9 +2207,10 @@
>>                      if (print_thread_events)
>>                        printf_unfiltered (_("[%s exited]\n"),
>>                                           target_pid_to_str (retval).c_str ());
>> -                   delete_thread (find_thread_ptid (retval));
>> -                   status->kind = TARGET_WAITKIND_SPURIOUS;
>> -                   return retval;
>> +                   thread_info *thr = find_thread_ptid (retval);
>> +                   if (thr)
>> +                     delete_thread (thr);
>> +                   goto wait_again;
>>                    }
>>                  else if (syscall_is_exit (pi, what))
>>                    {
>>
>> But this time exited message repeats forever:
>>
>> [LWP    24         exited]
>> [LWP    24         exited]
>> [LWP    24         exited]
>
> Sounds like the LWP is stuck with the status, or the status is
> cached.  We probably need to resume the process to move it out
> of the syscall, I guess.  There's this bit in the file, at
> another spot we call goto wait_again:
>
> /* How to keep going without returning to wfi: */
> target_continue_no_signal (ptid);
> goto wait_again;
>
> wfi == wait_for_inferior, the name of a function that used
> to be pretty core in infrun.c.  Nowadays handle_inferior_event
> took the role.
>
> Try doing the same.  Like:
>
> delete_thread (find_thread_ptid (this, retval));
> target_continue_no_signal (ptid);
> goto wait_again;
>
> You may need to split the delete_thread/find_thread bits, or
> you may not.  I'm not sure.
>
> The TARGET_WAITKIND_SPURIOUS handling in infrun.c also
> just calls resume(GDB_SIGNAL_0), so I _think_ this will work as
> well as before.  I have no idea how this was supposed to handle
> the case of an LWP exiting while another one is single
> stepping.  Looks like we lose the original single-stepping
> request.  Maybe.  Not sure.  But doesn't look like we're
> making things any worse.

This time it looks very promising. This is the gdb 9.2 patch:

--- gdb-9.2/gdb/procfs.c
+++ gdb-9.2/gdb/procfs.c
@@ -2208,8 +2208,8 @@
                       printf_unfiltered (_("[%s exited]\n"),
                                          target_pid_to_str (retval).c_str ());
                     delete_thread (find_thread_ptid (retval));
-                   status->kind = TARGET_WAITKIND_SPURIOUS;
-                   return retval;
+                   target_continue_no_signal (ptid);
+                   goto wait_again;
                   }
                 else if (syscall_is_exit (pi, what))
                   {


This works for a few test cases. I have also started the gdb testsuite to
see whether it causes any regressions (though it may take some time to run).

But in one particular case it returns the following:

..
[LWP    33         exited1]
[LWP    31         exited1]
[LWP    32         exited1]
[LWP    28         exited1]
[LWP    30         exited1]
[LWP    2         exited1]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
(gdb)

It might be related...

Thank you very much!

Petr

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 03.06.2020 15:09, Petr Sumbera via Gdb wrote:
> This works for few test cases. And I actually started gdb tests to see
> if it makes any regression (but it might take some time to run it though).

GDB tests on Solaris don't seem to be deterministic, so I cannot confirm
with 100% certainty that the patch doesn't cause any regressions, though
it seems that it doesn't. See the attached diff output between runs
without and with the patch.

Can we get the patch upstream now?

> But in one particular case it returns following:
>
> ..
> [LWP    33         exited1]
> [LWP    31         exited1]
> [LWP    32         exited1]
> [LWP    28         exited1]
> [LWP    30         exited1]
> [LWP    2         exited1]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to
> satisfy query
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to
> satisfy query
> (gdb)
>
> It might be related...
Not sure about this one. I see this when I start Firefox with -P, select
a profile, and Firefox is about to start with the selected profile.
All threads have exited. The message is for the only remaining thread, #1
(after it got TD_NOTHR from p_td_ta_map_id2thr()), which GDB for some
reason thinks is 'defunct'.

When I run it without GDB the same process will start many other threads
and Firefox works. In GDB the above message is shown...

Thanks,

Petr

Attachment: test-64-diffs (28K)

Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list

Sourceware - gdb list mailing list
On 6/8/20 10:51 AM, Petr Sumbera via Gdb wrote:
> On 03.06.2020 15:09, Petr Sumbera via Gdb wrote:
>> This works for few test cases. And I actually started gdb tests to see if it makes any regression (but it might take some time to run it though).
>
> GDB tests on Solaris doesn't seem to deterministic. So I cannot confirm for 100% that the patch doesn't cause any regression. Though it rather seems it's not. See attached diff output between runs without and with the patch.
>
> Can we get the patch to upstream now?

Can you please send the version that you tested?  You mentioned
before that I had applied the fix to the wrong place.

Pedro Alves