[Bug gdb/22992] New: GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

            Bug ID: 22992
           Summary: GDB and Microsoft Windows thread pool
           Product: gdb
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: ruslanngaripov at gmail dot com
  Target Milestone: ---
              Host: Microsoft Windows 10 x86-64
            Target: Microsoft Windows 10 x86-64
             Build: x86_64-w64-mingw32

Created attachment 10906
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10906&action=edit
Sample code

I encountered an internal GDB error while debugging C++ code that uses the
Microsoft Windows thread pool API[1]. When several work items/threads hit the
same breakpoint at the same time, GDB failed with the following error:

> ../../../../src/gdb-8.0.1/gdb/infrun.c:5575: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.

Details follow.

Compiler: mingw-w64 GCC 7.1 to 7.3; GDB: 8.0.1 and 8.1; host OS: Microsoft
Windows 10 x64.

The sample code is in the attachment. It initializes a thread pool object and
submits three background work items that do almost nothing. The problem
appears if one sets a breakpoint (with `break`, `dprintf`, etc.) inside the
work callback function. When several threads hit that breakpoint at the same
time, the internal error is raised. The work items are submitted in a `for`
loop because the error usually appears on the second submission and rarely on
the first (see the output log below).

Command line to build the sample:

```
g++ -x c++ -std=gnu++1z -m64 -gdwarf -g3 -D_WIN32_WINNT=_WIN32_WINNT_WIN10 -DWINVER=_WIN32_WINNT_WIN10 -DDEBUG sample.cxx
```

Below is a dump of a GDB session:

```
D:\p>gdb -se a.exe
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-w64-mingw32".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.exe...done.
(gdb) dprintf sample.cxx:22, "[a] In a background thread.\n"
Dprintf 1 at 0x401574: file sample.cxx, line 22.
(gdb) r
Starting program: D:\p\a.exe
[New Thread 11420.0x308c]
[New Thread 11420.0x21f0]
[New Thread 11420.0x2c18]
[New Thread 11420.0x1a48]
[New Thread 11420.0x2c80]
[New Thread 11420.0x1b04]
[a] In a background thread.
[a] In a background thread.
[New Thread 11420.0x2404]
[a] In a background thread.
[a] In a background thread.
../../../../src/gdb-8.0.1/gdb/infrun.c:5575: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
A problem internal to GDB has been detected, further debugging may prove
unreliable.
Quit this debugging session? (y or n) y

This is a bug, please report it.  For instructions, see:
<http://www.gnu.org/software/gdb/bugs/>.

../../../../src/gdb-8.0.1/gdb/infrun.c:5575: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
A problem internal to GDB has been detected, further debugging may prove
unreliable.
Create a core file of GDB? (y or n) n
```

In the session above, the error occurred on the second work submission (the
first iteration of the loop was OK).

I tried different compiler versions (7.1, 7.2 and 7.3) with different runtime
thread models (posix and win32), and different versions of GDB (8.0.1, 8.1, and
the 7.9 shipped with the Intel C++ compiler (gdb-ia)), and I always got the
same result: an internal error in GDB.

And now I cannot debug my program without ugly workarounds...

This issue was confirmed by Liu Hao on Microsoft Windows 7 x64[2].

[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx "Thread Pool API"
[2]: https://sourceforge.net/p/mingw-w64/mailman/message/36269973/ "Re: [Mingw-w64-public] GDB and Microsoft Windows thread pool"

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

Hannes Domani <ssbssa at yahoo dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ssbssa at yahoo dot de

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-09-24
                 CC|                            |tromey at sourceware dot org
     Ever confirmed|0                           |1

--- Comment #1 from Tom Tromey <tromey at sourceware dot org> ---
I was able to reproduce this intermittently -- sometimes it works,
sometimes it crashes.  With "set debug infrun 1" I see this at the
tail of the log:

```
[a] In a background thread.
infrun: BPSTAT_WHAT_SINGLE
infrun: thread [Thread 5480.0x2e00] still needs step-over
infrun: skipping breakpoint: stepping past insn at: 0x4015d6
infrun: skipping breakpoint: stepping past insn at: 0x4015d6
infrun: skipping breakpoint: stepping past insn at: 0x4015d6
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 5480.0x2e00] at 0x4015d6
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun:   5480.0.12172 [Thread 5480.0x2f8c],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: handle_inferior_event status->kind = stopped, signal = GDB_SIGNAL_TRAP
```

What this says to me is that gdb sent a single-step, but then
some other thread stopped instead.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #2 from Tom Tromey <tromey at sourceware dot org> ---
Enabling some windows-nat logging additionally shows:

```
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 10932.0x254c] at 0x4015d6
ContinueDebugEvent (cpid=10932, ctid=0x254c, DBG_CONTINUE);
infrun: prepare_to_wait
gdb: kernel event for pid=10932 tid=0x16dc code=EXCEPTION_DEBUG_EVENT)
gdb: Target exception EXCEPTION_BREAKPOINT at 0x4015d6
infrun: target_wait (-1.0.0, status) =
infrun:   10932.0.5852 [Thread 10932.0x16dc],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
```

So it looks like, indeed, a breakpoint is hit in another thread.

From what I understand, this is a bug in windows-nat.c, because
a stepping resume should keep other threads suspended.
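To make that expectation concrete, here is a toy model in plain C++ (no Windows APIs) of a native target that keeps every other thread suspended while one thread single-steps. `ThreadState`, `resume_for_step`, `finish_step` and `stop_is_expected` are hypothetical names, not windows-nat.c functions; the counter increments only stand in for SuspendThread/ResumeThread.

```cpp
#include <map>

// Hypothetical per-thread state; not windows-nat.c's real types.
struct ThreadState {
  int suspend_count = 0;     // > 0 means the thread cannot run
  bool single_step = false;  // would map to the x86 trace flag
};

using ThreadMap = std::map<unsigned long, ThreadState>;

// Resume for a single-step: only `stepping_tid` may run, and every
// other thread gets one extra suspension (standing in for SuspendThread).
void resume_for_step(ThreadMap &threads, unsigned long stepping_tid) {
  for (auto &[tid, t] : threads) {
    if (tid == stepping_tid)
      t.single_step = true;
    else
      ++t.suspend_count;
  }
}

// Undo the extra suspensions once the step has been reported
// (standing in for ResumeThread).
void finish_step(ThreadMap &threads, unsigned long stepping_tid) {
  for (auto &[tid, t] : threads) {
    if (tid == stepping_tid)
      t.single_step = false;
    else if (t.suspend_count > 0)
      --t.suspend_count;
  }
}

// A stop reported by a thread whose suspend count is non-zero is
// unexpected: that thread should not have been able to run.
bool stop_is_expected(const ThreadMap &threads, unsigned long tid) {
  auto it = threads.find(tid);
  return it != threads.end() && it->second.suspend_count == 0;
}
```

In the log above, the stop came from 0x16dc while 0x254c was stepping, which this model flags as unexpected. Comments #3 and #4 below report that adding real SuspendThread calls was not enough to prevent such stops.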

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

Joel Brobecker <brobecker at gnat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |brobecker at gnat dot com

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #3 from Tom Tromey <tromey at sourceware dot org> ---
I hacked in some code to call SuspendThread on any other threads
in the stepping case -- but this still did not work.

Maybe the problem, though, is that SuspendThread is not synchronous:

https://devblogs.microsoft.com/oldnewthing/?p=44743

... so next is to try that.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #4 from Tom Tromey <tromey at sourceware dot org> ---
Adding a call to GetThreadContext did not work either.
I wonder if the thread pool implementation could be
calling ResumeThread.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #5 from Tom Tromey <tromey at sourceware dot org> ---
One other possible theory is that there are two simultaneous
stops, and they get queued; so when we resume the thread,
gdb immediately gets the next one.

I have no idea if that's actually possible.
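The "two simultaneous stops get queued" theory can be illustrated with a toy event port. This is a model of the theory only, not of documented Windows behavior; `DebugEvent`, `ToyDebugPort` and its methods are hypothetical, loosely standing in for WaitForDebugEvent and ContinueDebugEvent.

```cpp
#include <deque>
#include <optional>

// A toy model of the theory above, not of real Windows internals.
struct DebugEvent {
  unsigned long tid;
  unsigned long address;  // where the exception was raised
};

class ToyDebugPort {
 public:
  // A thread executed int3: the event is buffered even if the
  // debugger has not asked for it yet.
  void raise(const DebugEvent &ev) { pending_.push_back(ev); }

  // Stands in for WaitForDebugEvent: the oldest buffered event wins.
  std::optional<DebugEvent> wait() {
    if (pending_.empty()) return std::nullopt;
    DebugEvent ev = pending_.front();
    pending_.pop_front();
    return ev;
  }

  // Stands in for ContinueDebugEvent on a thread told to single-step:
  // its step-complete trap queues up behind anything already buffered,
  // and suspending other threads at this point cannot retract that.
  void continue_step(unsigned long tid, unsigned long next_pc) {
    pending_.push_back({tid, next_pc});
  }

 private:
  std::deque<DebugEvent> pending_;
};
```

If both threads hit 0x4015d6 before gdb sees either event, the event gdb receives right after stepping the first thread is the second thread's breakpoint, exactly the pattern in the logs of comments #1 and #2.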

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #6 from Tom Tromey <tromey at sourceware dot org> ---
gdbserver does already properly suspend non-stepping threads,
so this part seems like a missing feature of windows-nat.c.

However, I can reproduce the bug using gdbserver as well.

I am not sure what to do about this bug.  On Windows 10
we could try passing DBG_REPLY_LATER to ContinueDebugEvent.
However, I seem to be running on Windows 6.3.

Maybe one approach could be to always use non-stop on Windows.
That seems pretty intrusive though.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #7 from Tom Tromey <tromey at sourceware dot org> ---
Last night I had the idea that perhaps windows-nat could
record when gdb requested a step; then simply queue stops
from other (suspended) threads until the step is complete.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #8 from Tom Tromey <tromey at sourceware dot org> ---
Actually now I think that can't work, because gdb can't re-continue
the original event; nor can it continue the new event (the one it
wants to queue).

Perhaps all that's left is trying to teach infrun about this case.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #9 from Tom Tromey <tromey at sourceware dot org> ---
(In reply to Tom Tromey from comment #8)

> Perhaps all that's left is trying to teach infrun about this case.

It turns out that this is relatively simple if the Ravenscar series
is checked in.  Then infrun can be told that Windows is a "random
thread stop" target.  I have a simple patch for this that avoids the
crash; though I haven't fixed this in gdbserver.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #10 from Tom Tromey <tromey at sourceware dot org> ---
I resurrected the idea from comment #7.  I realized it would
probably work fine, because we can always continue whatever
event we just processed; and then the queue of deferred stops
could be handled in windows-nat.c via synthetic stops and resumes.

This at least avoids the crash, but it also causes a spurious stop:

```
Thread 2 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 11276.0x3244]
0x004015dd in (anonymous namespace)::Callback (context=0xe8) at sample.cxx:22
22        Sleep(100);  /* Do a "work". */
```
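The deferral idea from comment #10 (record the step request, set aside stops from other threads, and replay them later as synthetic stops) can be sketched as below. `StopEvent`, `DeferringTarget` and the `fetch` callback are hypothetical; `fetch` stands in for WaitForDebugEvent, and the sketch assumes, following comment #10, that each fetched event is continued immediately, which does not let a still-suspended thread run.

```cpp
#include <deque>
#include <functional>
#include <optional>
#include <utility>

struct StopEvent {
  unsigned long tid;
  unsigned long address;
};

// Hypothetical sketch of the deferral idea; not windows-nat.c code.
class DeferringTarget {
 public:
  explicit DeferringTarget(std::function<StopEvent()> fetch)
      : fetch_(std::move(fetch)) {}

  // Pass a thread id when gdb asks for a single-step of that thread.
  void resume(std::optional<unsigned long> stepping_tid) {
    stepping_tid_ = stepping_tid;
  }

  StopEvent wait() {
    // On a plain resume, hand back a deferred stop first, as a
    // synthetic event, instead of letting the inferior run.
    if (!stepping_tid_ && !deferred_.empty()) {
      StopEvent ev = deferred_.front();
      deferred_.pop_front();
      return ev;
    }
    for (;;) {
      StopEvent ev = fetch_();
      if (stepping_tid_ && ev.tid != *stepping_tid_) {
        deferred_.push_back(ev);  // another thread's stop: report it later
        continue;
      }
      stepping_tid_.reset();
      return ev;
    }
  }

 private:
  std::function<StopEvent()> fetch_;
  std::optional<unsigned long> stepping_tid_;
  std::deque<StopEvent> deferred_;
};
```

The sketch only reorders events; it does not rewind the PC of a deferred breakpoint stop, which is one way to end up with a spurious SIGTRAP like the one shown above.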

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #11 from Tom Tromey <tromey at sourceware dot org> ---
One thing I notice is that the spurious stop is always a "breakpoint"
stop in some other thread, one that is ostensibly suspended:

```
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 5924.0x2360] at 0x4015d6
gdb: windows_resume (pid=5924, tid=0x2360, step=1, sig=0);
ContinueDebugEvent (cpid=5924, ctid=0x2360, DBG_CONTINUE);
infrun: prepare_to_wait
gdb: kernel event for pid=5924 tid=0x33a8 code=EXCEPTION_DEBUG_EVENT)
gdb: Target exception EXCEPTION_BREAKPOINT at 0x4015d6
get_windows_debug_event - unexpected stop in 0x33a8 (expecting 0x2360)
```

I am thinking it might be ok to simply ignore such events.

I'm a little troubled by all this because I don't truly understand
how this stop could happen, even in theory.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

Pedro Alves <palves at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at redhat dot com

--- Comment #12 from Pedro Alves <palves at redhat dot com> ---
(In reply to Tom Tromey from comment #5)
> One other possible theory is that there are two simultaneous
> stops, and they get queued; so when we resume the thread,
> gdb immediately gets the next one.
>
> I have no idea if that's actually possible.

To me, that sounds highly likely to be what is happening here.  It's puzzling
that Windows would report an event from a suspended thread, though.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #13 from Pedro Alves <palves at redhat dot com> ---
(In reply to Tom Tromey from comment #11)

> One thing I notice is that the spurious stop is always a "breakpoint"
> stop in some other thread, one that is ostensibly suspended:
>
> infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current
> thread [Thread 5924.0x2360] at 0x4015d6
> gdb: windows_resume (pid=5924, tid=0x2360, step=1, sig=0);
> ContinueDebugEvent (cpid=5924, ctid=0x2360, DBG_CONTINUE);
> infrun: prepare_to_wait
> gdb: kernel event for pid=5924 tid=0x33a8 code=EXCEPTION_DEBUG_EVENT)
> gdb: Target exception EXCEPTION_BREAKPOINT at 0x4015d6
> get_windows_debug_event - unexpected stop in 0x33a8 (expecting 0x2360)

And supposedly, the breakpoint is NOT installed in the target at this point,
right?  Or is it some other breakpoint?  Seems like it's the same from your
logs.  If it is indeed the breakpoint that was removed, then that again
suggests this is Windows returning the queued event.

SuspendThread returns the thread's previous suspend count.  You could use that
to check whether the program is unsuspending threads on its own.

>
>
> I am thinking it might be ok to simply ignore such events.

Does Windows decrement the PC after a break automatically?  I forget.

>
> I'm a little troubled by all this because I don't truly understand
> how this stop could happen, even in theory.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #14 from Tom Tromey <tromey at sourceware dot org> ---
(In reply to Pedro Alves from comment #13)

> And supposedly, the breakpoint is NOT installed in the target at this point,
> right?  Or is it some other breakpoint?  Seems like it's the same from your
> logs.  If it is indeed the breakpoint that was removed, then that again
> suggests this is Windows returning the queued event.

As I understand it, gdb itself uninstalls the breakpoints before
single-stepping.
I did not actually verify this.

The only user breakpoint in my setup is the "dprintf" one (see the original
repro instructions in this bug).

> SuspendThread returns the thread's previous suspend count.  You could use
> that to check whether the program is unsuspending threads on its own.

Yes, I did this at some point.  In fact what I did is hack gdb to suspend
each thread 5 times, then also print out the suspension count after getting
a debug event.  All the relevant threads always reported their suspension
count as 5 -- which to my mind means it is unlikely that the inferior is doing
anything tricky.

> > I am thinking it might be ok to simply ignore such events.
>
> Does Windows decrement the PC after a break automatically?  I forget.

There's no mention of this in windows-nat.c.  So I guess it defers to the arch?


I changed my gdb to (1) suspend threads when stepping (still seems like a big
oversight to me ... and despite what I said earlier, I'm not 100% sure that
gdbserver does this either), and (2) ignore spurious breakpoint events.

Now I get even weirder behavior:

```
Thread 4 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 7664.0x32c8]
0x004015dc in (anonymous namespace)::Callback (context=0xe0) at sample.cxx:22
22        Sleep(100);  /* Do a "work". */
```

Look at that PC, then:

```
(gdb) disassemble
Dump of assembler code for function (anonymous namespace)::Callback(TP_CALLBACK_INSTANCE*, void*, TP_WORK*):
   0x004015d0 <+0>:     push   %ebp
   0x004015d1 <+1>:     mov    %esp,%ebp
   0x004015d3 <+3>:     sub    $0x18,%esp
   0x004015d6 <+6>:     movl   $0x64,(%esp)
   0x004015dd <+13>:    mov    0x4091fc,%eax
```

... notice that it is in the middle of an instruction.


See comment #10 for the results of the queueing experiment.  That stop is
suggestive because it is one instruction past the location of the dprintf
breakpoint.

```
1       dprintf        keep y   0x004015d6 in (anonymous namespace)::Callback(TP_CALLBACK_INSTANCE*, void*, TP_WORK*) at sample.cxx:22
```

This seems to happen sometimes after one of these stops has been ignored, so I
somewhat suspect my patch, though I don't see anything obviously wrong in it.


Maybe the queueing approach is better and we should just live with the spurious
stop.  If the theory is that this is caused by internal event buffering in
Windows, then I would definitely argue for this direction.

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #15 from Pedro Alves <palves at redhat dot com> ---
(In reply to Tom Tromey from comment #14)

> (In reply to Pedro Alves from comment #13)
>
> > And supposedly, the breakpoint is NOT installed in the target at this point,
> > right?  Or is it some other breakpoint?  Seems like it's the same from your
> > logs.  If it is indeed the breakpoint that was removed, then that again
> > suggests this is Windows returning the queued event.
>
> As I understand it, gdb itself uninstalls the breakpoints before
> single-stepping.
> I did not actually verify this.

Not all breakpoints, just the breakpoint being stepped over.  Hence my
question: is this always an event for the breakpoint that is being stepped
over, which would indicate that it must be a queued event, or could it be an
event for some other breakpoint, which would indicate that the other threads
actually ran even though they were supposedly suspended?  This makes me
realize that a better test would be to look at the PCs of all the threads and
make sure none of them changed, except the thread that is being stepped.  If
none moved, then this really must be just a queued event.
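The proposed check boils down to comparing two PC snapshots, one taken before the single-step and one after the unexpected stop. How the snapshots are collected (for example, GetThreadContext on each thread) is left out as an assumption, and `PcSnapshot` and `moved_threads` are hypothetical names.

```cpp
#include <map>
#include <vector>

using PcSnapshot = std::map<unsigned long, unsigned long>;  // tid -> PC

// Returns the threads, other than `stepped_tid`, whose PC changed
// between the two snapshots.  An empty result means no supposedly
// suspended thread ran, so a stray breakpoint report from one of them
// can only be an event that was queued before the step began.
std::vector<unsigned long> moved_threads(const PcSnapshot &before,
                                         const PcSnapshot &after,
                                         unsigned long stepped_tid) {
  std::vector<unsigned long> moved;
  for (const auto &[tid, pc] : before) {
    if (tid == stepped_tid) continue;
    auto it = after.find(tid);
    if (it != after.end() && it->second != pc) moved.push_back(tid);
  }
  return moved;
}
```

A thread whose breakpoint event is merely queued already executed its int3 before the snapshot was taken, so its PC stays the same across the step.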

>
> The only user breakpoint in my setup is the "dprintf" one (see the original
> repro instructions in this bug).
>
> > SuspendThread returns the thread's previous suspend count.  You could use
> > that to check whether the program is unsuspending threads on its own.
>
> Yes, I did this at some point.  In fact what I did is hack gdb to suspend
> each thread 5 times, then also print out the suspension count after getting
> a debug event.  All the relevant threads always reported their suspension
> count as 5 -- which to my mind means it is unlikely that the inferior is
> doing anything tricky.
>
> > > I am thinking it might be ok to simply ignore such events.
> >
> > Does Windows decrement the PC after a break automatically?  I forget.
>
> There's no mention of this in windows-nat.c.  So I guess it defers to the
> arch?

By "Windows", I mean the system/kernel, before reporting the event.  On x86,
when a breakpoint triggers, the PC points just past the breakpoint
instruction, int3 / 0xcc, which is one byte long, since a breakpoint trigger
is just the int3 instruction executing.  So when a breakpoint instruction
runs / triggers, something must rewind the PC back one byte, so that it points
at the start of the original instruction again.  Traditionally, this is done
by gdb core, in infrun.c:adjust_pc_after_break.  If the target / system /
kernel itself does the PC rewinding before reporting the event, then we
should skip adjust_pc_after_break and make the target return true for
target_supports_stopped_by_sw_breakpoint along with implementing
target_stopped_by_sw_breakpoint.  linux-nat.c and gdbserver/linux-low.c do
that nowadays.
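The division of labour described above can be sketched as a single decision. This is a hypothetical, heavily simplified rendering, not the real infrun.c:adjust_pc_after_break (which handles many more cases); `StopReport`, `pc_after_break` and the `sw_bp_inserted_at` callback (standing in for software_breakpoint_inserted_here_p) are illustrative names.

```cpp
#include <functional>

constexpr unsigned long kInt3Length = 1;  // int3 is the single byte 0xcc

struct StopReport {
  unsigned long pc;        // PC as reported by the target
  bool target_rewinds_pc;  // i.e. target_supports_stopped_by_sw_breakpoint()
};

unsigned long pc_after_break(
    const StopReport &r,
    const std::function<bool(unsigned long)> &sw_bp_inserted_at) {
  if (r.target_rewinds_pc)
    return r.pc;  // the target already backed the PC up before reporting
  const unsigned long bp_addr = r.pc - kInt3Length;
  if (sw_bp_inserted_at(bp_addr))
    return bp_addr;  // core gdb rewinds over the int3 it planted
  return r.pc;       // not gdb's breakpoint: leave the PC alone
}
```

With the dprintf at 0x4015d6, a raw PC of 0x4015d7 is rewound to 0x4015d6, while a target that already rewound reports 0x4015d6 directly and core gdb leaves it unchanged.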

>
>
> I changed my gdb to (1) suspend threads when stepping (still seems like a big
> oversight to me ... and despite what I said earlier, I'm not 100% sure that
> gdbserver does this either), and (2) ignore spurious breakpoint events.

Indeed seems like an oversight if true.  Tests like gdb.threads/schedlock.exp
should expose this.  ISTR that they pass when testing with Cygwin, but that was
a long time ago.  thread_rec suspends the thread; it may be that the code
assumes that core gdb fetches registers from all threads on all debug events,
or something like that, which is a really bad assumption.

You should be able to check this by confirming that "set scheduler-locking on"
really leaves other threads suspended.  Note, however, that if you do "info
threads", you'll fetch registers from all threads, thus masking the problem,
since fetching registers ends up in thread_rec -> SuspendThread for all
threads, I think.

>
> Now I get even weirder behavior:
>
> Thread 4 received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 7664.0x32c8]
> 0x004015dc in (anonymous namespace)::Callback (context=0xe0) at sample.cxx:22
> 22  Sleep(100);  /* Do a "work". */
>
> Look at that PC, then:
>
> (gdb) disassemble
> Dump of assembler code for function (anonymous
> namespace)::Callback(TP_CALLBACK_INSTANCE*, void*, TP_WORK*):
>    0x004015d0 <+0>: push   %ebp
>    0x004015d1 <+1>: mov    %esp,%ebp
>    0x004015d3 <+3>: sub    $0x18,%esp
>    0x004015d6 <+6>: movl   $0x64,(%esp)
>    0x004015dd <+13>: mov    0x4091fc,%eax
>
> ... notice that it is in the middle of an instruction.

This is exactly the sort of thing to expect if Windows itself doesn't rewind
the PC for us.

The breakpoint was originally set at 0x4015d6.  So when the breakpoint
triggers, the PC will be pointing at 0x4015d7.  If nothing rewinds the PC back
to 0x4015d6, then when the thread is resumed, it'll start executing at
0x4015d7, in the middle of the original 7-byte movl.  Sometimes that crashes
immediately because it tries to run an invalid instruction, but in this case
it looks like that slice of the original instruction decoded as valid
instructions, which executed without crashing the program.  The next
"instruction" then started at 0x004015dc, and that one crashed.  I.e., the
program's PC is no longer aligned with the real instruction boundaries.
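The address arithmetic can be checked against the disassembly quoted above. The table holds only the five instruction starts shown there, so `is_insn_boundary` says nothing about addresses beyond 0x004015dd; the helper names are illustrative.

```cpp
#include <algorithm>
#include <array>

// Instruction start addresses copied from the `disassemble` output
// quoted above (first five instructions of Callback only).
constexpr std::array<unsigned long, 5> kInsnStarts = {
    0x004015d0, 0x004015d1, 0x004015d3, 0x004015d6, 0x004015dd};

bool is_insn_boundary(unsigned long pc) {
  return std::binary_search(kInsnStarts.begin(), kInsnStarts.end(), pc);
}

// On x86 the reported PC sits one byte past the int3 (0xcc) that fired.
constexpr unsigned long pc_after_int3(unsigned long bp_addr) {
  return bp_addr + 1;
}

// Size of the `movl $0x64,(%esp)` that the breakpoint temporarily replaced.
constexpr unsigned long kMovlLength = 0x004015dd - 0x004015d6;
```

Both the un-rewound PC (0x4015d7) and the faulting PC (0x4015dc) fall strictly inside the 7-byte movl, while 0x4015d6 and 0x4015dd are real boundaries.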

Even if Windows did the PC rewinding, it would seem to me a bad idea to ignore
the events.  For breakpoints, ignoring is kind of fine, since as soon as you
resume the thread, if the breakpoint is still installed, the thread will just
trigger the breakpoint again.  But that won't be true for other events, for
example watchpoints: if you ignore such events, you'll just lose them.

So it's better to queue these events in gdb too.

>
> Maybe the queueing approach is better and we should just live with the
> spurious stop.  If the theory is that this is caused by internal event
> buffering in Windows, then I would definitely argue for this direction.

The queueing approach is better, but that should _not_ result in spurious
stops.
We do queueing in linux-nat.c too, for example.  

You either

- #1 queue all events except breakpoints, and for breakpoints unwind PC and
discard the event, or,

- #2 queue all events, for breakpoints unwind PC, and implement
target_stopped_by_sw_breakpoint (this requires unwound PCs)

#1 makes threads retrigger delayed breakpoint events, possibly over and over in
a contended scenario, worsening contention.  gdb/linux-nat.c and
gdbserver/linux-low.c did this for many years.

#2 lets core GDB know to ignore a spurious breakpoint event if the breakpoint
is removed between a thread hitting it and the queued event bubbling up to gdb
core.  Without target_stopped_by_sw_breakpoint, such a delayed event would
just look like a spurious SIGTRAP, and would thus be reported as such to the
user.

I'd go for #2.
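Option #2 can be sketched as a queue that never drops events, paired with a core-side filter for stale breakpoint stops. `StopKind`, `QueuedStop`, `StopQueue` and `should_report` are hypothetical names, and the `bp_still_inserted` callback stands in for gdb's own breakpoint bookkeeping; this is not the eventual windows-nat.c code.

```cpp
#include <deque>
#include <functional>
#include <optional>

enum class StopKind { SwBreakpoint, Other };

struct QueuedStop {
  unsigned long tid;
  unsigned long pc;               // already rewound for breakpoint stops
  bool stopped_by_sw_breakpoint;  // answer for target_stopped_by_sw_breakpoint
};

class StopQueue {
 public:
  // Target side: every event is queued, none dropped.  For a breakpoint,
  // rewind over the one-byte int3 before queueing.
  void push(unsigned long tid, unsigned long raw_pc, StopKind kind) {
    const bool bp = kind == StopKind::SwBreakpoint;
    q_.push_back({tid, bp ? raw_pc - 1 : raw_pc, bp});
  }

  std::optional<QueuedStop> pop() {
    if (q_.empty()) return std::nullopt;
    QueuedStop s = q_.front();
    q_.pop_front();
    return s;
  }

 private:
  std::deque<QueuedStop> q_;
};

// Core side: a delayed breakpoint stop whose breakpoint has since been
// removed is spurious.  Its PC already points at the original
// instruction, so the thread can simply be resumed.
bool should_report(const QueuedStop &s,
                   const std::function<bool(unsigned long)> &bp_still_inserted) {
  if (s.stopped_by_sw_breakpoint) return bp_still_inserted(s.pc);
  return true;  // e.g. a watchpoint trigger: never silently drop it
}
```

Because the PC is rewound at queueing time, a discarded stale breakpoint stop leaves the thread at an instruction boundary, avoiding the mid-instruction resume discussed above.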

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |tromey at sourceware dot org

--- Comment #16 from Tom Tromey <tromey at sourceware dot org> ---
An update on this bug -- after extensive debugging via IRC,
Pedro pointed out an oddity in the logs: the queueing patch
was not re-fetching the de-queued thread's registers.  Fixing
this made it work.  So, on to plan #2 from comment #15.
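The kind of bug described here can be illustrated with a per-thread register cache that must be invalidated when a deferred stop is taken off the queue. `PcCache` is a hypothetical class, not gdb's regcache API, and the `read_pc` callback stands in for a fresh GetThreadContext on the thread.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical per-thread PC cache; not gdb's regcache API.
class PcCache {
 public:
  explicit PcCache(std::function<unsigned long(unsigned long)> read_pc)
      : read_pc_(std::move(read_pc)) {}

  unsigned long pc(unsigned long tid) {
    Entry &e = cache_[tid];
    if (!e.valid) {
      e.value = read_pc_(tid);
      e.valid = true;
    }
    return e.value;
  }

  // Call when a deferred stop for `tid` is taken off the queue, so the
  // next read reflects that thread rather than a stale cached copy.
  void invalidate(unsigned long tid) { cache_[tid].valid = false; }

 private:
  struct Entry {
    bool valid = false;
    unsigned long value = 0;
  };
  std::function<unsigned long(unsigned long)> read_pc_;
  std::map<unsigned long, Entry> cache_;
};
```

Without the `invalidate` call, the de-queued thread would be handled using whatever register values were cached earlier.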

[Bug gdb/22992] GDB and Microsoft Windows thread pool

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=22992

--- Comment #17 from Tom Tromey <tromey at sourceware dot org> ---
> - #2 queue all events, for breakpoints unwind PC, and implement target_stopped_by_sw_breakpoint (this requires unwound PCs)

This part turned out to be sort of subtle.
From what I can tell, the target should check
software_breakpoint_inserted_here_p before deciding whether
this method should return true, and thus also before committing
to an un-biased PC.

I have a patch to update the target.h docs here a little.
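The ordering described in comment #17 can be sketched as one classification step in the target. `BreakpointDecision`, `classify_breakpoint` and the `inserted_here` callback (standing in for software_breakpoint_inserted_here_p) are hypothetical names. The sketch assumes, as the discussion above indicates, that the PC Windows reports after EXCEPTION_BREAKPOINT points just past the one-byte int3.

```cpp
#include <functional>

struct BreakpointDecision {
  unsigned long pc;               // PC the target will report
  bool stopped_by_sw_breakpoint;  // answer for target_stopped_by_sw_breakpoint
};

// `raw_pc` is the thread's PC after the breakpoint exception, assumed to
// point just past the int3 on x86.
BreakpointDecision classify_breakpoint(
    unsigned long raw_pc,
    const std::function<bool(unsigned long)> &inserted_here) {
  const unsigned long bp_addr = raw_pc - 1;
  if (inserted_here(bp_addr))
    return {bp_addr, true};  // gdb's breakpoint: commit to the rewound PC
  // Some other int3 (for example one compiled into the program): keep the
  // raw PC so that resuming does not execute the same int3 again.
  return {raw_pc, false};
}
```

Checking before rewinding matters because an int3 that gdb did not plant is part of the program; backing the PC over it would make the thread re-execute it.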
