[Bug threads/20291] New: Set scheduler-locking on doesn't work correctly with pthread_create

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] New: Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

            Bug ID: 20291
           Summary: Set scheduler-locking on doesn't work correctly with
                    pthread_create
           Product: gdb
           Version: HEAD
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: threads
          Assignee: unassigned at sourceware dot org
          Reporter: martin.galvan at tallertechnologies dot com
                CC: palves at redhat dot com
  Target Milestone: ---

Created attachment 9350
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9350&action=edit
Example code that triggers the bug

In the following code:

#include <stdio.h>
#include <pthread.h>

void *thread_func(void *unused)
{
    puts("Child thread.");

    return NULL;
}

int main()
{
    pthread_t thread;

    if (pthread_create(&thread, NULL, thread_func, NULL) == 0) /* Stop here */
    {
        puts("Child thread created.");

        if (pthread_join(thread, NULL) == 0)
        {
            puts("Child thread joined.");
        }
    }

    return 0;
}

If I stop at the "Stop here" line (before calling pthread_create), issue a "set
scheduler-locking on", and then a "next", the child thread will still run (and
finish) before "next" finishes:

Temporary breakpoint 1, main () at example.c:15
15          if (pthread_create(&thread, NULL, thread_func, NULL) == 0) /* Stop
here */
(gdb) set scheduler-locking on
(gdb) next
[New Thread 0x7ffff77f6700 (LWP 13087)]
Child thread.
[Thread 0x7ffff77f6700 (LWP 13087) exited]
17              puts("Child thread created.");
(gdb)

This doesn't happen with gdb 7.7.1, so I'm guessing some later commit
introduced the bug. The schedlock tests run fine for the current git master, so
I guess this wasn't covered by the testsuite.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

Martin Galvan <martin.galvan at tallertechnologies dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martin.galvan@tallertechnol
                   |                            |ogies.com

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #1 from Martin Galvan <martin.galvan at tallertechnologies dot com> ---
Using git-bisect I've traced the issue back to commit
c1a747c10948e2298083179f4e8aeed8b962e2af:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git&a=commit&h=c1a747c10948e2298083179f4e8aeed8b962e2af

I'm still surprised that it didn't cause any regressions on the schedlock
tests, though. I'll take a deeper look at this and report back.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #2 from Pedro Alves <palves at redhat dot com> ---
What I imagine is happening is that:

- Before, whenever a new thread appeared, everything would stop for the thread
event breakpoint, and then the event got reported to infrun.c.  infrun.c would
then only continue the thread that was originally resumed.

- Now, new threads don't cause a infrun.c-visible stop.
linux-nat.c:resume_stopped_resumed_lwps resumes the new thread right after we
handle the PTRACE_EVENT_CLONE that discovered the new thread.

The problem is that the target backend has no idea whether infrun wants new
threads to be resumed or whether it wants them stopped.

I think one solution would for infrun to enable the new support for reporting
thread create/exit events (target_thread_events ->
TARGET_WAITKIND_THREAD_CREATED) whenever schedlock is in effect.

> I'm still surprised that it didn't cause any regressions on the schedlock
> tests, though. I'll take a deeper look at this and report back.

I think schedlock.c doesn't spawn new threads while the meat of the test is
running.  Instead, it first starts all threads in a preparation phase.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #3 from Pedro Alves <palves at redhat dot com> ---
Note that nothing in gdb ever tried to make sure that "set scheduler-locking
on" locks new threads.  Many targets can't even trap when a new thread is
created.

Actually, even when debugging against GNU/Linux gdbserver, even before that
patch, I believe you'll see the child running free, because gdbserver hasn't
been using libthread_db.so thread event breakpoints for a long while.  Even
when it did, those thread event stops weren't reported to gdb, so the new
threads would still run free.

I think it was mainly working "by chance" before.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #4 from Pedro Alves <palves at redhat dot com> ---
> I think it was mainly working "by chance" before.

And also because the thread event breakpoint is triggered on the _parent_
thread.
If you enable "set debug infrun 1" / "set debug lin-lwp 1" on an older gdb, you
should see that the child manages to run a bit before the event breakpoint
triggers.  When the breakpoint is hit, then everything is stopped, and then
infrun only resumes the parent.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #5 from Martin Galvan <martin.galvan at tallertechnologies dot com> ---
(In reply to Pedro Alves from comment #3)

> Note that nothing in gdb ever tried to make sure that "set scheduler-locking
> on" locks new threads.  Many targets can't even trap when a new thread is
> created.
>
> Actually, even when debugging against GNU/Linux gdbserver, even before that
> patch, I believe you'll see the child running free, because gdbserver hasn't
> been using libthread_db.so thread event breakpoints for a long while.  Even
> when it did, those thread event stops weren't reported to gdb, so the new
> threads would still run free.
>
> I think it was mainly working "by chance" before.

In that case, should this even be considered a bug? Perhaps we could add
something to the documentation so that other users won't trip on this again.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #6 from Pedro Alves <palves at redhat dot com> ---
> In that case, should this even be considered a bug? Perhaps we could add
> something to the documentation so that other users won't trip on this again.

I'm not sure, haven't made up my mind about it myself.

The fact that is only worked by chance doesn't mean that we shouldn't fix it
properly.

My knee-jerk reaction was that it's a bug, though I'm slightly mildly concerned
about the potential maybe-surprising hangs over pthread_create and similars.
But maybe not that concerned.  The user did ask for nothing else to be
scheduled.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #7 from Martin Galvan <martin.galvan at tallertechnologies dot com> ---
(In reply to Pedro Alves from comment #6)

> > In that case, should this even be considered a bug? Perhaps we could add
> > something to the documentation so that other users won't trip on this again.
>
> I'm not sure, haven't made up my mind about it myself.
>
> The fact that is only worked by chance doesn't mean that we shouldn't fix it
> properly.
>
> My knee-jerk reaction was that it's a bug, though I'm slightly mildly
> concerned
> about the potential maybe-surprising hangs over pthread_create and similars.
> But maybe not that concerned.  The user did ask for nothing else to be
> scheduled.

That sounds to me like an implementation detail leaking out of glibc through
gdb. If I understood correctly, the parent thread locks a mutex in
__pthread_create_2_1 so that the child will block inside start_thread before it
can jump to its user-provided start routine. IMHO, as a user I shouldn't have
to be aware of such details when using a (seemingly) inocuous command.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #8 from Pedro Alves <palves at redhat dot com> ---
What do you suggest gdb should do?

Baking in awareness of glibc's internals in order to let the child run a bit is
not that appealing.

Note this sort of hang can happen when you step over _any_ function that blocks
on some synchronization primitive (with schedlock on).

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #9 from Martin Galvan <martin.galvan at tallertechnologies dot com> ---
(In reply to Pedro Alves from comment #8)
> Baking in awareness of glibc's internals in order to let the child run a bit
> is not that appealing.

Agreed. That would be ugly.

> Note this sort of hang can happen when you step over _any_ function that
> blocks on some synchronization primitive (with schedlock on).

I'm going through the glibc code again and I'm kind of lost here. You said the
hang would arise because the parent thread would wait forever for the child to
release the lock. However, create_thread first locks pd->lock, and then calls
clone (through ARCH_CLONE), passing it the glibc-internal start_thread as
starting routine. start_thread, in turn, will try to lock that same lock,
blocking the child. The parent then has a chance to set some more stuff on the
child, and then it unlocks it so the child can go on and call its user-provided
routine.

What this means is, the child waits for the parent and not the other way
around. If schedlock somehow manages to prevent the child from ever being
scheduled, the parent shouldn't mind (unless of course the user later calls
pthread_join, but that's their problem :)).

In any case, I think you're right in that the parent might hang elsewhere. My
point was that it's a bad thing if it happens in library code, but the user can
reasonably be expected to be aware of this in their own code.

> What do you suggest gdb should do?

I'm not sure. I'm just starting to dwell in this part of gdb, so I'm gonna need
a bit more inspecting before I can make a proper suggestion.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

--- Comment #10 from Pedro Alves <palves at redhat dot com> ---
Well, I actually said:

 "IF there's synchronization (mutex/futex/whatnot) between
 the parent/child threads inside glibc's pthread_create"

New emphasis on the IF.  I haven't looked at the code in that level of detail.
I was merely pointing out a _potential_ problem with the use of schedlock in
the new glibc printers you're working on (as discussed in the libc-alpha@).

(It's still true that the child manages to run a bit with older gdbs.  But with
your observation, that ends up not making a practical difference for the issues
at hand.)

> What this means is, the child waits for the parent and not the other way around.

OK, then the concern is moot.  (unless glibc changes, of course.)

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug threads/20291] Set scheduler-locking on doesn't work correctly with pthread_create

rootedjoy833 at gmail dot com
In reply to this post by rootedjoy833 at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=20291

mingwei.zhang at intel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mingwei.zhang at intel dot com

--
You are receiving this mail because:
You are on the CC list for the bug.