GDB hangs when calling inferior functions


Stephen Roberts
Hi folks,

I'm encountering a hang in GDB which I have traced back to the infrun_async(int) function in infrun.c, and wondered if anyone could shed some light on this part of the code before I raise a bug report.

This function is a wrapper around the mark_async_event_handler and clear_async_event_handler functions, which set the "ready" member of an async_event_handler to 1 or 0, respectively. From what I can tell, this wrapper saves its argument in a file-scoped variable called infrun_is_async so that it only calls the mark/clear functions once if it is called repeatedly with the same argument.

The hang occurs when GDB tries to call inferior functions on two different threads with scheduler-locking turned on. The first call works fine: infrun_async(1) causes the async event handler to be marked and the event to be handled, but the event loop then resets the "ready" member to zero while leaving infrun_is_async set to 1. As a result, GDB hangs if the user switches to another thread and calls a second function, because calling infrun_async(1) a second time has no effect and the inferior call events are never handled.

I've been able to hack around this hang by resetting the infrun_is_async variable to zero, but I'm hoping to find a more elegant solution. With that in mind, does anyone know what this logic is for? Most of infrun.c just calls the mark/clear functions directly without going through this interface. Which calls need to use this behavior, and why?

This issue affects all versions after 7.12 (including HEAD). For reference, this is my reproducer for this issue, where get_value is a global function:

break after_thread_creation
run
set scheduler-locking on
thread 1
call get_value()
thread 2
call get_value()
# GDB hangs here

Thanks and Regards,
Stephen Roberts.

Re: GDB hangs when calling inferior functions

Pedro Alves
On 10/11/2017 02:01 PM, Stephen Roberts wrote:
> Hi folks,
>
> I'm encountering a hang in GDB which I have traced back to the
> infrun_async(int) function in infrun.c, and wondered if anyone could
> shed some light on this part of the code before I raise a bug report.

This event source exists to wake up the event loop when we have
pending events to handle recorded in the thread data structures.
I.e., events that we've already pulled out of the
backend (e.g., linux-nat.c), and thus wouldn't otherwise wake up
the event loop.

> This function is a wrapper around the mark_async_event_handler and
> clear_async_event_handler functions, which set the "ready" member of
> an async_event_handler to 1 or 0, respectively. From what I can
> tell, this wrapper saves its argument in a file-scoped variable
> called infrun_is_async to ensure that it only calls the mark/clear
> functions once if it is called repeatedly with the same argument.

That's correct.

>
> The hang occurs when GDB tries to call inferior functions on two
> different threads with scheduler-locking turned on. The first call
> works fine, with the call to infrun_async(1) causing the
> async event handler to be marked and the event to be handled, but then the
> event loop resets the "ready" member to zero, while leaving
> infrun_is_async set to 1. As a result, GDB hangs if the user switches
> to another thread and calls a second function because calling
> infrun_async(1) a second time has no effect, meaning the inferior
> call events are never handled.
>
> I've been able to hack around this hang by resetting the
> infrun_is_async variable to zero, but I'm hoping to find a more
> elegant solution. With that in mind, does anyone know what this logic
> is for? Most of infrun.c just calls the mark/clear functions
> directly without going through this interface. Which calls need to
> use this behavior, and why?

I don't recall offhand, unfortunately.  Did you look at
git log/blame?

> This issue affects all versions after 7.12 (including HEAD). For
> reference, this is my reproducer for this issue, where get_value is a
> global function:

Do you have this in gdb testsuite testcase form?
I can't promise to look at this in detail right now,
but the easier you make it to try it out, the better.

>
> break after_thread_creation
> run
> set scheduler-locking on
> thread 1
> call get_value()
> thread 2
> call get_value()
> # GDB hangs here

Thanks,
Pedro Alves

Re: GDB hangs when calling inferior functions

Stephen Roberts
Hi Pedro,

Thanks for your reply. Any light you could shed on this would be very much appreciated. I did read the git log but didn't get very far, as I'm not very familiar with this code.

I've included a GDB testcase below; hopefully this format is acceptable. This testcase reproduces the issue around 99% of the time on my Ubuntu 16.04 machine. This figure drops closer to 50% under heavy load, which suggests a race condition.

I dug deeper into this and found that the threads which hang are always ones which did not hit the breakpoint themselves but were stopped when another thread hit it. Threads which are stopped at breakpoints are immune to this issue. Loading the system allows more threads to reach the breakpoint before GDB stops them.

Thanks again,
Stephen Roberts.

cat > gdb-hangs-on-infcall.c << 'SOURCEFILE'
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define THREADCOUNT 4

pthread_barrier_t barrier;
pthread_t threads[THREADCOUNT];
int thread_ids[THREADCOUNT];

// Returns thread_ids[index], which equals index for index in [0, THREADCOUNT)
int get_value(int index) {
  return thread_ids[index];
}

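unsigned long fast_fib(unsigned int n) {
  // Unsigned arithmetic wraps on overflow (fib(100) does not fit in 64 bits),
  // which keeps this busy-work loop free of undefined behavior.
  unsigned long a = 0;
  unsigned long b = 1;
  unsigned long t;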
  for (unsigned int i = 0; i < n; ++i) {
    t = b;
    b = a+b;
    a = t;
  }
  return a;
}

void * thread_function(void *args) {
  int tid = get_value(*((int *) args));
  int status;
  status = pthread_barrier_wait(&barrier);
  if (status == PTHREAD_BARRIER_SERIAL_THREAD) {
    printf("All threads entering compute region\n");
  }
  unsigned long result = fast_fib(100); // testmarker01
  status = pthread_barrier_wait(&barrier);
  if (status == PTHREAD_BARRIER_SERIAL_THREAD) {
    printf("All threads outputting results\n");
  }
  pthread_barrier_wait(&barrier);
  printf("Thread %d Result: %lu\n", tid, result);
  pthread_exit(NULL);
}

int main(void) {
  int err = pthread_barrier_init(&barrier, NULL, THREADCOUNT);
  if (err) {
    fprintf(stderr, "Barrier initialization failed\n");
    return EXIT_FAILURE;
  }
  // Create worker threads (main)
  printf("Spawning worker threads\n");
  for (int tid = 0; tid < THREADCOUNT; ++tid) {
    thread_ids[tid] = tid;
    err = pthread_create(&threads[tid], NULL, thread_function, (void *) &thread_ids[tid]);
    if (err) {
      fprintf(stderr, "Thread creation failed\n");
      return EXIT_FAILURE;
    }
  }
  // Wait for threads to complete then exit
  for (int tid = 0; tid < THREADCOUNT; ++tid) {
    pthread_join(threads[tid], NULL);
  }
  pthread_barrier_destroy(&barrier);
  return EXIT_SUCCESS;
}
SOURCEFILE



cat > gdb-hangs-on-infcall.exp << 'EXPECTFILE'
standard_testfile .c

# What compiler are we using?
#
if [get_compiler_info] {
    return -1
}

if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
    return -1
}

clean_restart ${binfile}

if { ![runto main] } then {
   fail "run to main"
   return
}

gdb_breakpoint [gdb_get_line_number "testmarker01"]
gdb_continue_to_breakpoint "testmarker01"
gdb_test_no_output "set scheduler-locking on"
gdb_test "show scheduler-locking" "Mode for locking scheduler during execution is \"on\"."
gdb_test "thread 4" "Switching to .*"
gdb_test "call get_value(0)" ".* = 0"
gdb_test "thread 3" "Switching to .*"
gdb_test "call get_value(0)" ".* = 0"
gdb_test "thread 2" "Switching to .*"
gdb_test "call get_value(0)" ".* = 0"
gdb_test "thread 1" "Switching to .*"
gdb_test "call get_value(0)" ".* = 0"
EXPECTFILE

Re: GDB hangs when calling inferior functions

Yao Qi
Stephen Roberts writes:

> I've included a gdb testcase below - hopefully this format is acceptable. This testcase reproduces the issue around 99% of the time on my ubuntu 16.04 machine. This figure drops closer to 50% when under heavy load, which suggested a race condition. I dug deeper into this and found that the threads which hang are always ones which did not hit the breakpoint but were stopped when another thread did hit a breakpoint. Threads which are stopped at breakpoints are immune to this issue. Loading the system allows more threads to reach the breakpoint before they are stopped by gdb.

I can reproduce the hang with your test case.  It looks like
inferior_event_handler (INF_EXEC_COMPLETE, ) should be called somewhere,
maybe in infrun, or else the thread finite-state machine is in the wrong
state for an inferior call.

--
Yao (齐尧)