[Bug nptl/3086] New: when run tst-timer on x86_64, it causes a segfault

[Bug nptl/3086] New: when run tst-timer on x86_64, it causes a segfault

glaubitz at physik dot fu-berlin.de
When tst-timer is built as a 64-bit binary and run, it causes a segfault:
# ./tst-timer
clock_gettime returned 0, timespec = { 1155266976, 602521000 }
clock_getres returned 0, timespec = { 0, 4000250 }
signal_func
notify_func2
signal_func
notify_func2
notify_func1
signal_func
notify_func1
notify_func2
notify_func1
notify_func2
signal_func
notify_func1
notify_func1
notify_func2
signal_func
notify_func1
notify_func2
notify_func1
signal_func
notify_func1
notify_func2
notify_func1
notify_func2
signal_func
notify_func1
notify_func1
notify_func2
signal_func
notify_func1
notify_func2
notify_func1
signal_func
notify_func1
notify_func2
notify_func1
notify_func2
signal_func
notify_func1
notify_func1
notify_func2
signal_func
notify_func1
notify_func2
notify_func1
signal_func
notify_func1
notify_func2
notify_func1
notify_func2
signal_func
notify_func1
notify_func1
notify_func2
signal_func
notify_func1
notify_func2
notify_func1
signal_func
Segmentation fault

# dmesg
tst-timer[6147]: segfault at 0000000000000000 rip 00002b8eaec8e9f0 rsp
0000000040804128 error 6

This message means that process 6147 tried to write to address
0000000000000000 in user mode.
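
For reference, the "error 6" value can be decoded by hand. The following is a
minimal sketch, assuming the usual x86 page-fault error-code layout (bit 0:
protection violation vs. page not present, bit 1: write, bit 2: user mode).
struct pf_error, decode_pf_error and print_pf_error are hypothetical names made
up for this illustration; they are not kernel or glibc interfaces.

```c
#include <stdbool.h>
#include <stdio.h>

/* Low three bits of the x86 page-fault error code, as printed in the
   kernel's "segfault at ... error N" line.  Hypothetical helper type.  */
struct pf_error
{
  bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
  bool write;       /* bit 1: 1 = write access, 0 = read access */
  bool user;        /* bit 2: 1 = fault raised in user mode, 0 = kernel mode */
};

/* Split the error code into its three low flag bits.  */
struct pf_error
decode_pf_error (unsigned long code)
{
  struct pf_error e;
  e.protection = (code & 1UL) != 0;
  e.write = (code & 2UL) != 0;
  e.user = (code & 4UL) != 0;
  return e;
}

/* Print a one-line, human-readable description of the error code.  */
void
print_pf_error (unsigned long code)
{
  struct pf_error e = decode_pf_error (code);
  printf ("error %lu: %s, %s access, %s mode\n",
          code,
          e.protection ? "protection violation" : "page not present",
          e.write ? "write" : "read",
          e.user ? "user" : "kernel");
}
```

Calling print_pf_error (6) describes a write access from user mode to a page
that was not present, which is consistent with a write through a NULL pointer.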

The root cause is a race: one thread tries to access a block of memory that
another thread has already freed. From the execution path of the 64-bit
program, we can see that no mutex protects this critical section.

In total, glibc has two timer_delete.c files, each containing an
implementation of the timer_delete function:
./nptl/sysdeps/pthread/timer_delete.c
./nptl/sysdeps/unix/sysv/linux/timer_delete.c
If the timer_delete syscall is not available, the timer_delete function in the
first file is called; otherwise the one in the second file is called.
Currently the kernel implements the timer_delete syscall. An important
difference between the two implementations is that the first one takes a
pthread mutex lock. I don't know why this lock was removed in the second
version; it seems we still need a mutex to protect the critical section.

--
           Summary: when run tst-timer on x86_64, it causes a segfault
           Product: glibc
           Version: 2.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: huangjq at cn dot ibm dot com
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=3086

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
[Bug nptl/3086] when run tst-timer on x86_64, it causes a segfault

glaubitz at physik dot fu-berlin.de

------- Additional Comments From jakub at redhat dot com  2006-09-12 08:56 -------
I cannot reproduce this myself.
If you have analyzed this, can you please put in details?
nptl/sysdeps/pthread/timer_*.c needs the mutex to guard the global variables
that the pure userland implementation uses (see the variables in
timer_routines.c), but there is no such state when using kernel timers: the
state is kept in the kernel, so there is no need for the mutex.
When you talk about "a block of memory", can you please be a little bit more
specific on which exact block of memory it is (allocated by which function, used
for what) and which function accesses it after it is freed?
Which critical section are you talking about?

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



------- Additional Comments From huangjq at cn dot ibm dot com  2006-09-12 11:14 -------
##################################################
nptl/sysdeps/unix/sysv/linux/timer_routines.c
##################################################
static void *
timer_helper_thread (void *arg)
{
.....
      int result = INLINE_SYSCALL (rt_sigtimedwait, 4, &ss, &si, NULL,
                                   _NSIG / 8);

      LIBC_CANCEL_RESET (oldtype);

      if (result > 0)
        {      
          if (si.si_code == SI_TIMER)
            {      
              struct timer *tk = (struct timer *) si.si_ptr;

              /* That the signal we are waiting for.  */
              pthread_t th;
              (void) pthread_create (&th, &tk->attr, timer_sigev_thread, tk);
            }
......

static void *
timer_sigev_thread (void *arg)
{
  /* The parent thread has all signals blocked.  This is a bit
     surprising for user code, although valid.  We unblock all
     signals.  */
  sigset_t ss;
  sigemptyset (&ss);
  INTERNAL_SYSCALL_DECL (err);
  INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_SETMASK, &ss, NULL, _NSIG / 8);

  struct timer *tk = (struct timer *) arg; <----- critical section

  /* Call the user-provided function.  */
  tk->thrfunc (tk->sival);

  return NULL;
}

##############################################
/nptl/sysdeps/unix/sysv/linux/timer_delete.c
##############################################
int
timer_delete (timerid)
     timer_t timerid;
{
# undef timer_delete
# ifndef __ASSUME_POSIX_TIMERS
  if (__no_posix_timers >= 0)
# endif
    {
      struct timer *kt = (struct timer *) timerid;

      /* Delete the kernel timer object.  */
      int res = INLINE_SYSCALL (timer_delete, 1, kt->ktimerid);

      if (res == 0)
        {
# ifndef __ASSUME_POSIX_TIMERS
          /* We know the syscall support is available.  */
          __no_posix_timers = 1;
# endif

          /* Free the memory.  */
          (void) free (kt);     <----- critical section

          return 0;
        }

From the code above we can see that when a timer event fires, the helper
thread creates a new thread that runs timer_sigev_thread (say, on CPU0). If
the program (tst-timer) calls timer_delete before this new thread accesses the
struct tk, the call succeeds and the struct is freed by timer_delete (say, on
CPU1). When the first thread then tries to access tk, it segfaults. I suspect
this problem only exists on SMP machines. When tst-timer.c is compiled as
32-bit it works fine; I don't know why. Here is a patch from drepper, taken
from nptl/ChangeLog:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/ChangeLog.diff?r1=1.888&r2=1.889&cvsroot=glibc

2006-04-27  Ulrich Drepper  <[hidden email]>
       * sysdeps/unix/sysv/linux/timer_routines.c (timer_helper_thread):
         Allocate new object which is passed to timer_sigev_thread so that
         the timer can be deleted before the new thread is scheduled.

This is the fix in question.

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/timer_routines.c.diff?r1=1.8&r2=1.9&cvsroot=glibc
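
The idea described in the ChangeLog entry can be sketched as follows. This is
a simplified, self-contained illustration and not the actual glibc patch:
struct timer_sketch, struct start_data_sketch, start_notification,
record_value and demo_delete_before_run are hypothetical names, the signal
value is reduced to an int, and thread attributes are omitted. The helper
copies the callback and its argument into a separately allocated object, so
the notification thread never dereferences the timer struct, which may already
have been freed by timer_delete.

```c
#include <pthread.h>
#include <stdlib.h>

/* Simplified stand-in for glibc's struct timer (assumption: only the
   fields the notification thread needs are modelled).  */
struct timer_sketch
{
  void (*thrfunc) (int);
  int sival;
};

/* Private copy handed to the notification thread.  */
struct start_data_sketch
{
  void (*thrfunc) (int);
  int sival;
};

/* Thread body: take ownership of the private copy, free it, then call
   the user function.  The timer struct itself is never touched here.  */
static void *
sigev_thread_sketch (void *arg)
{
  struct start_data_sketch *td = arg;
  void (*thrfunc) (int) = td->thrfunc;
  int sival = td->sival;
  free (td);
  thrfunc (sival);
  return NULL;
}

/* Copy what the thread needs out of TK, then start the thread.
   Returns 0 on success, -1 on failure.  */
int
start_notification (const struct timer_sketch *tk, pthread_t *th)
{
  struct start_data_sketch *td = malloc (sizeof *td);
  if (td == NULL)
    return -1;
  td->thrfunc = tk->thrfunc;
  td->sival = tk->sival;
  if (pthread_create (th, NULL, sigev_thread_sketch, td) != 0)
    {
      free (td);
      return -1;
    }
  return 0;
}

/* Demo callback: remember the value it was called with.  */
static int last_value;

void
record_value (int v)
{
  last_value = v;
}

/* Free the timer right after starting the thread, as timer_delete
   might, and return the value the callback eventually saw.  */
int
demo_delete_before_run (void)
{
  struct timer_sketch *tk = malloc (sizeof *tk);
  if (tk == NULL)
    return -1;
  tk->thrfunc = record_value;
  tk->sival = 42;

  pthread_t th;
  int rc = start_notification (tk, &th);
  free (tk);                    /* stands in for timer_delete */
  if (rc != 0)
    return -1;

  pthread_join (th, NULL);
  return last_value;
}
```

In this sketch the new thread owns its copy and frees it itself, so the order
in which the timer is freed and the thread is scheduled no longer matters.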


------- Additional Comments From huangjq at cn dot ibm dot com  2006-09-12 11:21 -------
Both x86_64 and ppc64 (POWER4/5, Power970) have this problem.


------- Additional Comments From jakub at redhat dot com  2006-09-12 11:34 -------
If
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/unix/sysv/linux/timer_routines.c.diff?r1=1.8&r2=1.9&cvsroot=glibc
fixes this for you (I was testing with current CVS), then why are you filing
this bug?  There is no 2.4 branch, so there is no branch it could be
backported to.



------- Additional Comments From huangjq at cn dot ibm dot com  2006-09-13 03:08 -------
I am using glibc-2.4 from SLES10-GMC, which has this defect; I did not check
the current glibc CVS. If current CVS already includes this fix, you can
reject and close this defect. Thanks!


------- Additional Comments From jakub at redhat dot com  2006-09-13 07:27 -------
Guess you should report it to SUSE instead.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED

