[PATCH 0/5] Restartable Sequences support for glibc 2.30

Mathieu Desnoyers
Hi,

This patchset implements basic support for the "rseq" Linux system call
in glibc by registering the rseq TLS ABI.

One patch in this series modifies sched_getcpu() to speed up reading the
current CPU number by reading __rseq_abi.cpu_id when rseq is available.

Some of the Reviewed-by tags provided in the last round were not
included because additional changes were made.

Please consider for inclusion into glibc,

Thanks,

Mathieu

Mathieu Desnoyers (5):
  glibc: Perform rseq(2) registration at C startup and thread creation
    (v8)
  glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v2)
  support record failure: allow use from constructor
  support: implement xpthread key create/delete
  rseq registration tests (v3)

 NEWS                                          |  15 +
 csu/libc-start.c                              |  14 +-
 misc/rseq-internal.h                          |  39 ++
 nptl/pthread_create.c                         |   9 +
 support/Makefile                              |   2 +
 support/check.h                               |   4 +
 support/support_record_failure.c              |  18 +-
 support/xpthread_key_create.c                 |  25 ++
 support/xpthread_key_delete.c                 |  25 ++
 support/xthread.h                             |   2 +
 sysdeps/unix/sysv/linux/Makefile              |   8 +-
 sysdeps/unix/sysv/linux/Versions              |   4 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   |  32 ++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arm/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/bits/rseq.h           |  30 ++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../unix/sysv/linux/microblaze/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/rseq-internal.h       |  89 +++++
 sysdeps/unix/sysv/linux/rseq-sym.c            |  64 +++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      |  31 ++
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sched_getcpu.c        |  27 +-
 sysdeps/unix/sysv/linux/sh/libc.abilist       |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/sys/rseq.h            |  51 +++
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c       | 367 ++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq.c            | 114 ++++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       |  31 ++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 49 files changed, 1041 insertions(+), 14 deletions(-)
 create mode 100644 misc/rseq-internal.h
 create mode 100644 support/xpthread_key_create.c
 create mode 100644 support/xpthread_key_delete.c
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

--
2.17.1

[PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers
Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

This patch is based on glibc-2.29. The rseq(2) system call was merged
into Linux 4.18.

Signed-off-by: Mathieu Desnoyers <[hidden email]>
CC: Carlos O'Donell <[hidden email]>
CC: Florian Weimer <[hidden email]>
CC: Joseph Myers <[hidden email]>
CC: Szabolcs Nagy <[hidden email]>
CC: Thomas Gleixner <[hidden email]>
CC: Ben Maurer <[hidden email]>
CC: Peter Zijlstra <[hidden email]>
CC: "Paul E. McKenney" <[hidden email]>
CC: Boqun Feng <[hidden email]>
CC: Will Deacon <[hidden email]>
CC: Dave Watson <[hidden email]>
CC: Paul Turner <[hidden email]>
CC: Rich Felker <[hidden email]>
CC: [hidden email]
CC: [hidden email]
CC: [hidden email]
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount. It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu(). The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration. Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions. Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread. Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount. The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occurring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.

Changes since v7:
- Remove __rseq_lib_abi symbol, including refcount and register_state
  fields.
- Remove reference counting and nested signals handling from
  registration/unregistration functions.
- Introduce new __rseq_handled exported symbol, which is set to 1
  by glibc on C startup when it handles restartable sequences.
  This allows glibc to coexist with early adopter libraries and
  applications wishing to register restartable sequences when it
  is not handled by glibc.
- Introduce rseq_init (), which sets __rseq_handled to 1 from
  C startup.
- Update NEWS entry.
- Update comments at the beginning of new files.
- Registration depends on both __NR_rseq and RSEQ_SIG.
- Remove ARM, powerpc, MIPS RSEQ_SIG until we agree with maintainers
  on the signature choice.
- Update x86, s390 RSEQ_SIG based on discussion with arch maintainers.
- Remove rseq-internal.h from headers list of misc/Makefile, so it
  is not installed by make install.
---
 NEWS                                          | 15 ++++
 csu/libc-start.c                              | 14 ++-
 misc/rseq-internal.h                          | 39 ++++++++
 nptl/pthread_create.c                         |  9 ++
 sysdeps/unix/sysv/linux/Makefile              |  4 +-
 sysdeps/unix/sysv/linux/Versions              |  4 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   | 32 +++++++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |  2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |  2 +
 sysdeps/unix/sysv/linux/arm/libc.abilist      |  2 +
 sysdeps/unix/sysv/linux/bits/rseq.h           | 30 +++++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |  2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |  2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |  2 +
 .../unix/sysv/linux/microblaze/libc.abilist   |  2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |  2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |  2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |  2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |  2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |  2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |  2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |  2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |  2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/rseq-internal.h       | 89 +++++++++++++++++++
 sysdeps/unix/sysv/linux/rseq-sym.c            | 64 +++++++++++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      | 31 +++++++
 .../unix/sysv/linux/s390/s390-32/libc.abilist |  2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |  2 +
 sysdeps/unix/sysv/linux/sh/libc.abilist       |  2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |  2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 51 +++++++++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       | 31 +++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |  2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |  2 +
 40 files changed, 462 insertions(+), 5 deletions(-)
 create mode 100644 misc/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

diff --git a/NEWS b/NEWS
index 912a9bdc0f..7276a09b08 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,21 @@ See the end for copying conditions.
 Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
 using `glibc' in the "product" field.
 
+Version 2.30
+
+Major new features:
+
+* Support for automatically registering threads with the Linux rseq(2)
+  system call has been added.  This system call is implemented starting
+  from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
+  operations on per-cpu data.  It allows user-space to perform updates
+  on per-cpu data without requiring heavy-weight atomic operations.
+  Automatically registering threads allows all libraries, including libc,
+  to make immediate use of the rseq(2) support by using the documented ABI.
+  See 'man 2 rseq' for the details of the ABI shared between libc and the
+  kernel.
+
+
 Version 2.29
 
 Major new features:
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 5d9c3675fa..e101196b0d 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -22,6 +22,7 @@
 #include <ldsodefs.h>
 #include <exit-thread.h>
 #include <libc-internal.h>
+#include <rseq-internal.h>
 
 #include <elf/dl-tunables.h>
 
@@ -140,7 +141,12 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
 
-#ifndef SHARED
+  rseq_init ();
+
+#ifdef SHARED
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+#else
   _dl_relocate_static_pie ();
 
   char **ev = &argv[argc + 1];
@@ -218,6 +224,9 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
     }
 # endif
 
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+
   /* Initialize libpthread if linked in.  */
   if (__pthread_initialize_minimal != NULL)
     __pthread_initialize_minimal ();
@@ -230,8 +239,7 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 # else
   __pointer_chk_guard_local = pointer_chk_guard;
 # endif
-
-#endif /* !SHARED  */
+#endif
 
   /* Register the destructor of the dynamic linker if there is any.  */
   if (__glibc_likely (rtld_fini != NULL))
diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h
new file mode 100644
index 0000000000..b6159319c8
--- /dev/null
+++ b/misc/rseq-internal.h
@@ -0,0 +1,39 @@
+/* Restartable Sequences internal API. Stub version.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline void
+rseq_init (void)
+{
+}
+
+#endif /* rseq-internal.h */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2bd2b10727..90b3419390 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -33,6 +33,7 @@
 #include <default-sched.h>
 #include <futex-internal.h>
 #include <tls-setup.h>
+#include <rseq-internal.h>
 #include "libioP.h"
 
 #include <shlib-compat.h>
@@ -378,6 +379,7 @@ __free_tcb (struct pthread *pd)
 START_THREAD_DEFN
 {
   struct pthread *pd = START_THREAD_SELF;
+  bool has_rseq = false;
 
 #if HP_TIMING_AVAIL
   /* Remember the time when the thread was started.  */
@@ -396,6 +398,9 @@ START_THREAD_DEFN
   if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
     futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
 
+  /* Register rseq TLS to the kernel. */
+  has_rseq = !rseq_register_current_thread ();
+
 #ifdef __NR_set_robust_list
 # ifndef __ASSUME_SET_ROBUST_LIST
   if (__set_robust_list_avail >= 0)
@@ -573,6 +578,10 @@ START_THREAD_DEFN
     }
 #endif
 
+  /* Unregister rseq TLS from the kernel.  */
+  if (has_rseq && rseq_unregister_current_thread ())
+    abort ();
+
   advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
       pd->guardsize);
 
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 5f8c2c7c7d..5b541469ec 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -1,5 +1,5 @@
 ifeq ($(subdir),csu)
-sysdep_routines += errno-loc
+sysdep_routines += errno-loc rseq-sym
 endif
 
 ifeq ($(subdir),assert)
@@ -48,7 +48,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
   bits/termios-c_iflag.h bits/termios-c_oflag.h \
   bits/termios-baud.h bits/termios-c_cflag.h \
   bits/termios-c_lflag.h bits/termios-tcflow.h \
-  bits/termios-misc.h
+  bits/termios-misc.h sys/rseq.h bits/rseq.h
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
  tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index f1e12d9c69..bee3d727e5 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -174,6 +174,10 @@ libc {
   GLIBC_2.29 {
     getcpu;
   }
+  GLIBC_2.30 {
+    __rseq_abi;
+    __rseq_handled;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
new file mode 100644
index 0000000000..b02471a89a
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
@@ -0,0 +1,32 @@
+/* Restartable Sequences Linux aarch64 architecture header.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.  */
+
+#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 9c330f325e..331f39e41a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2141,3 +2141,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f630fa4c6f..05dfdd3393 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2204,6 +2204,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index b96f45590f..24e9b89a50 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -126,6 +126,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
new file mode 100644
index 0000000000..2f3e4c0e21
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/rseq.h
@@ -0,0 +1,30 @@
+/* Restartable Sequences architecture header. Stub version.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.  */
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 019044c3cd..e2b0538088 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2085,3 +2085,5 @@ GLIBC_2.29 xdrstdio_create F
 GLIBC_2.29 xencrypt F
 GLIBC_2.29 xprt_register F
 GLIBC_2.29 xprt_unregister F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 088a8ee369..263a91b97e 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2037,6 +2037,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index f7ff2c57b9..18ce09d48a 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2203,6 +2203,8 @@ GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 vm86 F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index becd8b1033..b61e2ee010 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2069,6 +2069,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 74e42a5209..e55792bb22 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -127,6 +127,8 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4af5a74e8a..9845499048 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2146,6 +2146,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index ccef673fd2..1aba8cb86c 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2133,3 +2133,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 1054bb599e..df54e2adab 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2120,6 +2120,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 4f5b5ffebf..ce95ae7e86 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2118,6 +2118,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 943aee58d4..c9fb5d2096 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2126,6 +2126,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 17a5d17ef9..6335df9acf 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2120,6 +2120,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 4d62a540fd..5465b96768 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2174,3 +2174,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index ecc2d6fa13..eb3808dbd4 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2164,6 +2164,8 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f5830f9c33..6a49a7b718 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2197,6 +2197,8 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 633d8f4792..83177dc75f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2027,6 +2027,8 @@ GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 2c712636ef..e714de994c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2231,3 +2231,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 195bc8b2cf..d190623993 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2103,3 +2103,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..a27324ac28
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,89 @@
+/* Restartable Sequences internal API. Linux implementation.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <sysdep.h>
+#include <errno.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+
+static inline int
+rseq_register_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
+    return -1;
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              0, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
+    __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline void
+rseq_init (void)
+{
+  __rseq_handled = 1;
+}
+#else
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline void
+rseq_init (void)
+{
+}
+#endif
+
+#endif /* rseq-internal.h */
diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c
new file mode 100644
index 0000000000..65403807c8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-sym.c
@@ -0,0 +1,64 @@
+/* Restartable Sequences exported symbols. Linux Implementation.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <stdint.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#else
+
+enum rseq_cpu_id_state {
+  RSEQ_CPU_ID_UNINITIALIZED = -1,
+  RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+};
+
+/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
+   size is 20 bytes.  */
+struct rseq {
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+  uint64_t rseq_cs;
+  uint32_t flags;
+} __attribute__ ((aligned(4 * sizeof(uint64_t))));
+
+#endif
+
+/* volatile because fields can be read/updated by the kernel.  */
+__thread volatile struct rseq __rseq_abi = {
+  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+int __rseq_handled;
diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
new file mode 100644
index 0000000000..7eba4042ea
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
@@ -0,0 +1,31 @@
+/* Restartable Sequences Linux s390 architecture header.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   RSEQ_SIG uses the trap4 instruction. As Linux does not make use of the
+   access-register mode nor the linkage stack this instruction will always
+   cause a special-operation exception (the trap-enabled bit in the DUCT
+   is and will stay 0). The instruction pattern is
+ b2 ff 0f ff trap4   4095(%r0)  */
+
+#define RSEQ_SIG 0xB2FF0FFF
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 334def033c..dacae17ec4 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2159,6 +2159,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 536f4c4ced..c277b3bd90 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2063,6 +2063,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index 30ae3b6ebb..5f70e5c53b 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -2041,6 +2041,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 68b107d080..537da009d3 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2153,6 +2153,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index e5b6a4da50..1fee8e34fc 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2092,6 +2092,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
new file mode 100644
index 0000000000..c48a4bf8ff
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -0,0 +1,51 @@
+/* Restartable Sequences exported symbols. Linux header.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+#define _SYS_RSEQ_H 1
+
+/* We use the structures declarations from the kernel headers.  */
+#include <linux/rseq.h>
+/* Architecture-specific rseq signature.  */
+#include <bits/rseq.h>
+#include <stdint.h>
+
+/* volatile because fields can be read/updated by the kernel.  */
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+extern int __rseq_handled;
+
+#endif /* sys/rseq.h */
diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
new file mode 100644
index 0000000000..8064dda509
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
@@ -0,0 +1,31 @@
+/* Restartable Sequences Linux x86 architecture header.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler code.
+
+   RSEQ_SIG is used with the following reserved undefined instructions, which
+   trap in user-space:
+
+   x86-32:    0f b9 3d 53 30 05 53      ud1    0x53053053,%edi
+   x86-64:    0f b9 3d 53 30 05 53      ud1    0x53053053(%rip),%edi  */
+
+#define RSEQ_SIG 0x53053053
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 86dfb0c94d..a834f65383 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2050,6 +2050,8 @@ GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index dd688263aa..fb8417bde7 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2149,3 +2149,5 @@ GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
--
2.17.1
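The libc.abilist entries above record `__rseq_abi` as a TLS symbol (`T`) of size 0x20, while the rseq-sym.c comment gives the kernel ABI size as 20 bytes. Both numbers follow from the field layout. The sketch below re-declares the fallback `struct rseq` from rseq-sym.c under the hypothetical name `rseq_layout_sketch` so that it builds without <linux/rseq.h>. It assumes GCC or Clang (for `__attribute__ ((aligned))`) and C11 `static_assert`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical stand-in mirroring the fallback struct rseq in rseq-sym.c.  */
struct rseq_layout_sketch
{
  uint32_t cpu_id_start;   /* bytes 0..3   */
  uint32_t cpu_id;         /* bytes 4..7   */
  uint64_t rseq_cs;        /* bytes 8..15  */
  uint32_t flags;          /* bytes 16..19 */
} __attribute__ ((aligned (4 * sizeof (uint64_t))));

/* Bytes the kernel ABI actually uses: the end of the last field.  */
#define RSEQ_SKETCH_ABI_BYTES \
  (offsetof (struct rseq_layout_sketch, flags) + sizeof (uint32_t))

/* 20 bytes of fields...  */
static_assert (RSEQ_SKETCH_ABI_BYTES == 20, "kernel ABI size is 20 bytes");

/* ...padded up to the 32-byte alignment, matching "T 0x20" in libc.abilist.  */
static_assert (sizeof (struct rseq_layout_sketch) == 0x20,
               "object size is padded to 0x20 bytes");
```

The fields end at byte 20, and the 32-byte alignment attribute then rounds `sizeof` up to 32, which is the 0x20 recorded in every abilist file.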

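The __rseq_handled comment in rseq-sym.c and <sys/rseq.h> describes an ownership handshake between libc, libraries and the application. Below is a minimal sketch of how a library that wants to register rseq itself might follow it. `rseq_handled_stub`, `lib_rseq_init`, `lib_rseq_fini` and the `lib_*register_rseq` placeholders are hypothetical names chosen so the sketch builds against any glibc. With this patch applied, a library would test the real `__rseq_handled` instead, and the placeholders would issue the rseq system call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for glibc's exported __rseq_handled.  With the
   patch applied, libc sets the real variable to 1 in rseq_init ().  */
int rseq_handled_stub;

/* Whether this (hypothetical) library took ownership of registration.  */
static bool lib_owns_rseq;

/* Hypothetical placeholders for the library's own rseq(2) calls.  */
static int lib_registrations;
static void lib_register_rseq (void)   { lib_registrations++; }
static void lib_unregister_rseq (void) { lib_registrations--; }

/* In a real library these would carry __attribute__ ((constructor)) and
   __attribute__ ((destructor)); plain functions keep the sketch testable.  */
void
lib_rseq_init (void)
{
  if (rseq_handled_stub != 0)
    return;                 /* libc or another library already owns rseq.  */
  rseq_handled_stub = 1;    /* Claim ownership before registering.  */
  lib_owns_rseq = true;
  lib_register_rseq ();
}

void
lib_rseq_fini (void)
{
  if (!lib_owns_rseq)
    return;                 /* Not ours: leave the flag untouched.  */
  assert (rseq_handled_stub == 1);
  lib_unregister_rseq ();
  lib_owns_rseq = false;
  rseq_handled_stub = 0;    /* Hand ownership back once unregistered.  */
}
```

Registration is per thread, so a real library would also have to register each thread it cares about; the sketch only shows the flag handshake at constructor and destructor time.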



[PATCH 2/5] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v2)

Mathieu Desnoyers-4
In reply to this post by Mathieu Desnoyers-4
When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu(). Fall back to the vgetcpu vDSO otherwise.

Benchmarks:

x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

glibc sched_getcpu():                     13.7 ns (baseline)
glibc sched_getcpu() using rseq:           2.5 ns (speedup:  5.5x)
inline load cpuid from __rseq_abi TLS:     0.8 ns (speedup: 17.1x)
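The last benchmark line measures an application that reads cpu_id from __rseq_abi itself instead of calling sched_getcpu(). Below is a sketch of such an inline fast path with the same fallback as the patch. So that it builds against any glibc, it reads a hypothetical thread-local stand-in, `rseq_cpu_id_stub`, rather than `__rseq_abi.cpu_id`. The negative sentinels follow RSEQ_CPU_ID_UNINITIALIZED (-1) and RSEQ_CPU_ID_REGISTRATION_FAILED (-2), and the unsigned field is reinterpreted as `int32_t`, as `rseq_thread_registered()` in the tests does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>    /* sched_getcpu */
#include <stdint.h>

/* Hypothetical stand-in for the cpu_id field of __rseq_abi.  It starts
   at -1, like RSEQ_CPU_ID_UNINITIALIZED.  */
static __thread volatile uint32_t rseq_cpu_id_stub = (uint32_t) -1;

/* Inline fast path: a single TLS load while rseq is registered,
   otherwise fall back to sched_getcpu ().  */
static inline int
fast_getcpu (void)
{
  int32_t cpu = (int32_t) rseq_cpu_id_stub;
  return cpu >= 0 ? cpu : sched_getcpu ();
}
```

When rseq is registered, the fast path is one TLS load and a comparison. Otherwise it costs that comparison on top of the ordinary sched_getcpu() call, which itself returns -1 on failure.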

Signed-off-by: Mathieu Desnoyers <[hidden email]>
CC: Carlos O'Donell <[hidden email]>
CC: Florian Weimer <[hidden email]>
CC: Joseph Myers <[hidden email]>
CC: Szabolcs Nagy <[hidden email]>
CC: Thomas Gleixner <[hidden email]>
CC: Ben Maurer <[hidden email]>
CC: Peter Zijlstra <[hidden email]>
CC: "Paul E. McKenney" <[hidden email]>
CC: Boqun Feng <[hidden email]>
CC: Will Deacon <[hidden email]>
CC: Dave Watson <[hidden email]>
CC: Paul Turner <[hidden email]>
CC: [hidden email]
CC: [hidden email]
CC: [hidden email]
---
Changes since v1:
- rseq is only used if both __NR_rseq and RSEQ_SIG are defined.
---
 sysdeps/unix/sysv/linux/sched_getcpu.c | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c
index fb0d317f83..f9466c3b22 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -24,8 +24,8 @@
 #endif
 #include <sysdep-vdso.h>
 
-int
-sched_getcpu (void)
+static int
+vsyscall_sched_getcpu (void)
 {
 #ifdef __NR_getcpu
   unsigned int cpu;
@@ -37,3 +37,26 @@ sched_getcpu (void)
   return -1;
 #endif
 }
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+extern __attribute__ ((tls_model ("initial-exec")))
+__thread volatile struct rseq __rseq_abi;
+
+int
+sched_getcpu (void)
+{
+  int cpu_id = __rseq_abi.cpu_id;
+
+  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+  return vsyscall_sched_getcpu ();
+}
+#endif
--
2.17.1


[PATCH 3/5] support record failure: allow use from constructor

Mathieu Desnoyers-4
In reply to this post by Mathieu Desnoyers-4
Expose support_record_failure_init () so constructors can explicitly
initialize the record failure API.

This is preferred to lazy initialization at first use, because
lazy initialization does not cover use in constructors within
forked child processes (forked from a parent constructor).
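A small sketch shows why explicit initialization matters. The counter must live in a MAP_SHARED anonymous mapping created before fork(), because a shared mapping created lazily inside the child would not be shared with the parent, so the child's failures would never be seen. The names below (`failure_counter`, `record_failure_init_sketch`, `record_failure_sketch`, `fork_and_record_sketch`) are hypothetical and only mirror the shape of support_record_failure.c, including its `state != NULL` early return.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical miniature of support_record_failure's shared state.  */
static unsigned int *failure_counter;

/* Idempotent, so it can run both as a constructor and when called
   explicitly from an earlier constructor.  */
__attribute__ ((constructor)) void
record_failure_init_sketch (void)
{
  if (failure_counter != NULL)
    return;
  void *ptr = mmap (NULL, sizeof (*failure_counter), PROT_READ | PROT_WRITE,
                    MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  if (ptr == MAP_FAILED)
    _exit (1);
  failure_counter = ptr;
}

/* No lazy initialization here: the mapping must already exist.  */
void
record_failure_sketch (void)
{
  assert (failure_counter != NULL);
  __atomic_fetch_add (failure_counter, 1, __ATOMIC_RELAXED);
}

/* Record one failure in a forked child and return the count the parent
   then observes, or -1 if fork fails.  */
int
fork_and_record_sketch (void)
{
  record_failure_init_sketch ();   /* Must happen before fork ().  */
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      record_failure_sketch ();
      _exit (0);
    }
  waitpid (pid, NULL, 0);
  return (int) __atomic_load_n (failure_counter, __ATOMIC_RELAXED);
}
```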

Signed-off-by: Mathieu Desnoyers <[hidden email]>
Reviewed-by: Carlos O'Donell <[hidden email]>
CC: Carlos O'Donell <[hidden email]>
CC: Florian Weimer <[hidden email]>
CC: Joseph Myers <[hidden email]>
CC: Szabolcs Nagy <[hidden email]>
CC: [hidden email]
---
 support/check.h                  |  4 ++++
 support/support_record_failure.c | 18 +++++++++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/support/check.h b/support/check.h
index eb3d2487a7..f8684a477a 100644
--- a/support/check.h
+++ b/support/check.h
@@ -88,6 +88,10 @@ void support_test_verify_exit_impl (int status, const char *file, int line,
    does not support reporting failures from a DSO.  */
 void support_record_failure (void);
 
+/* Initialize record failure.  Calling this is only needed when
+   recording failures from constructors.  */
+void support_record_failure_init (void);
+
 /* Static assertion, under a common name for both C++ and C11.  */
 #ifdef __cplusplus
 # define support_static_assert static_assert
diff --git a/support/support_record_failure.c b/support/support_record_failure.c
index a8ffd1fb7d..86d9c71409 100644
--- a/support/support_record_failure.c
+++ b/support/support_record_failure.c
@@ -32,8 +32,12 @@
    zero, the failure of a test can be detected.
 
    The init constructor function below puts *state on a shared
-   annonymous mapping, so that failure reports from subprocesses
-   propagate to the parent process.  */
+   anonymous mapping, so that failure reports from subprocesses
+   propagate to the parent process.
+
+   support_record_failure_init is exposed so it can be called explicitly
+   in case this API needs to be used from a constructor.  */
+
 struct test_failures
 {
   unsigned int counter;
@@ -41,10 +45,14 @@ struct test_failures
 };
 static struct test_failures *state;
 
-static __attribute__ ((constructor)) void
-init (void)
+__attribute__ ((constructor)) void
+support_record_failure_init (void)
 {
-  void *ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE,
+  void *ptr;
+
+  if (state != NULL)
+    return;
+  ptr = mmap (NULL, sizeof (*state), PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_SHARED, -1, 0);
   if (ptr == MAP_FAILED)
     {
--
2.17.1


[PATCH 4/5] support: implement xpthread key create/delete

Mathieu Desnoyers-4
In reply to this post by Mathieu Desnoyers-4
Expose xpthread_key_create () and xpthread_key_delete () wrappers
for tests.
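These wrappers follow the usual libsupport "x" pattern: call the function and terminate the test on failure, so call sites need no error handling. pthread functions return an error number rather than setting errno, so the check is on the return value. A standalone sketch of the pattern is below. `check_return_sketch`, `xkey_create_sketch` and `xkey_delete_sketch` are hypothetical names; the first stands in for libsupport's xpthread_check_return, whose exact reporting may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xpthread_check_return: RET is the error
   number returned by a pthread function (0 on success).  */
static void
check_return_sketch (const char *function, int ret)
{
  if (ret != 0)
    {
      fprintf (stderr, "error: %s: %s\n", function, strerror (ret));
      exit (1);
    }
}

void
xkey_create_sketch (pthread_key_t *key, void (*destructor) (void *))
{
  check_return_sketch ("pthread_key_create",
                       pthread_key_create (key, destructor));
}

void
xkey_delete_sketch (pthread_key_t key)
{
  check_return_sketch ("pthread_key_delete", pthread_key_delete (key));
}
```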

Signed-off-by: Mathieu Desnoyers <[hidden email]>
CC: Carlos O'Donell <[hidden email]>
CC: Florian Weimer <[hidden email]>
CC: Joseph Myers <[hidden email]>
CC: Szabolcs Nagy <[hidden email]>
CC: [hidden email]
---
 support/Makefile              |  2 ++
 support/xpthread_key_create.c | 25 +++++++++++++++++++++++++
 support/xpthread_key_delete.c | 25 +++++++++++++++++++++++++
 support/xthread.h             |  2 ++
 4 files changed, 54 insertions(+)
 create mode 100644 support/xpthread_key_create.c
 create mode 100644 support/xpthread_key_delete.c

diff --git a/support/Makefile b/support/Makefile
index 432cf2fe6c..7ae0d9171d 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -116,6 +116,8 @@ libsupport-routines = \
   xpthread_create \
   xpthread_detach \
   xpthread_join \
+  xpthread_key_create \
+  xpthread_key_delete \
   xpthread_mutex_consistent \
   xpthread_mutex_destroy \
   xpthread_mutex_init \
diff --git a/support/xpthread_key_create.c b/support/xpthread_key_create.c
new file mode 100644
index 0000000000..a493de6c99
--- /dev/null
+++ b/support/xpthread_key_create.c
@@ -0,0 +1,25 @@
+/* pthread_key_create with error checking.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/xthread.h>
+
+void
+xpthread_key_create (pthread_key_t *key, void (*destr_function) (void *))
+{
+  xpthread_check_return ("pthread_key_create", pthread_key_create (key, destr_function));
+}
diff --git a/support/xpthread_key_delete.c b/support/xpthread_key_delete.c
new file mode 100644
index 0000000000..abf758c7c8
--- /dev/null
+++ b/support/xpthread_key_delete.c
@@ -0,0 +1,25 @@
+/* pthread_key_delete with error checking.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/xthread.h>
+
+void
+xpthread_key_delete (pthread_key_t key)
+{
+  xpthread_check_return ("pthread_key_delete", pthread_key_delete (key));
+}
diff --git a/support/xthread.h b/support/xthread.h
index 47c23235f3..fce3435d65 100644
--- a/support/xthread.h
+++ b/support/xthread.h
@@ -84,6 +84,8 @@ void xpthread_rwlockattr_setkind_np (pthread_rwlockattr_t *attr, int pref);
 void xpthread_rwlock_wrlock (pthread_rwlock_t *rwlock);
 void xpthread_rwlock_rdlock (pthread_rwlock_t *rwlock);
 void xpthread_rwlock_unlock (pthread_rwlock_t *rwlock);
+void xpthread_key_create (pthread_key_t *key, void (*destr_function) (void *));
+void xpthread_key_delete (pthread_key_t key);
 
 __END_DECLS
 
--
2.17.1


[PATCH 5/5] rseq registration tests (v3)

Mathieu Desnoyers-4
In reply to this post by Mathieu Desnoyers-4
These tests validate that rseq is registered from various execution
contexts (main thread, constructor, destructor, other threads, other
threads created from constructor and destructor, forked process
(without exec), pthread_atfork handlers, pthread setspecific
destructors, C++ thread and process destructors, signal handlers,
atexit handlers).

tst-rseq.c only links against libc.so, testing registration of rseq in
a non-multithreaded environment.

tst-rseq-nptl.c also links against libpthread.so, testing registration
of rseq in a multithreaded environment.

See the Linux kernel selftests for extensive rseq stress-tests.
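The tests skip themselves with FAIL_UNSUPPORTED when the kernel lacks rseq. The probe used for this, rseq_available() in tst-rseq-nptl.c, calls rseq with a NULL area and zero length. A kernel without the system call fails with ENOSYS, while one that implements it rejects the arguments with EINVAL. Below is a standalone sketch of the same probe. The name `rseq_probe_sketch` and the -1 result for any other outcome (for example a seccomp filter returning EPERM) are assumptions of the sketch, not part of the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Return 1 if the running kernel implements rseq(2), 0 if it does not,
   and -1 for any unexpected outcome.  */
int
rseq_probe_sketch (void)
{
#ifdef SYS_rseq
  long rc = syscall (SYS_rseq, NULL, 0, 0, 0);
  if (rc != -1)
    return -1;              /* Invalid arguments should never succeed.  */
  if (errno == ENOSYS)
    return 0;               /* No rseq system call in this kernel.  */
  if (errno == EINVAL)
    return 1;               /* rseq exists and rejected the arguments.  */
  return -1;
#else
  return 0;                 /* Headers predate rseq: treat as unsupported.  */
#endif
}
```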

Signed-off-by: Mathieu Desnoyers <[hidden email]>
CC: Carlos O'Donell <[hidden email]>
CC: Florian Weimer <[hidden email]>
CC: Joseph Myers <[hidden email]>
CC: Szabolcs Nagy <[hidden email]>
CC: Thomas Gleixner <[hidden email]>
CC: Ben Maurer <[hidden email]>
CC: Peter Zijlstra <[hidden email]>
CC: "Paul E. McKenney" <[hidden email]>
CC: Boqun Feng <[hidden email]>
CC: Will Deacon <[hidden email]>
CC: Dave Watson <[hidden email]>
CC: Paul Turner <[hidden email]>
CC: [hidden email]
---
Changes since v1:
- Rename tst-rseq.c to tst-rseq-nptl.c.
- Introduce tst-rseq.c testing rseq registration in a non-multithreaded
  environment.

Changes since v2:
- Update file headers.
- Use xpthread_key_create/xpthread_key_delete.
- Remove set stacksize.
- Tests depend on both __NR_rseq and RSEQ_SIG being defined.
---
 sysdeps/unix/sysv/linux/Makefile        |   4 +-
 sysdeps/unix/sysv/linux/tst-rseq-nptl.c | 367 ++++++++++++++++++++++++
 sysdeps/unix/sysv/linux/tst-rseq.c      | 114 ++++++++
 3 files changed, 483 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq-nptl.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-rseq.c

diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 5b541469ec..5f69f644a8 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -53,7 +53,7 @@ sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
  tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
  test-errno-linux tst-memfd_create tst-mlock2 tst-pkey \
- tst-rlimit-infinity tst-ofdlocks
+ tst-rlimit-infinity tst-ofdlocks tst-rseq
 tests-internal += tst-ofdlocks-compat
 
 
@@ -230,5 +230,5 @@ ifeq ($(subdir),nptl)
 tests += tst-align-clone tst-getpid1 \
  tst-thread-affinity-pthread tst-thread-affinity-pthread2 \
  tst-thread-affinity-sched
-tests-internal += tst-setgetname
+tests-internal += tst-setgetname tst-rseq-nptl
 endif
diff --git a/sysdeps/unix/sysv/linux/tst-rseq-nptl.c b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
new file mode 100644
index 0000000000..9bc2f244e7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq-nptl.c
@@ -0,0 +1,367 @@
+/* Restartable Sequences NPTL test.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* These tests validate that rseq is registered from various execution
+   contexts (main thread, constructor, destructor, other threads, other
+   threads created from constructor and destructor, forked process
+   (without exec), pthread_atfork handlers, pthread setspecific
+   destructors, C++ thread and process destructors, signal handlers,
+   atexit handlers).
+
+   See the Linux kernel selftests for extensive rseq stress-tests.  */
+
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <support/check.h>
+#include <support/xthread.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+#include <pthread.h>
+#include <syscall.h>
+#include <stdlib.h>
+#include <error.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <signal.h>
+#include <atomic.h>
+
+static pthread_key_t rseq_test_key;
+
+static int
+rseq_thread_registered (void)
+{
+  return (int32_t) __rseq_abi.cpu_id >= 0;
+}
+
+static int
+do_rseq_main_test (void)
+{
+  if (raise (SIGUSR1))
+    FAIL_EXIT1 ("error raising signal");
+  if (pthread_setspecific (rseq_test_key, (void *) 1l))
+    FAIL_EXIT1 ("error in pthread_setspecific");
+  if (!rseq_thread_registered ())
+    {
+      FAIL_RET ("rseq not registered in main thread");
+    }
+  return 0;
+}
+
+static void
+cancel_routine (void *arg)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("rseq not registered in cancel routine\n");
+      support_record_failure ();
+    }
+}
+
+static int cancel_thread_ready;
+
+static void
+test_cancel_thread (void)
+{
+  pthread_cleanup_push (cancel_routine, NULL);
+  atomic_store_release (&cancel_thread_ready, 1);
+  for (;;)
+    usleep (100);
+  pthread_cleanup_pop (0);
+}
+
+static void *
+thread_function (void * arg)
+{
+  int i = (int) (intptr_t) arg;
+
+  if (raise (SIGUSR1))
+    FAIL_EXIT1 ("error raising signal");
+  if (i == 0)
+    test_cancel_thread ();
+  if (pthread_setspecific (rseq_test_key, (void *) 1l))
+    FAIL_EXIT1 ("error in pthread_setspecific");
+  return rseq_thread_registered () ? NULL : (void *) 1l;
+}
+
+static void
+sighandler (int sig)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("rseq not registered in signal handler\n");
+      support_record_failure ();
+    }
+}
+
+static void
+setup_signals (void)
+{
+  struct sigaction sa;
+
+  sigemptyset (&sa.sa_mask);
+  sigaddset (&sa.sa_mask, SIGUSR1);
+  sa.sa_flags = 0;
+  sa.sa_handler = sighandler;
+  if (sigaction (SIGUSR1, &sa, NULL) != 0)
+    {
+      FAIL_EXIT1 ("sigaction failure: %s", strerror (errno));
+    }
+}
+
+#define N 7
+static const int t[N] = { 1, 2, 6, 5, 4, 3, 50 };
+
+static int
+do_rseq_threads_test (int nr_threads)
+{
+  pthread_t th[nr_threads];
+  int i;
+  int result = 0;
+  pthread_attr_t at;
+
+  if (pthread_attr_init (&at) != 0)
+    {
+      FAIL_EXIT1 ("attr_init failed");
+    }
+
+  cancel_thread_ready = 0;
+  for (i = 0; i < nr_threads; ++i)
+    if (pthread_create (&th[i], NULL, thread_function,
+                        (void *) (intptr_t) i) != 0)
+      {
+        FAIL_EXIT1 ("creation of thread %d failed", i);
+      }
+
+  if (pthread_attr_destroy (&at) != 0)
+    {
+      FAIL_EXIT1 ("attr_destroy failed");
+    }
+
+  while (!atomic_load_acquire (&cancel_thread_ready))
+    usleep (100);
+
+  if (pthread_cancel (th[0]))
+    FAIL_EXIT1 ("error in pthread_cancel");
+
+  for (i = 0; i < nr_threads; ++i)
+    {
+      void *v;
+      if (pthread_join (th[i], &v) != 0)
+        {
+          printf ("join of thread %d failed\n", i);
+          result = 1;
+        }
+      else if (i != 0 && v != NULL)
+        {
+          printf ("join %d successful, but child failed\n", i);
+          result = 1;
+        }
+      else if (i == 0 && v == NULL)
+        {
+          printf ("join %d successful, child did not fail as expected\n", i);
+          result = 1;
+        }
+    }
+  return result;
+}
+
+static int
+sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len,
+          int flags, uint32_t sig)
+{
+  return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+static int
+rseq_available (void)
+{
+  int rc;
+
+  rc = sys_rseq (NULL, 0, 0, 0);
+  if (rc != -1)
+    FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
+  switch (errno)
+    {
+    case ENOSYS:
+      return 0;
+    case EINVAL:
+      return 1;
+    default:
+      FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
+    }
+}
+
+static int
+do_rseq_fork_test (void)
+{
+  int status;
+  pid_t pid, retpid;
+
+  pid = fork ();
+  switch (pid)
+    {
+    case 0:
+      exit (do_rseq_main_test ());
+    case -1:
+      FAIL_EXIT1 ("Unexpected fork error %s", strerror (errno));
+    }
+  retpid = TEMP_FAILURE_RETRY (waitpid (pid, &status, 0));
+  if (retpid != pid)
+    {
+      FAIL_EXIT1 ("waitpid returned %ld, expected %ld",
+                  (long int) retpid, (long int) pid);
+    }
+  if (!WIFEXITED (status) || WEXITSTATUS (status))
+    {
+      printf ("rseq not registered in child\n");
+      return 1;
+    }
+  return 0;
+}
+
+static int
+do_rseq_test (void)
+{
+  int i, result = 0;
+
+  if (!rseq_available ())
+    {
+      FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+    }
+  setup_signals ();
+  if (raise (SIGUSR1))
+    FAIL_EXIT1 ("error raising signal");
+  if (do_rseq_main_test ())
+    result = 1;
+  for (i = 0; i < N; i++)
+    {
+      if (do_rseq_threads_test (t[i]))
+        result = 1;
+    }
+  if (do_rseq_fork_test ())
+    result = 1;
+  return result;
+}
+
+static void
+atfork_prepare (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("rseq not registered in pthread atfork prepare\n");
+      support_record_failure ();
+    }
+}
+
+static void
+atfork_parent (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("rseq not registered in pthread atfork parent\n");
+      support_record_failure ();
+    }
+}
+
+static void
+atfork_child (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      printf ("rseq not registered in pthread atfork child\n");
+      support_record_failure ();
+    }
+}
+
+static void
+rseq_key_destructor (void *arg)
+{
+  /* Cannot use deferred failure reporting after main () returns.  */
+  if (!rseq_thread_registered ())
+    FAIL_EXIT1 ("rseq not registered in pthread key destructor");
+}
+
+static void
+atexit_handler (void)
+{
+  /* Cannot use deferred failure reporting after main () returns.  */
+  if (!rseq_thread_registered ())
+    FAIL_EXIT1 ("rseq not registered in atexit handler");
+}
+
+static void __attribute__ ((constructor))
+do_rseq_constructor_test (void)
+{
+  support_record_failure_init ();
+  if (atexit (atexit_handler))
+    {
+      FAIL_EXIT1 ("error calling atexit");
+    }
+  xpthread_key_create (&rseq_test_key, rseq_key_destructor);
+  if (pthread_atfork (atfork_prepare, atfork_parent, atfork_child))
+    FAIL_EXIT1 ("error calling pthread_atfork");
+  if (do_rseq_test ())
+    FAIL_EXIT1 ("rseq not registered within constructor");
+}
+
+static void __attribute__ ((destructor))
+do_rseq_destructor_test (void)
+{
+  /* Cannot use deferred failure reporting after main () returns.  */
+  if (do_rseq_test ())
+    FAIL_EXIT1 ("rseq not registered within destructor");
+  xpthread_key_delete (rseq_test_key);
+}
+
+/* Test C++ destructor called at thread and process exit.  */
+void
+__call_tls_dtors (void)
+{
+  /* Cannot use deferred failure reporting after main () returns.  */
+  if (!rseq_thread_registered ())
+    FAIL_EXIT1 ("rseq not registered in C++ thread/process exit destructor");
+}
+#else
+static int
+do_rseq_test (void)
+{
+#ifndef __NR_rseq
+  FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test");
+#endif
+#ifndef RSEQ_SIG
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
+#endif
+  return 0;
+}
+#endif
+
+static int
+do_test (void)
+{
+  return do_rseq_test ();
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/unix/sysv/linux/tst-rseq.c b/sysdeps/unix/sysv/linux/tst-rseq.c
new file mode 100644
index 0000000000..ce60af8ac8
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-rseq.c
@@ -0,0 +1,114 @@
+/* Restartable Sequences single-threaded tests.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* These tests validate that rseq is registered from main in an executable
+   not linked against libpthread.  */
+
+#include <sys/syscall.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <support/check.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+#include <syscall.h>
+#include <stdlib.h>
+#include <error.h>
+#include <errno.h>
+#include <stdint.h>
+#include <string.h>
+
+static int
+rseq_thread_registered (void)
+{
+  return (int32_t) __rseq_abi.cpu_id >= 0;
+}
+
+static int
+do_rseq_main_test (void)
+{
+  if (!rseq_thread_registered ())
+    {
+      FAIL_RET ("rseq not registered in main thread");
+    }
+  return 0;
+}
+
+static int
+sys_rseq (volatile struct rseq *rseq_abi, uint32_t rseq_len,
+          int flags, uint32_t sig)
+{
+  return syscall (__NR_rseq, rseq_abi, rseq_len, flags, sig);
+}
+
+static int
+rseq_available (void)
+{
+  int rc;
+
+  rc = sys_rseq (NULL, 0, 0, 0);
+  if (rc != -1)
+    FAIL_EXIT1 ("Unexpected rseq return value %d", rc);
+  switch (errno)
+    {
+    case ENOSYS:
+      return 0;
+    case EINVAL:
+      return 1;
+    default:
+      FAIL_EXIT1 ("Unexpected rseq error %s", strerror (errno));
+    }
+}
+
+static int
+do_rseq_test (void)
+{
+  int result = 0;
+
+  if (!rseq_available ())
+    {
+      FAIL_UNSUPPORTED ("kernel does not support rseq, skipping test");
+    }
+  if (do_rseq_main_test ())
+    result = 1;
+  return result;
+}
+#else
+static int
+do_rseq_test (void)
+{
+#ifndef __NR_rseq
+  FAIL_UNSUPPORTED ("kernel headers do not support rseq, skipping test");
+#endif
+#ifndef RSEQ_SIG
+  FAIL_UNSUPPORTED ("glibc does not define RSEQ_SIG, skipping test");
+#endif
+  return 0;
+}
+#endif
+
+static int
+do_test (void)
+{
+  return do_rseq_test ();
+}
+
+#include <support/test-driver.c>
--
2.17.1

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
In reply to this post by Mathieu Desnoyers-4
----- On Apr 16, 2019, at 1:32 PM, Mathieu Desnoyers [hidden email] wrote:

[...]

> diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> new file mode 100644
> index 0000000000..b02471a89a
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
> @@ -0,0 +1,32 @@
> +/* Restartable Sequences Linux aarch64 architecture header.
> +
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _SYS_RSEQ_H
> +# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
> +#endif
> +
> +/* RSEQ_SIG is a signature required before each abort handler code.
> +
> +   It is a 32-bit value that maps to actual architecture code compiled
> +   into applications and libraries. It needs to be defined for each
> +   architecture. When choosing this value, it needs to be taken into
> +   account that generating invalid instructions may have ill effects on
> +   tools like objdump, and may also have impact on the CPU speculative
> +   execution efficiency in some cases.  */
> +
> +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */

After further investigation, we should probably do the following
to handle compiling with -mbig-endian on aarch64, which generates
binaries with mixed code vs data endianness (little endian code,
big endian data):

#ifdef __ARM_BIG_ENDIAN
#define RSEQ_SIG 0x00bc28d4 /* BRK #0x45E0.  */
#else
#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
#endif

Otherwise, the mismatch between the code endianness of the generated
signatures and the data endianness of the RSEQ_SIG parameter
passed to rseq registration will make applications crash with
segmentation faults when the kernel tries to abort rseq
critical sections.

For ARM32, the situation is a bit more complex. Only armv6+
generates mixed-endianness code vs data with -mbig-endian.
Prior to armv6, the code and data endianness matches. Therefore,
I plan to #ifdef the reversed endianness handling with:

#if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN

on arm32.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Joseph Myers
On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:

> > +/* RSEQ_SIG is a signature required before each abort handler code.
> > +
> > +   It is a 32-bit value that maps to actual architecture code compiled
> > +   into applications and libraries. It needs to be defined for each
> > +   architecture. When choosing this value, it needs to be taken into
> > +   account that generating invalid instructions may have ill effects on
> > +   tools like objdump, and may also have impact on the CPU speculative
> > +   execution efficiency in some cases.  */
> > +
> > +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>
> After further investigation, we should probably do the following
> to handle compiling with -mbig-endian on aarch64, which generates
> binaries with mixed code vs data endianness (little endian code,
> big endian data):

First, the comment on RSEQ_SIG should specify whether it is to be
interpreted in the code or the data endianness.

> For ARM32, the situation is a bit more complex. Only armv6+
> generates mixed-endianness code vs data with -mbig-endian.
> Prior to armv6, the code and data endianness matches. Therefore,
> I plan to #ifdef the reversed endianness handling with:
>
> #if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN
>
> on arm32.

That doesn't work well because BE code (.o files) can be built for v5te
(for example) and used on a range of different architecture variants with
both BE32 and BE8 - the choice between BE32 and BE8 is a link-time choice,
not a compile-time choice.  So if the value for Arm is a compile-time
constant, it should also work for both BE32 and BE8.

In turn, that suggests to me that RSEQ_SIG should be defined to be a value
that is always in the code endianness (and whatever corresponding kernel
code handles RSEQ_SIG values should act accordingly on architectures where
the two endiannesses can differ).  If the kernel ABI is already fixed in a
way that prevents such a definition of RSEQ_SIG semantics as using code
endianness, a value should be chosen for Arm that works for both
endiannesses.

(Also, installed glibc headers are supposed to work with older compilers,
and support for __ARM_ARCH was only added in GCC 4.8.  Before that you
need to test lots of separate macros for different architecture variants
to determine a version number.)

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
----- On Apr 17, 2019, at 12:17 PM, Joseph Myers [hidden email] wrote:

> On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:
>
>> > +/* RSEQ_SIG is a signature required before each abort handler code.
>> > +
>> > +   It is a 32-bit value that maps to actual architecture code compiled
>> > +   into applications and libraries. It needs to be defined for each
>> > +   architecture. When choosing this value, it needs to be taken into
>> > +   account that generating invalid instructions may have ill effects on
>> > +   tools like objdump, and may also have impact on the CPU speculative
>> > +   execution efficiency in some cases.  */
>> > +
>> > +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>>
>> After further investigation, we should probably do the following
>> to handle compiling with -mbig-endian on aarch64, which generates
>> binaries with mixed code vs data endianness (little endian code,
>> big endian data):
>
> First, the comment on RSEQ_SIG should specify whether it is to be
> interpreted in the code or the data endianness.

Right. The signature passed as argument to the rseq registration
system call needs to be in data endianness (currently exposed kernel
ABI).

Ideally for userspace, we want to define a signature in code endianness
that happens to nicely match specific code patterns.

>
>> For ARM32, the situation is a bit more complex. Only armv6+
>> generates mixed-endianness code vs data with -mbig-endian.
>> Prior to armv6, the code and data endianness matches. Therefore,
>> I plan to #ifdef the reversed endianness handling with:
>>
>> #if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN
>>
>> on arm32.
>
> That doesn't work well because BE code (.o files) can be built for v5te
> (for example) and used on a range of different architecture variants with
> both BE32 and BE8 - the choice between BE32 and BE8 is a link-time choice,
> not a compile-time choice.  So if the value for Arm is a compile-time
> constant, it should also work for both BE32 and BE8.

Good to know! Then we need to be even more careful.

>
> In turn, that suggests to me that RSEQ_SIG should be defined to be a value
> that is always in the code endianness (and whatever corresponding kernel
> code handles RSEQ_SIG values should act accordingly on architectures where
> the two endiannesses can differ).  If the kernel ABI is already fixed in a
> way that prevents such a definition of RSEQ_SIG semantics as using code
> endianness, a value should be chosen for Arm that works for both
> endiannesses.

It might be tricky to pick a trap instruction whose encoding is a
byte-order palindrome.

>
> (Also, installed glibc headers are supposed to work with older compilers,
> and support for __ARM_ARCH was only added in GCC 4.8.  Before that you
> need to test lots of separate macros for different architecture variants
> to determine a version number.)

Good point!

Here is an alternative to the palindrome approach. I'm taking arm32
as an example:

* We define RSEQ_SIG_CODE in code endianness, meant to be used with
  .inst in rseq assembly:

#define RSEQ_SIG_CODE 0xe7f5def3

* We define RSEQ_SIG_DATA in data endianness:

#define RSEQ_SIG_DATA \
        ({ \
                int sig; \
                asm volatile (  "b 2f\n\t" \
                                ".arm\n\t" \
                                "1: .inst 0xe7f5def3\n\t" \
                                "2:\n\t" \
                                "ldr %[sig], 1b\n\t" \
                                : [sig] "=r" (sig)); \
                sig; \
        })

Technically, only glibc and early-adopter libraries wishing to
register rseq need to use RSEQ_SIG_DATA. The RSEQ_SIG_CODE needs
to be used from inline assembly to create the signatures before
each abort handler.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
----- On Apr 17, 2019, at 3:56 PM, Mathieu Desnoyers [hidden email] wrote:

> ----- On Apr 17, 2019, at 12:17 PM, Joseph Myers [hidden email] wrote:
>
>> On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:
>>
>>> > +/* RSEQ_SIG is a signature required before each abort handler code.
>>> > +
>>> > +   It is a 32-bit value that maps to actual architecture code compiled
>>> > +   into applications and libraries. It needs to be defined for each
>>> > +   architecture. When choosing this value, it needs to be taken into
>>> > +   account that generating invalid instructions may have ill effects on
>>> > +   tools like objdump, and may also have impact on the CPU speculative
>>> > +   execution efficiency in some cases.  */
>>> > +
>>> > +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>>>
>>> After further investigation, we should probably do the following
>>> to handle compiling with -mbig-endian on aarch64, which generates
>>> binaries with mixed code vs data endianness (little endian code,
>>> big endian data):
>>
>> First, the comment on RSEQ_SIG should specify whether it is to be
>> interpreted in the code or the data endianness.
>
> Right. The signature passed as argument to the rseq registration
> system call needs to be in data endianness (currently exposed kernel
> ABI).
>
> Ideally for userspace, we want to define a signature in code endianness
> that happens to nicely match specific code patterns.
>
>>
>>> For ARM32, the situation is a bit more complex. Only armv6+
>>> generates mixed-endianness code vs data with -mbig-endian.
>>> Prior to armv6, the code and data endianness matches. Therefore,
>>> I plan to #ifdef the reversed endianness handling with:
>>>
>>> #if __ARM_ARCH >= 6 && __ARM_BIG_ENDIAN
>>>
>>> on arm32.
>>
>> That doesn't work well because BE code (.o files) can be built for v5te
>> (for example) and used on a range of different architecture variants with
>> both BE32 and BE8 - the choice between BE32 and BE8 is a link-time choice,
>> not a compile-time choice.  So if the value for Arm is a compile-time
>> constant, it should also work for both BE32 and BE8.
>
> Good to know! Then we need to be even more careful.
>
>>
>> In turn, that suggests to me that RSEQ_SIG should be defined to be a value
>> that is always in the code endianness (and whatever corresponding kernel
>> code handles RSEQ_SIG values should act accordingly on architectures where
>> the two endiannesses can differ).  If the kernel ABI is already fixed in a
>> way that prevents such a definition of RSEQ_SIG semantics as using code
>> endianness, a value should be chosen for Arm that works for both
>> endiannesses.
>
> It might be tricky to pick a trap instruction whose encoding is a
> byte-order palindrome.
>
>>
>> (Also, installed glibc headers are supposed to work with older compilers,
>> and support for __ARM_ARCH was only added in GCC 4.8.  Before that you
>> need to test lots of separate macros for different architecture variants
>> to determine a version number.)
>
> Good point!
>
> Here is an alternative to the palindrome approach. I'm taking arm32
> as an example:
>
> * We define RSEQ_SIG_CODE in code endianness, meant to be used with
>  .inst in rseq assembly:
>
> #define RSEQ_SIG_CODE 0xe7f5def3
>
> * We define RSEQ_SIG_DATA in data endianness:
>
> #define RSEQ_SIG_DATA \
>        ({ \
>                int sig; \
>                asm volatile (  "b 2f\n\t" \
>                                ".arm\n\t" \
>                                "1: .inst 0xe7f5def3\n\t" \
>                                "2:\n\t" \
>                                "ldr %[sig], 1b\n\t" \
>                                : [sig] "=r" (sig)); \
>                sig; \
>        })
>
> Technically, only glibc and early-adopter libraries wishing to
> register rseq need to use RSEQ_SIG_DATA. The RSEQ_SIG_CODE needs
> to be used from inline assembly to create the signatures before
> each abort handler.

The approach above should work for arm32 be8 vs be32 linker weirdness.

For aarch64, I think we can simply do:

/*
 * aarch64 -mbig-endian generates mixed endianness code vs data:
 * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
 * matches code endianness.
 */
#define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */

#ifdef __ARM_BIG_ENDIAN
#define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
#else
#define RSEQ_SIG_DATA   RSEQ_SIG_CODE
#endif

#define RSEQ_SIG        RSEQ_SIG_DATA

Feedback is most welcome,

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Joseph Myers
On Thu, 18 Apr 2019, Mathieu Desnoyers wrote:

> The approach above should work for arm32 be8 vs be32 linker weirdness.
>
> For aarch64, I think we can simply do:
>
> /*
>  * aarch64 -mbig-endian generates mixed endianness code vs data:
>  * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>  * matches code endianness.
>  */
> #define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */
>
> #ifdef __ARM_BIG_ENDIAN
> #define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
> #else
> #define RSEQ_SIG_DATA   RSEQ_SIG_CODE
> #endif
>
> #define RSEQ_SIG        RSEQ_SIG_DATA
>
> Feedback is most welcome,

You'll also need __ASSEMBLER__ conditionals in the installed sys/rseq.h
header so that it only defines constants and doesn't include any C
declarations in that case, if RSEQ_SIG_CODE is meant to be usable in .S
files rather than just inline asm in C files.
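An installed header could be arranged along the following lines. This is a layout sketch only. It assumes the per-architecture constants live in <bits/rseq.h>, as in the patch, and that those contain nothing but #define lines. The actual C declarations are elided.

```c
/* Sketch of an installed <sys/rseq.h> that is safe to include from
   preprocessed .S files: only preprocessor constants reach the
   assembler, everything else is C-only.  */
#ifndef _SYS_RSEQ_H
#define _SYS_RSEQ_H 1

/* Per-architecture RSEQ_SIG_CODE / RSEQ_SIG_DATA; must contain only
   #define lines.  _SYS_RSEQ_H is already defined, which satisfies the
   "never include directly" check in <bits/rseq.h>.  */
#include <bits/rseq.h>

#ifndef __ASSEMBLER__
/* C-only declarations (struct rseq, the __rseq_abi TLS variable, ...)
   go here; they would be syntax errors in assembly.  */
#endif /* !__ASSEMBLER__ */

#endif /* sys/rseq.h */
```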

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Szabolcs Nagy-2
In reply to this post by Mathieu Desnoyers-4
On 18/04/2019 14:17, Mathieu Desnoyers wrote:

> ----- On Apr 17, 2019, at 3:56 PM, Mathieu Desnoyers [hidden email] wrote:
>> ----- On Apr 17, 2019, at 12:17 PM, Joseph Myers [hidden email] wrote:
>>> On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:
>>>
>>>>> +/* RSEQ_SIG is a signature required before each abort handler code.
>>>>> +
>>>>> +   It is a 32-bit value that maps to actual architecture code compiled
>>>>> +   into applications and libraries. It needs to be defined for each
>>>>> +   architecture. When choosing this value, it needs to be taken into
>>>>> +   account that generating invalid instructions may have ill effects on
>>>>> +   tools like objdump, and may also have impact on the CPU speculative
>>>>> +   execution efficiency in some cases.  */
>>>>> +
>>>>> +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>>>>
>>>> After further investigation, we should probably do the following
>>>> to handle compiling with -mbig-endian on aarch64, which generates
>>>> binaries with mixed code vs data endianness (little endian code,
>>>> big endian data):
>>>
>>> First, the comment on RSEQ_SIG should specify whether it is to be
>>> interpreted in the code or the data endianness.
>>
>> Right. The signature passed as argument to the rseq registration
>> system call needs to be in data endianness (currently exposed kernel
>> ABI).
>>
>> Ideally for userspace, we want to define a signature in code endianness
>> that happens to nicely match specific code patterns.
...

> For aarch64, I think we can simply do:
>
> /*
>  * aarch64 -mbig-endian generates mixed endianness code vs data:
>  * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>  * matches code endianness.
>  */
> #define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */
>
> #ifdef __ARM_BIG_ENDIAN
> #define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
> #else
> #define RSEQ_SIG_DATA   RSEQ_SIG_CODE
> #endif
>
> #define RSEQ_SIG        RSEQ_SIG_DATA
>
> Feedback is most welcome,

so the RSEQ_SIG value is supposed to be used with .word
in asm instead of .inst?

i don't think we use __ARM_* in public headers currently,
but hopefully aarch64_be compilers implement it.

otherwise this looks ok to me.

(i think a rare palindrome instruction would work too, e.g.
0a5f5f0a and w10, w24, wzr, lsr #23 // shifted 0
2a5f5f2a orr w10, w25, wzr, lsr #23
eb9f9feb negs x11, xzr, asr #39
c83f3fc8 stxp wzr, x8, x15, [x30]  // store to LR ignoring success
d9ffffd9 stz2g x25, [x30, #-16]!    // v8.5 tag+zero 2 granules around LR
etc. it does not need to be a guaranteed trap)
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/5] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v2)

Szabolcs Nagy-2
In reply to this post by Mathieu Desnoyers-4
On 16/04/2019 18:32, Mathieu Desnoyers wrote:

> --- a/sysdeps/unix/sysv/linux/sched_getcpu.c
> +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
> @@ -37,3 +37,26 @@ sched_getcpu (void)
>    return -1;
>  #endif
>  }
> +
> +#ifdef __NR_rseq
> +#include <sys/rseq.h>
> +#endif
> +
> +#if defined __NR_rseq && defined RSEQ_SIG
> +extern __attribute__ ((tls_model ("initial-exec")))
> +__thread volatile struct rseq __rseq_abi;

i'd expect sys/rseq.h to provide this declaration.

> +
> +int
> +sched_getcpu (void)
> +{
> +  int cpu_id = __rseq_abi.cpu_id;
> +
> +  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
> +}
> +#else
> +int
> +sched_getcpu (void)
> +{
> +  return vsyscall_sched_getcpu ();
> +}
> +#endif
> -- 2.17.1
>

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
In reply to this post by Joseph Myers
----- On Apr 18, 2019, at 10:48 AM, Joseph Myers [hidden email] wrote:

> On Thu, 18 Apr 2019, Mathieu Desnoyers wrote:
>
>> The approach above should work for arm32 be8 vs be32 linker weirdness.
>>
>> For aarch64, I think we can simply do:
>>
>> /*
>>  * aarch64 -mbig-endian generates mixed endianness code vs data:
>>  * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>>  * matches code endianness.
>>  */
>> #define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */
>>
>> #ifdef __ARM_BIG_ENDIAN
>> #define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
>> #else
>> #define RSEQ_SIG_DATA   RSEQ_SIG_CODE
>> #endif
>>
>> #define RSEQ_SIG        RSEQ_SIG_DATA
>>
>> Feedback is most welcome,
>
> You'll also need __ASSEMBLER__ conditionals in the installed sys/rseq.h
> header so that it only defines constants and doesn't include any C
> declarations in that case, if RSEQ_SIG_CODE is meant to be usable in .S
> files rather than just inline asm in C files.

Good point!

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
In reply to this post by Szabolcs Nagy-2
----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:

> On 18/04/2019 14:17, Mathieu Desnoyers wrote:
>> ----- On Apr 17, 2019, at 3:56 PM, Mathieu Desnoyers
>> [hidden email] wrote:
>>> ----- On Apr 17, 2019, at 12:17 PM, Joseph Myers [hidden email] wrote:
>>>> On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:
>>>>
>>>>>> +/* RSEQ_SIG is a signature required before each abort handler code.
>>>>>> +
>>>>>> +   It is a 32-bit value that maps to actual architecture code compiled
>>>>>> +   into applications and libraries. It needs to be defined for each
>>>>>> +   architecture. When choosing this value, it needs to be taken into
>>>>>> +   account that generating invalid instructions may have ill effects on
>>>>>> +   tools like objdump, and may also have impact on the CPU speculative
>>>>>> +   execution efficiency in some cases.  */
>>>>>> +
>>>>>> +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>>>>>
>>>>> After further investigation, we should probably do the following
>>>>> to handle compiling with -mbig-endian on aarch64, which generates
>>>>> binaries with mixed code vs data endianness (little endian code,
>>>>> big endian data):
>>>>
>>>> First, the comment on RSEQ_SIG should specify whether it is to be
>>>> interpreted in the code or the data endianness.
>>>
>>> Right. The signature passed as argument to the rseq registration
>>> system call needs to be in data endianness (currently exposed kernel
>>> ABI).
>>>
>>> Ideally for userspace, we want to define a signature in code endianness
>>> that happens to nicely match specific code patterns.
> ...
>> For aarch64, I think we can simply do:
>>
>> /*
>>  * aarch64 -mbig-endian generates mixed endianness code vs data:
>>  * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>>  * matches code endianness.
>>  */
>> #define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */
>>
>> #ifdef __ARM_BIG_ENDIAN
>> #define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
>> #else
>> #define RSEQ_SIG_DATA   RSEQ_SIG_CODE
>> #endif
>>
>> #define RSEQ_SIG        RSEQ_SIG_DATA
>>
>> Feedback is most welcome,
>
> so the RSEQ_SIG value is supposed to be used with .word
> in asm instead of .inst?

We want a .inst so it translates into a valid trap instruction.
It's better to trap in case program execution reaches this
by mistake (makes debugging easier).

>
> i don't think we use __ARM_* in public headers currently,
> but hopefully aarch64_be compilers implement it.

Can I use #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ then?

>
> otherwise this looks ok to me.
>
> (i think a rare palindrome instruction would work too, e.g.
> 0a5f5f0a and w10, w24, wzr, lsr #23 // shifted 0
> 2a5f5f2a orr w10, w25, wzr, lsr #23
> eb9f9feb negs x11, xzr, asr #39
> c83f3fc8 stxp wzr, x8, x15, [x30]  // store to LR ignoring success
> d9ffffd9 stz2g x25, [x30, #-16]!    // v8.5 tag+zero 2 granules around LR
> etc. it does not need to be a guaranteed trap)

Unfortunately it's not a trap :/

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/5] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v2)

Mathieu Desnoyers-4
In reply to this post by Szabolcs Nagy-2
----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:

> On 16/04/2019 18:32, Mathieu Desnoyers wrote:
>> --- a/sysdeps/unix/sysv/linux/sched_getcpu.c
>> +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
>> @@ -37,3 +37,26 @@ sched_getcpu (void)
>>    return -1;
>>  #endif
>>  }
>> +
>> +#ifdef __NR_rseq
>> +#include <sys/rseq.h>
>> +#endif
>> +
>> +#if defined __NR_rseq && defined RSEQ_SIG
>> +extern __attribute__ ((tls_model ("initial-exec")))
>> +__thread volatile struct rseq __rseq_abi;
>
> i'd expect sys/rseq.h to provide this declaration.

And it actually does! Will remove this duplicate.

Thanks,

Mathieu

>
>> +
>> +int
>> +sched_getcpu (void)
>> +{
>> +  int cpu_id = __rseq_abi.cpu_id;
>> +
>> +  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
>> +}
>> +#else
>> +int
>> +sched_getcpu (void)
>> +{
>> +  return vsyscall_sched_getcpu ();
>> +}
>> +#endif
>> -- 2.17.1

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Szabolcs Nagy-2
In reply to this post by Mathieu Desnoyers-4
On 18/04/2019 16:41, Mathieu Desnoyers wrote:

> ----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:
>
>> On 18/04/2019 14:17, Mathieu Desnoyers wrote:
>>> ----- On Apr 17, 2019, at 3:56 PM, Mathieu Desnoyers
>>> [hidden email] wrote:
>>>> ----- On Apr 17, 2019, at 12:17 PM, Joseph Myers [hidden email] wrote:
>>>>> On Wed, 17 Apr 2019, Mathieu Desnoyers wrote:
>>>>>
>>>>>>> +/* RSEQ_SIG is a signature required before each abort handler code.
>>>>>>> +
>>>>>>> +   It is a 32-bit value that maps to actual architecture code compiled
>>>>>>> +   into applications and libraries. It needs to be defined for each
>>>>>>> +   architecture. When choosing this value, it needs to be taken into
>>>>>>> +   account that generating invalid instructions may have ill effects on
>>>>>>> +   tools like objdump, and may also have impact on the CPU speculative
>>>>>>> +   execution efficiency in some cases.  */
>>>>>>> +
>>>>>>> +#define RSEQ_SIG 0xd428bc00 /* BRK #0x45E0.  */
>>>>>>
>>>>>> After further investigation, we should probably do the following
>>>>>> to handle compiling with -mbig-endian on aarch64, which generates
>>>>>> binaries with mixed code vs data endianness (little endian code,
>>>>>> big endian data):
>>>>>
>>>>> First, the comment on RSEQ_SIG should specify whether it is to be
>>>>> interpreted in the code or the data endianness.
>>>>
>>>> Right. The signature passed as argument to the rseq registration
>>>> system call needs to be in data endianness (currently exposed kernel
>>>> ABI).
>>>>
>>>> Ideally for userspace, we want to define a signature in code endianness
>>>> that happens to nicely match specific code patterns.
>> ...
>>> For aarch64, I think we can simply do:
>>>
>>> /*
>>>  * aarch64 -mbig-endian generates mixed endianness code vs data:
>>>  * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
>>>  * matches code endianness.
>>>  */
>>> #define RSEQ_SIG_CODE   0xd428bc00      /* BRK #0x45E0.  */
>>>
>>> #ifdef __ARM_BIG_ENDIAN
>>> #define RSEQ_SIG_DATA   0x00bc28d4      /* BRK #0x45E0.  */
>>> #else
>>> #define RSEQ_SIG_DATA   RSEQ_SIG_CODE
>>> #endif
>>>
>>> #define RSEQ_SIG        RSEQ_SIG_DATA
>>>
>>> Feedback is most welcome,
>>
>> so the RSEQ_SIG value is supposed to be used with .word
>> in asm instead of .inst?
>
> We want a .inst so it translates into a valid trap instruction.
> It's better to trap in case program execution reaches this
> by mistake (makes debugging easier).

that does not make sense to me.

".inst" is an asm directive that requires the value to
be the same on BE and LE (normal insn encoding).

".word" is an asm directive that requires the value to
use swapped encoding on BE (if it's used in the instruction
stream it will create a data mapping symbol and disasm to
.word value instead of the instruction mnemonics).

so which one is it?

>> i don't think we use __ARM_* in public headers currently,
>> but hopefully aarch64_be compilers implement it.
>
> Can I use #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__  then ?

hm, i'd either use #ifdef __AARCH64EB__ (since we already use it)
or the portable #include endian.h and __BYTE_ORDER == __BIG_ENDIAN
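A sketch of the portable `endian.h` variant, assuming the header is only used when targeting AArch64. The `SKETCH_` macro names are hypothetical, not glibc's. On big-endian AArch64 the instruction word is still stored little-endian, so the data-endian view of the same four bytes is the byte-swapped constant.

```c
#include <endian.h>

/* BRK #0x45E0 as an A64 instruction word (code endianness).  */
#define SKETCH_RSEQ_SIG_CODE 0xd428bc00u

/* An AArch64-only header could test __AARCH64EB__ instead.  */
#if __BYTE_ORDER == __BIG_ENDIAN
/* Little-endian code, big-endian data: the same bytes, read as data.  */
# define SKETCH_RSEQ_SIG_DATA 0x00bc28d4u
#else
# define SKETCH_RSEQ_SIG_DATA SKETCH_RSEQ_SIG_CODE
#endif
```

Only `SKETCH_RSEQ_SIG_DATA` would be passed to the system call. Assembly would keep using the code value with `.inst`.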

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4

----- On Apr 18, 2019, at 12:07 PM, Szabolcs Nagy [hidden email] wrote:

> On 18/04/2019 16:41, Mathieu Desnoyers wrote:
>> ----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:
>>
>>> ...
>>>
>>> so the RSEQ_SIG value is supposed to be used with .word
>>> in asm instead of .inst?
>>
>> We want a .inst so it translates into a valid trap instruction.
>> It's better to trap in case program execution reaches this
>> by mistake (makes debugging easier).
>
> that does not make sense to me.
>
> ".inst" is an asm directive that requires the value to
> be the same on BE and LE (normal insn encoding).
>
> ".word" is an asm directive that requires the value to
> use swapped encoding on BE (if it's used in the instruction
> stream it will create a data mapping symbol and disasm to
> .word value instead of the instruction mnemonics).
>
> so which one is it?

We declare the signature with ".inst" in assembler.

However, we also need to pass that 32-bit signature value as
argument to the rseq system call when registering rseq.

The signature comparison is performed by the kernel before
moving the instruction pointer to the abort handler. It compares
the signature received as parameter by sys_rseq (data) to the
4-byte signature preceding the abort IP.

On aarch64 big endian, AFAIU the signature in the code is in
little endian, and the signature value passed as argument to
the rseq system call is in big endian. One way to handle this
is to swap the byte order of the signature "data" representation
passed as argument to sys_rseq.
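To make the comparison concrete, here is a hypothetical user-space model of the kernel-side check. It is not the kernel's code. The four bytes preceding the abort IP are read as a data-endian 32-bit value and compared with the registered signature. The byte array in the usage example models an AArch64 instruction stream holding `BRK #0x45E0` followed by a `NOP` (0xd503201f), both stored little-endian as AArch64 instructions always are. The host's byte order stands in for the target's data byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: load the 4 bytes just before abort_ip in data
   byte order and compare them with the signature given to sys_rseq.  */
static int
sig_matches (const unsigned char *abort_ip, uint32_t registered_sig)
{
  uint32_t found;
  memcpy (&found, abort_ip - sizeof found, sizeof found);
  return found == registered_sig;
}
```

With `unsigned char text[8] = { 0x00, 0xbc, 0x28, 0xd4, 0x1f, 0x20, 0x03, 0xd5 };` and the abort handler starting at `text + 4`, a little-endian host matches only `0xd428bc00`, and a big-endian host matches only `0x00bc28d4`. This is why the value passed to the system call must be byte-swapped on big-endian.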

>
>>> i don't think we use __ARM_* in public headers currently,
>>> but hopefully aarch64_be compilers implement it.
>>
>> Can I use #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__  then ?
>
> hm, i'd either use #ifdef __AARCH64EB__ (since we already use it)
> or the portable #include endian.h and __BYTE_ORDER == __BIG_ENDIAN

I'll use #ifdef __AARCH64EB__ given this header is specific to aarch64.

Thanks,

Mathieu



Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Szabolcs Nagy-2
On 18/04/2019 18:10, Mathieu Desnoyers wrote:

>
> ----- On Apr 18, 2019, at 12:07 PM, Szabolcs Nagy [hidden email] wrote:
>
>> On 18/04/2019 16:41, Mathieu Desnoyers wrote:
>>> ----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:
>>>
>>>> ...
>>>>
>>>> so the RSEQ_SIG value is supposed to be used with .word
>>>> in asm instead of .inst?
>>>
>>> We want a .inst so it translates into a valid trap instruction.
>>> It's better to trap in case program execution reaches this
>>> by mistake (makes debugging easier).
>>
>> that does not make sense to me.
>>
>> ".inst" is an asm directive that requires the value to
>> be the same on BE and LE (normal insn encoding).
>>
>> ".word" is an asm directive that requires the value to
>> use swapped encoding on BE (if it's used in the instruction
>> stream it will create a data mapping symbol and disasm to
>> .word value instead of the instruction mnemonics).
>>
>> so which one is it?
>
> We declare the signature with ".inst" in assembler.
>
> However, we also need to pass that 32-bit signature value as
> argument to the rseq system call when registering rseq.
>
> The signature comparison is performed by the kernel before
> moving the instruction pointer to the abort handler. It compares
> the signature received as parameter by sys_rseq (data) to the
> 4-byte signature preceding the abort IP.
>
> On aarch64 big endian, AFAIU the signature in the code is in
> little endian, and the signature value passed as argument to
> the rseq system call is in big endian. One way to handle this
> is to swap the byte order of the signature "data" representation
> passed as argument to sys_rseq.

you have to add a documentation comment somewhere
explaining whether RSEQ_SIG is the value that's passed to
the kernel, in which case aarch64 asm code has to use

 .inst endianfixup(RSEQ_SIG) // or
 .word RSEQ_SIG

or whether RSEQ_SIG is used as

 .inst RSEQ_SIG

in aarch64 asm, in which case endianfixup(RSEQ_SIG) should
be passed to the syscall.

either way it can be a brk 0x45e0 on both LE and BE,
but in the latter case you have to document this in an
arch-independent way, since the syscall api must be
portable (i assume "RSEQ_SIG" is part of the api).
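A sketch showing that both conventions described above place the same bytes in the instruction stream. `endianfixup` is the hypothetical helper named in the message, implemented here as a byte swap on big-endian and the identity otherwise, using the GCC/Clang `__builtin_bswap32` builtin. `emit_inst` and `emit_word` are hypothetical models of the `.inst` and `.word` directives. The host byte order stands in for the target's data byte order.

```c
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: maps between the A64 instruction encoding and
   the data-endian value.  It is its own inverse.  */
static uint32_t
endianfixup (uint32_t v)
{
#if __BYTE_ORDER == __BIG_ENDIAN
  return __builtin_bswap32 (v);
#else
  return v;
#endif
}

/* Models ".inst v": AArch64 always stores instructions little-endian.  */
static void
emit_inst (uint32_t v, unsigned char out[4])
{
  out[0] = v & 0xff;
  out[1] = (v >> 8) & 0xff;
  out[2] = (v >> 16) & 0xff;
  out[3] = (v >> 24) & 0xff;
}

/* Models ".word v": data directives use the data byte order.  */
static void
emit_word (uint32_t v, unsigned char out[4])
{
  memcpy (out, &v, sizeof v);
}
```

In convention A, RSEQ_SIG is the data value `D`, and `.inst endianfixup(D)` and `.word D` emit identical bytes. In convention B, RSEQ_SIG is the code value `C`, and `endianfixup(C)` is the value passed to the syscall. Both describe the same `BRK #0x45E0` bytes. They differ only in which value the public macro names, which is what the requested documentation comment would need to pin down.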

Re: [PATCH 1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v8)

Mathieu Desnoyers-4
----- On Apr 18, 2019, at 1:37 PM, Szabolcs Nagy [hidden email] wrote:

> On 18/04/2019 18:10, Mathieu Desnoyers wrote:
>>
>> ----- On Apr 18, 2019, at 12:07 PM, Szabolcs Nagy [hidden email] wrote:
>>
>>> On 18/04/2019 16:41, Mathieu Desnoyers wrote:
>>>> ----- On Apr 18, 2019, at 11:33 AM, Szabolcs Nagy [hidden email] wrote:
>>>>
>>>>> ...
>>>>>
>>>>> so the RSEQ_SIG value is supposed to be used with .word
>>>>> in asm instead of .inst?
>>>>
>>>> We want a .inst so it translates into a valid trap instruction.
>>>> It's better to trap in case program execution reaches this
>>>> by mistake (makes debugging easier).
>>>
>>> that does not make sense to me.
>>>
>>> ".inst" is an asm directive that requires the value to
>>> be the same on BE and LE (normal insn encoding).
>>>
>>> ".word" is an asm directive that requires the value to
>>> use swapped encoding on BE (if it's used in the instruction
>>> stream it will create a data mapping symbol and disasm to
>>> .word value instead of the instruction mnemonics).
>>>
>>> so which one is it?
>>
>> We declare the signature with ".inst" in assembler.
>>
>> However, we also need to pass that 32-bit signature value as
>> argument to the rseq system call when registering rseq.
>>
>> The signature comparison is performed by the kernel before
>> moving the instruction pointer to the abort handler. It compares
>> the signature received as parameter by sys_rseq (data) to the
>> 4-byte signature preceding the abort IP.
>>
>> On aarch64 big endian, AFAIU the signature in the code is in
>> little endian, and the signature value passed as argument to
>> the rseq system call is in big endian. One way to handle this
>> is to swap the byte order of the signature "data" representation
>> passed as argument to sys_rseq.
>
> you have to add a documentation comment somewhere
> explaining if RSEQ_SIG is the value that's passed to
> the kernel and then aarch64 asm code has to use
>
> .inst endianfixup(RSEQ_SIG) // or
> .word RSEQ_SIG

Using ".word" won't allow objdump to show the instruction it
maps to: objdump will treat it as data. So .inst is preferred here.

>
> or if RSEQ_SIG is used as
>
> .inst RSEQ_SIG
>
> in aarch64 asm and then endianfixup(RSEQ_SIG) should
> be passed to the syscall.

At this stage, we control the meaning of the definitions we
publicly expose. They are part of glibc headers, not part of the
kernel uapi.

On architectures where data and code endianness match, RSEQ_SIG
can be used both as argument to sys_rseq and as value for
.inst in assembler.

On architectures where data and code endianness differ, I am
tempted to declare them separately:

* RSEQ_SIG_CODE: for use with .inst in assembly,
* RSEQ_SIG_DATA (mapping to RSEQ_SIG): to pass as parameter to sys_rseq.

So those specific architectures would use "RSEQ_SIG_CODE" with
.inst in assembly, while generic rseq registration code can still
pass RSEQ_SIG as a parameter to sys_rseq.

> either way it can be a brk 0x45e0 on both LE and BE,
> but in the latter case you have to document this in
> arch independent way, since the syscall api must be
> portable (i assume "RSEQ_SIG" is part of the api).

RSEQ_SIG is defined in glibc's bits/rseq.h, which is included from
sys/rseq.h. It's therefore not part of the Linux kernel uapi. So
we can define whatever we need to at this point, but we won't be
able to change it after it has been exposed for a given
architecture.

All the kernel ABI expects is a data-endian value of the signature,
which it compares against the 4 bytes it loads from just before the
abort IP.

Thanks,

Mathieu
