[Bug libc/23296] New: Data race in setting function descriptor during lazy binding

classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] New: Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

            Bug ID: 23296
           Summary: Data race in setting function descriptor during lazy
                    binding
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: danglin at gcc dot gnu.org
                CC: carlos at redhat dot com, deller at gmx dot de,
                    drepper.fsp at gmail dot com
  Target Milestone: ---
              Host: hppa-unknown-linux-gnu
            Target: hppa-unknown-linux-gnu
             Build: hppa-unknown-linux-gnu

The debian normaliz package defaults to using OpenMP.  With OpenMP, the
testsuite
fails quite consistently on hppa-linux.  If I disable OpenMP or export
LD_BIND_NOW=y, the testsuite doesn't fail.  OpenMP is also disabled on
alpha-linux-gnu mipsel-linux-gnu.  The comment indicates that random bus
errors occur on mipsel but I don't know that the problem is the same as hppa.

The most common failure on hppa is a SEGV in _dl_fixup.  It appears the
reloc offset passed to _dl_fixup is corrupt.  Presumably, this is caused
by a race in installing the function descriptor.

dave@mx3210:~/debian/normaliz/normaliz-3.6.0+ds$ gdb -c
./_build/test/run_tests/core
/home/dave/debian/normaliz/normaliz-3.6.0+ds/_build/source/.libs/normaliz
GNU gdb (Debian 7.12-6+b2) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "hppa-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from
/home/dave/debian/normaliz/normaliz-3.6.0+ds/_build/source/.libs/normaliz...done.
[New LWP 14189]
[New LWP 14197]
[New LWP 14192]
[New LWP 14150]
[New LWP 14208]
[New LWP 14204]
[New LWP 14200]
[New LWP 14194]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
Core was generated by
`/home/dave/debian/normaliz/normaliz-3.6.0+ds/_build/source/.libs/normaliz -s
te'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  _dl_fixup (l=0xf8bdeb50, reloc_arg=4136748372) at dl-runtime.c:75
75      dl-runtime.c: No such file or directory.
[Current thread is 1 (Thread 0xf5c910c0 (LWP 14189))]
(gdb) bt
#0  _dl_fixup (l=0xf8bdeb50, reloc_arg=4136748372) at dl-runtime.c:75
#1  0xf8bcc730 in _dl_runtime_resolve () at ../sysdeps/hppa/dl-trampoline.S:83
#2  0xf87af7e4 in libnormaliz::Full_Cone<long long>::find_new_facets(unsigned
int const&) [clone ._omp_fn.13] () at
../../source/libnormaliz/full_cone.cpp:421
#3  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) disass
Dump of assembler code for function _dl_fixup:
   0xf8bc564c <+0>:     stw rp,-14(sp)
   0xf8bc5650 <+4>:     ldo 80(sp),sp
   0xf8bc5654 <+8>:     stw r7,-6c(sp)
   0xf8bc5658 <+12>:    ldo -70(sp),r7
   0xf8bc565c <+16>:    stw r6,-68(sp)
   0xf8bc5660 <+20>:    stw r5,-64(sp)
   0xf8bc5664 <+24>:    stw r4,-60(sp)
   0xf8bc5668 <+28>:    stw r3,-5c(sp)
   0xf8bc566c <+32>:    stw r19,-20(sp)
   0xf8bc5670 <+36>:    ldw 7c(r26),r21
   0xf8bc5674 <+40>:    ldw 4(r21),r22
   0xf8bc5678 <+44>:    ldw 38(r26),r20
   0xf8bc567c <+48>:    add,l r25,r22,r6
=> 0xf8bc5680 <+52>:    ldw r25(r22),r3
   0xf8bc5684 <+56>:    ldw 4(r6),r21
   0xf8bc5688 <+60>:    extrw,u r21,23,24,r31
   0xf8bc568c <+64>:    depw,z r31,27,28,ret1
   0xf8bc5690 <+68>:    ldw 34(r26),r22
   0xf8bc5694 <+72>:    ldw 4(r20),r20
   0xf8bc5698 <+76>:    add,l ret1,r20,r20
   0xf8bc569c <+80>:    extrw,u r21,31,8,r21
   0xf8bc56a0 <+84>:    stw r20,0(r7)
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) list dl-runtime.c:75
70      in dl-runtime.c
(gdb) directory /home/dave/debian/glibc/glibc-2.27/elf
Source directories searched: /home/dave/debian/glibc/glibc-2.27/elf:$cdir:$cwd
(gdb) list dl-runtime.c:75
70
71        const PLTREL *const reloc
72          = (const void *) (D_PTR (l, l_info[DT_JMPREL]) + reloc_offset);
73        const ElfW(Sym) *sym = &symtab[ELFW(R_SYM) (reloc->r_info)];
74        const ElfW(Sym) *refsym = sym;
75        void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
76        lookup_t result;
77        DL_FIXUP_VALUE_TYPE value;
78
79        /* Sanity check that we're really looking at a PLT relocation.  */
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0xf5c910c0 (LWP 14189) _dl_fixup (l=0xf8bdeb50,
    reloc_arg=4136748372) at dl-runtime.c:75
  2    Thread 0xf448e0c0 (LWP 14197) new_heap (size=0, top_pad=<optimized out>)
    at arena.c:520
  3    Thread 0xf54900c0 (LWP 14192) _dl_runtime_resolve ()
    at ../sysdeps/hppa/dl-trampoline.S:50
  4    Thread 0xf8cfb040 (LWP 14150) boost::operator&<unsigned long,
std::allocator<unsigned long> > (x=..., y=...)
    at /usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:1772
  5    Thread 0xf2c8b0c0 (LWP 14208) new_heap (size=0, top_pad=<optimized out>)
    at arena.c:520
  6    Thread 0xf348c0c0 (LWP 14204) new_heap (size=0, top_pad=<optimized out>)
    at arena.c:520
  7    Thread 0xf3c8d0c0 (LWP 14200) new_heap (size=0, top_pad=<optimized out>)
    at arena.c:520
  8    Thread 0xf4c8f0c0 (LWP 14194) new_heap (size=0, top_pad=<optimized out>)
    at arena.c:520
(gdb) thread 3
[Switching to thread 3 (Thread 0xf54900c0 (LWP 14192))]
#0  _dl_runtime_resolve () at ../sysdeps/hppa/dl-trampoline.S:50
50              stw     %rp, -20(%sp)
(gdb) thread 2
[Switching to thread 2 (Thread 0xf448e0c0 (LWP 14197))]
#0  new_heap (size=0, top_pad=<optimized out>) at arena.c:520
520     arena.c: No such file or directory.
(gdb) thread 3
[Switching to thread 3 (Thread 0xf54900c0 (LWP 14192))]
#0  _dl_runtime_resolve () at ../sysdeps/hppa/dl-trampoline.S:50
50              stw     %rp, -20(%sp)
(gdb) p/x $sp
$1 = 0xf4c904c0
(gdb) thread 1
[Switching to thread 1 (Thread 0xf5c910c0 (LWP 14189))]
#0  _dl_fixup (l=0xf8bdeb50, reloc_arg=4136748372) at dl-runtime.c:75
75        void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
(gdb) p/x $sp
$2 = 0xf5491500
(gdb) p *l
$3 = {l_addr = 4166217728,
  l_name = 0xf8bdeaf8
"/home/dave/debian/normaliz/normaliz-3.6.0+ds/_build/source/.libs/libnormaliz.so.3",
l_ld = 0xf8855a7c, l_next = 0xf8cff000,
  l_prev = 0xf8bde668, l_real = 0xf8bdeb50, l_ns = 0, l_libname = 0xf8bdedac,
  l_info = {0x0, 0xf8855ac4, 0xf8855b2c, 0xf8855b24, 0xf8855af4, 0xf8855b04,
    0xf8855b0c, 0xf8855b44, 0xf8855b4c, 0xf8855b54, 0xf8855b14, 0xf8855b1c,
    0xf8855ad4, 0xf8855adc, 0xf8855acc, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf8855b34,
    0x0, 0x0, 0xf8855b3c, 0x0, 0xf8855ae4, 0x0, 0xf8855aec, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0xf8855b64, 0xf8855b5c, 0x0, 0x0, 0x0, 0x0, 0xf8855b74,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf8855b6c,
    0x0 <repeats 25 times>, 0xf8855afc}, l_phdr = 0xf8537034,
  l_entry = 4166670160, l_phnum = 5, l_ldnum = 37, l_searchlist = {
    r_list = 0x0, r_nlist = 0}, l_symbolic_searchlist = {r_list = 0xf8bdeda8,
    r_nlist = 0}, l_loader = 0xf8bde668, l_versions = 0xf8cfd788,
  l_nversions = 23, l_nbuckets = 2053, l_gnu_bitmask_idxbits = 1023,
  l_gnu_shift = 15, l_gnu_bitmask = 0xf853cacc, {l_gnu_buckets = 0xf853dacc,
    l_chain = 0xf853dacc}, {l_gnu_chain_zero = 0xf853f75c,
    l_buckets = 0xf853f75c}, l_direct_opencount = 0, l_type = lt_library,
  l_relocated = 1, l_init_called = 1, l_global = 1, l_reserved = 0,
  l_phdr_allocated = 0, l_soname_added = 0, l_faked = 0, l_need_tls_init = 0,
  l_auditing = 0, l_audit_any_plt = 0, l_removed = 0, l_contiguous = 1,
  l_symbolic_in_local_scope = 0, l_free_initfini = 0, l_rpath_dirs = {
    dirs = 0x0, malloced = 0}, l_reloc_result = 0x0, l_versyms = 0xf8593dd0,
---Type <return> to continue, or q <return> to quit---
  l_origin = 0xf8bdedd0
"/home/dave/debian/normaliz/normaliz-3.6.0+ds/_build/source/.libs", l_map_start
= 4166217728, l_map_end = 4169519492,
  l_text_end = 4169523200, l_scope_mem = {0xf8bde7c4, 0x0, 0x0, 0x0},
  l_scope_max = 4, l_scope = 0xf8bded08, l_local_scope = {0xf8bdecac, 0x0},
  l_file_id = {dev = 2065, ino = 4994321}, l_runpath_dirs = {dirs = 0x0,
    malloced = 0}, l_initfini = 0xf8d00b88, l_reldeps = 0x0, l_reldepsmax = 0,
  l_used = 1, l_feature_1 = 0, l_flags_1 = 0, l_flags = 0, l_idx = 0,
  l_mach = {fptr_table_len = 3690, fptr_table = 0xf8ce0000}, l_lookup_cache = {
    sym = 0xf8543ed4, type_class = 0, value = 0xf8cffde8, ret = 0xf6fd5fe8},
  l_tls_initimage = 0x0, l_tls_initimage_size = 0, l_tls_blocksize = 0,
  l_tls_align = 0, l_tls_firstbyte_offset = 0, l_tls_offset = 0,
  l_tls_modid = 0, l_tls_dtor_count = 0, l_relro_addr = 0, l_relro_size = 0,
  l_serial = 2, l_audit = 0xf8bdeda8}
(gdb) p *reloc
value has been optimized out
(gdb) ptype reloc
type = const struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
    Elf32_Sword r_addend;
} * const
(gdb) x/3x $r25
0xf691c554:     0x00031ee8      0xf8d00368      0xf691bed8
(gdb) frame 2
#2  0xf87af7e4 in libnormaliz::Full_Cone<long long>::find_new_facets(unsigned
int const&) [clone ._omp_fn.13] () at
../../source/libnormaliz/full_cone.cpp:421
421         for (; jj!= Neg_Subfacet_Multi_United.end(); ++jj) {
(gdb) disass $pc-16,$pc+16
Dump of assembler code from 0xf87af7d4 to 0xf87af7f4:
   0xf87af7d4
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+752>:      ldw
14(r26),r20
   0xf87af7d8
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+756>:      stw
r20,54(r22)
   0xf87af7dc
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+760>:      b,l
0xf87b7098,rp
   0xf87af7e0
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+764>:      copy
r19,r4
=> 0xf87af7e4
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+768>:      copy
r4,r19
   0xf87af7e8
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+772>:      b,l
0xf87b9630,rp
   0xf87af7ec
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+776>:      copy
r19,r4
   0xf87af7f0
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+780>:    
cmpib,= 0,ret0,0xf87af818
<_ZN11libnormaliz9Full_ConeIxE15find_new_facetsERKj._omp_fn.13+820>
End of assembler dump.
(gdb) p/x $r19
$4 = 0xf8bdd164
(gdb) frame 0
#0  _dl_fixup (l=0xf8bdeb50, reloc_arg=4136748372) at dl-runtime.c:75
75        void *const rel_addr = (void *)(l->l_addr + reloc->r_offset);
(gdb) p/x $r25
$5 = 0xf691c554
(gdb) x/4i 0xf87b9630
   0xf87b9630:  addil L%-1800,r19,r1
   0xf87b9634:  ldw a8(r1),r21
   0xf87b9638:  bv r0(r21)
   0xf87b963c:  ldw ac(r1),r19

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #1 from John David Anglin <danglin at gcc dot gnu.org> ---
After thinking about this, I don't believe this can be fixed in glibc.

1) The window for the race could be reduced by atomically updating function
descriptors but the race can't be fully eliminated.  See for example the linker
stub shown at the end of comment #1.  Function descriptors would have to have
8 byte alignment.

2) Even if we could detect the case where the function descriptor has been
updated between the loading of the ip and gp values, there's no way to find
the function descriptor and correct the ip value.

So, I think we should link pthread executables and shared libraries with
"-z now".

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to John David Anglin from comment #1)

> After thinking about this, I don't believe this can be fixed in glibc.
>
> 1) The window for the race could be reduced by atomically updating function
> descriptors but the race can't be fully eliminated.  See for example the
> linker
> stub shown at the end of comment #1.  Function descriptors would have to have
> 8 byte alignment.
>
> 2) Even if we could detect the case where the function descriptor has been
> updated between the loading of the ip and gp values, there's no way to find
> the function descriptor and correct the ip value.
>
> So, I think we should link pthread executables and shared libraries with
> "-z now".

You have not described the race.

Could you please describe the race in detail?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #3 from dave.anglin at bell dot net ---
On 2018-06-18 12:31 PM, carlos at redhat dot com wrote:
> You have not described the race.
>
> Could you please describe the race in detail?
What I believe is happening is a race in updating a function descriptor
between two or more
threads during lazy binding.  It can occur due to a context switch
either in setting the descriptor or
in loading the descriptor.  The gp value is stored first when a
descriptor is set.  It is loaded
last in call stubs and indirect calls.  The result is
_dl_runtime_resolve can be called with r19 containing
the ltp value for the descriptor instead of the reloc offset.  This
causes a segmentation fault in
_dl_fixup when it happens.

The problem is worse in OpenMP code as several threads can be competing
to call the same
function.

Dave

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to dave.anglin from comment #3)

> On 2018-06-18 12:31 PM, carlos at redhat dot com wrote:
> > You have not described the race.
> >
> > Could you please describe the race in detail?
> What I believe is happening is a race in updating a function descriptor
> between two or more
> threads during lazy binding.  It can occur due to a context switch
> either in setting the descriptor or
> in loading the descriptor.  The gp value is stored first when a
> descriptor is set.  It is loaded
> last in call stubs and indirect calls.  The result is
> _dl_runtime_resolve can be called with r19 containing
> the ltp value for the descriptor instead of the reloc offset.  This
> causes a segmentation fault in
> _dl_fixup when it happens.

Thanks, that makes it clearer.

I can't see how the store order of gp first then ip second ever really made a
difference in anything then. What does that ordering fix? Was it designed that
way to simply cause the process to enter rtld and fail? If the gp was equal to
some reloc offset then it would be an invisible failure!

My suggestion:

Make the update sequence thread-safe.

- reloc offset should be even.
- gp should be even.
- store "this is a reloc offset" in the low bit.
- alter _dl_fixup generic code to be able to cope with a sheared read of the
two words in the function descriptor. The low bit of reloc-offset is
effectively helping you keep the two words in sync (we used a common idiom for
64-bit counters constructed of two 32-bit counters in condition variables).

Like with POWER, you may need to alter binutils to reject odd reloc offsets,
and always add the '| 1' bit, and then make rtld handle that.

Thoughts?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to John David Anglin from comment #1)
> 2) Even if we could detect the case where the function descriptor has been
> updated between the loading of the ip and gp values, there's no way to find
> the function descriptor and correct the ip value.

Ah, this is the problem then, you have only the relocated gp, and you need the
reloc itself to compute ip, and the gp is not enough information.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Carlos O'Donell from comment #5)
> (In reply to John David Anglin from comment #1)
> > 2) Even if we could detect the case where the function descriptor has been
> > updated between the loading of the ip and gp values, there's no way to find
> > the function descriptor and correct the ip value.
>
> Ah, this is the problem then, you have only the relocated gp, and you need
> the reloc itself to compute ip, and the gp is not enough information.

No... wait a minute.

We do all the work in elf_machine_runtime_setup to set these up?

For PLT relocs we walk all the jmprel relocs and setup the reloc offset.

So to fix this at additional per-thread cost in lazy binding, we could just do
the work all over again for the specific thread?


249       /* Process all the relocs, now that we know the GOT... */
250       for (iplt = jmprel; iplt < end_jmprel; iplt += sizeof (Elf32_Rela))
251         {
252           reloc = (const Elf32_Rela *) iplt;
253           r_type = ELF32_R_TYPE (reloc->r_info);
254           r_sym = ELF32_R_SYM (reloc->r_info);
255
256           if (__builtin_expect (r_type == R_PARISC_IPLT, 1))
257             {
258               fptr = (struct fdesc *) (reloc->r_offset + l_addr);

Here we would compare fptr to the plt entry we're trying to update.

If it matched, we would have the matching reloc?

259               if (r_sym != 0)
260                 {
261                   /* Relocate the pointer to the stub.  */
262                   fptr->ip = (Elf32_Addr) got - GOT_FROM_PLT_STUB;
263
264                   /* Instead of the LTP value, we put the reloc offset
265                      here.  The trampoline code will load the proper
266                      LTP and pass the reloc offset to the fixup
267                      function.  */
268                   fptr->gp = iplt - jmprel;

Instead of doing this, we defer this to each thread which redoes the work.

Then every thread does an idempotent update.

269                 } /* r_sym != 0 */
270               else
271                 {
272                   /* Relocate this *ABS* entry.  */
273                   fptr->ip = reloc->r_addend + l_addr;
274                   fptr->gp = D_PTR (l, l_info[DT_PLTGOT]);

Ignore *ABS*, since they were already done.

275                 }
276             } /* r_type == R_PARISC_IPLT */
277         } /* for all the relocations */

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #7 from dave.anglin at bell dot net ---
On 2018-06-18 4:17 PM, carlos at redhat dot com wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=23296
>
> --- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to Carlos O'Donell from comment #5)
>> (In reply to John David Anglin from comment #1)
>>> 2) Even if we could detect the case where the function descriptor has been
>>> updated between the loading of the ip and gp values, there's no way to find
>>> the function descriptor and correct the ip value.
>> Ah, this is the problem then, you have only the relocated gp, and you need
>> the reloc itself to compute ip, and the gp is not enough information.
> No... wait a minute.
>
> We do all the work in elf_machine_runtime_setup to set these up?
>
> For PLT relocs we walk all the jmprel relocs and setup the reloc offset.
>
> So to fix this at additional per-thread cost in lazy binding, we could just do
> the work all over again for the specific thread?
>
>
> 249       /* Process all the relocs, now that we know the GOT... */
> 250       for (iplt = jmprel; iplt < end_jmprel; iplt += sizeof (Elf32_Rela))
> 251         {
> 252           reloc = (const Elf32_Rela *) iplt;
> 253           r_type = ELF32_R_TYPE (reloc->r_info);
> 254           r_sym = ELF32_R_SYM (reloc->r_info);
> 255
> 256           if (__builtin_expect (r_type == R_PARISC_IPLT, 1))
> 257             {
> 258               fptr = (struct fdesc *) (reloc->r_offset + l_addr);
>
> Here we would compare fptr to the plt entry we're trying to update.
Unfortunately, we don't have fptr.  We have got, gp and l.  This gives
jmprel and end_jmprel
but I don't think gp is sufficient to find the reloc.

We have rp.  With that, we can find the original branch but it would be
tough to find the fptr
as the tramp currently clobbers r20 to r22.  It doesn't need to clobber
r22 but that doesn't
help much.

Dave

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to dave.anglin from comment #7)
> On 2018-06-18 4:17 PM, carlos at redhat dot com wrote:
> > Here we would compare fptr to the plt entry we're trying to update.
> Unfortunately, we don't have fptr.  We have got, gp and l.  This gives
> jmprel and end_jmprel but I don't think gp is sufficient to find the reloc.

It should be sufficient.

If you have jmprel and end_jmprel you can iterate over all the relocs.

If you can iterate the relocs you can compute the address that each reloc
adjusts.

You can compare the adjusted address to the PLT you are going to adjust?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #9 from dave.anglin at bell dot net ---
On 2018-06-20 10:21 AM, carlos at redhat dot com wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=23296
>
> --- Comment #8 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to dave.anglin from comment #7)
>> On 2018-06-18 4:17 PM, carlos at redhat dot com wrote:
>>> Here we would compare fptr to the plt entry we're trying to update.
>> Unfortunately, we don't have fptr.  We have got, gp and l.  This gives
>> jmprel and end_jmprel but I don't think gp is sufficient to find the reloc.
> It should be sufficient.
>
> If you have jmprel and end_jmprel you can iterate over all the relocs.
>
> If you can iterate the relocs you can compute the address that each reloc
> adjusts.
Yes.
>
> You can compare the adjusted address to the PLT you are going to adjust?
I don't think we know the PLT entry requiring adjustment.  That came
from the reloc offset
which has been clobbered.

The trampoline code could be modified so it doesn't use r22.  If we
modified the call stubs
and gcc so that r22 contains the address of the PLT entry, then we
should be able to find the
reloc offset.

We don't need to clobber r22 in plt_stub:

static const bfd_byte plt_stub[] =
{
   0x0e, 0x80, 0x10, 0x96,  /* 1: ldw    0(%r20),%r22            */
   0xea, 0xc0, 0xc0, 0x00,  /*    bv     %r0(%r22)               */
   0x0e, 0x88, 0x10, 0x95,  /*    ldw    4(%r20),%r21            */
#define PLT_STUB_ENTRY (3*4)
   0xea, 0x9f, 0x1f, 0xdd,  /*    b,l    1b,%r20                 */
   0xd6, 0x80, 0x1c, 0x1e,  /*    depi   0,31,2,%r20             */
   0x00, 0xc0, 0xff, 0xee,  /* 9: .word  fixup_func              */
   0xde, 0xad, 0xbe, 0xef   /*    .word  fixup_ltp               */
};

We would need an extra ldo instruction in the following stubs:

    Import stub to call shared library routine from normal object file
    (single sub-space version)
    :            addil LR'lt_ptr+ltoff,%dp       ; get procedure entry point
    :            ldw RR'lt_ptr+ltoff(%r1),%r21
    :            bv %r0(%r21)
    :            ldw RR'lt_ptr+ltoff+4(%r1),%r19 ; get new dlt value.

    Import stub to call shared library routine from shared library
    (single sub-space version)
    :            addil LR'ltoff,%r19             ; get procedure entry point
    :            ldw RR'ltoff(%r1),%r21
    :            bv %r0(%r21)
    :            ldw RR'ltoff+4(%r1),%r19        ; get new dlt value.

I don't much like increasing the size of these stubs it would eliminate
one relocation.

$$dyncall would need fixing to use r21 for branch target.  The LTP value
should be loaded after target address as in the stubs:

#ifdef L_dyncall
         SUBSPA_MILLI
         ATTR_DATA
GSYM($$dyncall)
         .export $$dyncall,millicode
         .proc
         .callinfo       millicode
         .entry
         bb,>=,n %r22,30,LREF(1)         ; branch if not plabel address
         depi    0,31,2,%r22             ; clear the two least
significant bits
         ldw     4(%r22),%r19            ; load new LTP value
         ldw     0(%r22),%r22            ; load address of target
LSYM(1)
#ifdef LINUX
         bv      %r0(%r22)               ; branch to the real target
#else
         ldsid   (%sr0,%r22),%r1         ; get the "space ident"
selected by r22
         mtsp    %r1,%sr0                ; move that space identifier
into sr0
         be      0(%sr0,%r22)            ; branch to the real target
#endif
         stw     %r2,-24(%r30)           ; save return address into
frame marker
         .exit
         .procend

The inline indirect calls generated by gcc would need review.

We need to implement your suggestion for setting the LSB in the reloc
offset.  PLT
entries could be updated atomically using a floating point store. Then,
the only
possible race is between the load of the target address and the new LTP
value.
If the LSB in reloc offset is not set, then we should be able to just
reload the target
address using r22 and transfer to the target.

The only thing I don't like is making the stubs larger.  This has caused
no end of trouble.

Dave

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #10 from John David Anglin <danglin at gcc dot gnu.org> ---
I looked at what 32-bit HPUX does.  It sets the LTP to zero initially.
It sets the function address to point to a "b,l,n" instruction before the
trampoline.

The trampoline looks like this:

0x7b04d170:     b,l,n 0x7b04d17c,r31
0x7b04d174:     b,l,n 0x7b04d17c,r31
0x7b04d178:     b,l,n 0x7b04d17c,r31
0x7b04d17c:     ldil -3ffdc000,r21
0x7b04d180:     ldo 160(r21),r21
0x7b04d184:     ldsid (sr0,r21),r1
0x7b04d188:     mtsp r1,sr0
0x7b04d18c:     addil -7b04d000,r31,%r1
0x7b04d190:     ldo -16c(r1),r1
0x7b04d194:     extrw,u r1,29,30,r1
0x7b04d198:     addil 0,r1,%r1
0x7b04d19c:     ldo 3(r1),r19
0x7b04d1a0:     ldil 7b04f000,r31
0x7b04d1a4:     ldo 360(r31),r31
0x7b04d1a8:     be 0(sr0,r21)
0x7b04d1ac:     ldo 0(r31),r21

With technique, all the information needed to find the relocation is encoded
in the function address.  It needs a bunch of "b,l,n" instructions and a larger
trampoline.

HPUX appears to bind indirect calls immediately.  I guess this is to work
around the fact that $$dyncall loads the LTP value before the function address.

The trampoline only seems to be used when a call goes via an import stub.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #11 from John David Anglin <danglin at gcc dot gnu.org> ---
This race causes new test failures building Debian 2.29-2.  This Debian
bug#941174:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941174

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
           See Also|                            |http://bugs.debian.org/9411
                   |                            |74

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #12 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 12019
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12019&action=edit
Patch to fix function descriptor race

The attached change adds support to fix the race in updating and accessing
function descriptors on hppa.  The approach used here is to modify binutils and
gcc so that indirect calls load and leave the function pointer for the call in
register %r22.  We also mark reloc offsets with a 1 in the least significant
bit of the offset.  If the runtime resolver is entered with a reloc offset
without a 1 in the least significant bit, we can then use the function pointer
in %r22 to find and update the descriptor.

The main race is a descriptor being updated between the load of the function
address and the global pointer.  However, PA 2.0 machines support out-of-order
execution of loads and stores, and can appear to execute out-of-order when
viewed from another processor.  I have revamped the code so glibc stores the
global pointer before the function address, and binutils and gcc generated code
load the function address before the global pointer.  We need to avoid the
situation where we branch to functions with an incorrect global pointer.  I've
never seen this happen but technically the function pointer store and load need
to be ordered (see page G-2 of PA-RISC Architecture).

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #13 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 12020
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12020&action=edit
Binutils patch

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #14 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 12021
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12021&action=edit
GCC patch

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

John David Anglin <danglin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #12019|0                           |1
        is obsolete|                            |

--- Comment #15 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 12039
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12039&action=edit
Patch to fix function descriptor race

With this patch, I'm not seeing any segmentation faults due to lazy binding in
testsuite.  Still haven't worked on profile code.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #16 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to John David Anglin from comment #15)
> Created attachment 12039 [details]
> Patch to fix function descriptor race
>
> With this patch, I'm not seeing any segmentation faults due to lazy binding
> in testsuite.  Still haven't worked on profile code.

Have you posted this for review on libc-alpha? It's looking like a good
solution.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #17 from dave.anglin at bell dot net ---
On 2019-10-14 8:41 p.m., carlos at redhat dot com wrote:

> https://sourceware.org/bugzilla/show_bug.cgi?id=23296
>
> --- Comment #16 from Carlos O'Donell <carlos at redhat dot com> ---
> (In reply to John David Anglin from comment #15)
>> Created attachment 12039 [details]
>> Patch to fix function descriptor race
>>
>> With this patch, I'm not seeing any segmentation faults due to lazy binding
>> in testsuite.  Still haven't worked on profile code.
> Have you posted this for review on libc-alpha? It's looking like a good
> solution.
Not yet.  Want to finish updating profile code.

I have revised gcc trunk and gcc-9 branch to pass address of descriptor in
register %r22.  The
binutils stub patch has been posted on binutils list and I'm going to install
it soon.

I've been working on changes to try to fix the ordering issues that are present
on PA 2.0 machines.
I have an update to __cffc() that I'm testing.  For it to fully work, it needs
descriptors aligned on
double word boundaries.  This allows descriptor to be updated with atomic
store.

There are subtle issue to PA 2.0 ordering that I don't fully understand and
they aren't clearly described
in architecture book.  When one has a sequence like the following:

        ldd    x
        bve   x
        ldd    y

it would seem to me that the load of y must occur after x.  Otherwise, the non
sequential execution of
loads and stores, and other instructions would be apparent even on UP machines
using simulated threads.
So, it would seem that the branch must be a barrier.  I came to this conclusion
thinking about hppa64 which
uses the above sequence for indirect calls.  It doesn't use an ordered load for
x and there isn't a way to
atomically update x and y together on hppa64 (no quad word store).

Dave

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug libc/23296] Data race in setting function descriptor during lazy binding

Sourceware - glibc-bugs mailing list
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23296

--- Comment #18 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by John David Anglin
<[hidden email]>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1a044511a3f9020c3f430164e0a6a77426fecd7e

commit 1a044511a3f9020c3f430164e0a6a77426fecd7e
Author: John David Anglin <[hidden email]>
Date:   Mon Mar 30 20:36:49 2020 +0000

    Fix data race in setting function descriptors during lazy binding on hppa.

    This addresses an issue that is present mainly on SMP machines running
    threaded code.  In a typical indirect call or PLT import stub, the
    target address is loaded first.  Then the global pointer is loaded into
    the PIC register in the delay slot of a branch to the target address.
    During lazy binding, the target address is a trampoline which transfers
    to _dl_runtime_resolve().

    _dl_runtime_resolve() uses the relocation offset stored in the global
    pointer and the linkage map stored in the trampoline to find the
    relocation.  Then, the function descriptor is updated.

    In a multi-threaded application, it is possible for the global pointer
    to be updated between the load of the target address and the global
    pointer.  When this happens, the relocation offset has been replaced
    by the new global pointer.  The function pointer has probably been
    updated as well but there is no way to find the address of the function
    descriptor and to transfer to the target.  So, _dl_runtime_resolve()
    typically crashes.

    HP-UX addressed this problem by adding an extra pc-relative branch to
    the trampoline.  The descriptor is initially setup to point to the
    branch.  The branch then transfers to the trampoline.  This allowed
    the trampoline code to figure out which descriptor was being used
    without any modification to user code.  I didn't use this approach
    as it is more complex and changes function pointer canonicalization.

    The order of loading the target address and global pointer in
    indirect calls was not consistent with the order used in import stubs.
    In particular, $$dyncall and some inline versions of it loaded the
    global pointer first.  This was inconsistent with the global pointer
    being updated first in dl-machine.h.  Assuming the accesses are
    ordered, we want elf_machine_fixup_plt() to store the global pointer
    first and calls to load it last.  Then, the global pointer will be
    correct when the target function is entered.

    However, just to make things more fun, HP added support for
    out-of-order execution of accesses in PA 2.0.  The accesses used by
    calls are weakly ordered. So, it's possibly under some circumstances
    that a function might be entered with the wrong global pointer.
    However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
    that loading the global pointer in the delay slot of the branch must
    work consistently.

    The basic fix for the race is a combination of modifying user code to
    preserve the address of the function descriptor in register %r22 and
    setting the least-significant bit in the relocation offset.  The
    latter was suggested by Carlos as a way to distinguish relocation
    offsets from global pointer values.  Conventionally, %r22 is used
    as the address of the function descriptor in calls to $$dyncall.
    So, it wasn't hard to preserve the address in %r22.

    I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
    $$dyncall and inline indirect calls.  I have also modified the import
    stubs in binutils trunk and the 2.33 branch to preserve %r22.  This
    required making the stubs one instruction longer but we save one
    relocation.  I also modified binutils to align the .plt section on
    a 8-byte boundary.  This allows descriptors to be updated atomically
    with a floting-point store.

    With these changes, _dl_runtime_resolve() can fallback to an alternate
    mechanism to find the relocation offset when it has been clobbered.
    There's just one additional instruction in the fast path. I tested
    the fallback function, _dl_fix_reloc_arg(), by changing the branch to
    always use the fallback.  Old code still runs as it did before.

    Fixes bug 23296.

    Reviewed-by: Carlos O'Donell <[hidden email]>

--
You are receiving this mail because:
You are on the CC list for the bug.
12