[PATCH 01/12] linux: Fix vDSO macros build with time64 interfaces

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 07/12] elf: Move vDSO setup to rtld (BZ#24967)

Adhemerval Zanella-2


On 13/12/2019 10:56, Florian Weimer wrote:

> * Adhemerval Zanella:
>
>>> This IFUNC resolver is still not valid because _rtld_global_ro has a
>>> relocation dependency.
>>
>> Afaik with current ld guaranties, .rel{a}.dyn will be sort prior ifunc
>> so _rtld_global_ro should be reallocated prior the ifunc itself.
>
> But that's intra-DSO.  It does not necessarily affect cross-DSO
> relocation ordering.  In theory, the DT_NEEDED ordering should make this
> work, except for the static dlopen case, where you get a
> default-initialized _rtld_global_ro from the inactive (inner) loader.
> The latter is why some architectures use _dl_var_init for essential
> non-loader data, and why getauxval is currently broken for something
> that is dlopen'ed from a statically linked program (bug 20802).

Another point to just moved out from ifunc to implement such functions,
although imho this specific usercase should not prevent this change,
since by fixing this issue in also fix the ifunc selection.

>
>>> What we should do instead is to patch the vDSO function pointers (and
>>> pretty much all shared variables, including the page size) into
>>> libc.so.6 right after loading it (before relocation).  I had hoped to
>>> post a patch for this, but the prerequisite
>>>
>>>   <https://sourceware.org/ml/libc-alpha/2019-11/msg00977.html>
>>>
>>> has not been reviewed.  It provides _dl_lookup_direct, which we could
>>> use to get access to this global variables structure very efficiently.
>>
>> We can go on this way (which is a rather complex solution with might add
>> even more pitfalls). However another much simple possibility is to make
>> gettimeoday and time use normal function that calls the vDSO directly.  
>> On x86_64 haswell (i7-4790K) I see just 7% increase of latency to by
>> using INLINE_VSYSCALL with my patch, which I think it quite acceptable.
>
> It's not more complex overall because we already need _dl_var_init on
> some architectures.  This new mechanism would replace that and do away
> with undo-RELRO/_dl_var_init/reapply-RELRO sequence.

It is more complex to maintain a specific initialization code for some
arch and it is orthogonal to fix static dlopen. It also requires more
patch review iterations to get some more piece in place to keep providing
the ifunc optimization.

So I see that if you think that the specific ifunc scenario, which is
already broken, is a block for BZ#24967 I see it is better to remove the
ifunc optimization and reinstate it once this ifunc scenario is fixed.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] linux: Fix vDSO macros build with time64 interfaces

Adhemerval Zanella-2
In reply to this post by Florian Weimer-5


On 13/12/2019 11:03, Florian Weimer wrote:

> * Adhemerval Zanella:
>
>> On 13/12/2019 08:51, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> diff --git a/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c b/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>>>> index 7e772e05ce..07d38466e2 100644
>>>> --- a/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>>>> +++ b/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>>>> @@ -22,10 +22,6 @@
>>>>  
>>>>  #include <time.h>
>>>>  #include <sysdep.h>
>>>> -
>>>> -#ifdef HAVE_GETTIMEOFDAY_VSYSCALL
>>>> -# define HAVE_VSYSCALL
>>>> -#endif
>>>>  #include <sysdep-vdso.h>
>>>>  
>>>>  /* Used as a fallback in the ifunc resolver if VDSO is not available
>>>> @@ -36,7 +32,9 @@ __gettimeofday_vsyscall (struct timeval *restrict tv, void *restrict tz)
>>>>    if (__glibc_unlikely (tz != 0))
>>>>      memset (tz, 0, sizeof *tz);
>>>>  
>>>> -  return INLINE_VSYSCALL (gettimeofday, 2, tv, tz);
>>>> +  if (INLINE_VSYSCALL (gettimeofday, 2, tv, tz) == 0)
>>>> +    return 0;
>>>> +  return INLINE_SYSCALL_CALL (gettimeofday, tv, tz);
>>>>  }
>>>
>>> Given that this is the fallback function why do we try INLINE_VSYSCALL
>>> first?
>>>
>>> (The static case would need adjusting, of course.)
>>
>> Because it will be used on static build and the fallback case will be
>> unlikely. But I can add static only case that uses vDSO plus syscall and
>> change the shared fallback case that just issues the syscall.
>
> I think that would make more sense, yes.
>
>>>> diff --git a/sysdeps/unix/sysv/linux/clock_gettime.c b/sysdeps/unix/sysv/linux/clock_gettime.c
>>>> index 875c4fe905..4ea56c9a4b 100644
>>>> --- a/sysdeps/unix/sysv/linux/clock_gettime.c
>>>> +++ b/sysdeps/unix/sysv/linux/clock_gettime.c
>>>> @@ -21,10 +21,6 @@
>>>>  #include <errno.h>
>>>>  #include <time.h>
>>>>  #include "kernel-posix-cpu-timers.h"
>>>> -
>>>> -#ifdef HAVE_CLOCK_GETTIME_VSYSCALL
>>>> -# define HAVE_VSYSCALL
>>>> -#endif
>>>>  #include <sysdep-vdso.h>
>>>>  
>>>>  #include <shlib-compat.h>
>>>> @@ -33,24 +29,39 @@
>>>>  int
>>>>  __clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp)
>>>>  {
>>>> +  int r = -1;
>>>> +
>>>>  #ifdef __ASSUME_TIME64_SYSCALLS
>>>> +  /* 64 bit ABIs or Newer 32-bit ABIs that only support 64-bit time_t.  */
>>>> +# ifdef __NR_clock_gettime64
>>>> +  r = INLINE_SYSCALL_CALL (clock_gettime64, clock_id, tp);
>>>> +# else
>>>> +#  ifdef HAVE_CLOCK_GETTIME_VSYSCALL
>>>> +  r = INLINE_VSYSCALL (clock_gettime, 2, clock_id, tp);
>>>> +#  endif
>>>> +  if (r == -1)
>>>> +    r = INLINE_SYSCALL_CALL (clock_gettime, clock_id, tp);
>>>
>>> Why do you check __NR_clock_gettime64 first?  Won't this make the vDSO
>>> unused?
>>
>> The vDSO support for clock_gettime64 was added later in this set. I
>> explicit removed because even if an architecture sets
>> HAVE_CLOCK_GETTIME64_VSYSCALL, it won't build.
>
> Ah, I see it now:
>
>   /* Get current value of CLOCK and store it in TP.  */
>   int
>   __clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp)
>   {
>     int r = -1;
>  
>   #ifdef __ASSUME_TIME64_SYSCALLS
>     /* 64 bit ABIs or Newer 32-bit ABIs that only support 64-bit time_t.  */
>   # ifdef __NR_clock_gettime64
>   #  ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
>     r = INLINE_VSYSCALL (clock_gettime64, 2, clock_id, tp);
>   #  endif
>     if (r == -1)
>       r = INLINE_SYSCALL_CALL (clock_gettime64, clock_id, tp);
>   # else
>   #  ifdef HAVE_CLOCK_GETTIME_VSYSCALL
>     r = INLINE_VSYSCALL (clock_gettime, 2, clock_id, tp);
>   #  endif
>     if (r == -1)
>       r = INLINE_SYSCALL_CALL (clock_gettime, clock_id, tp);
>   # endif
>   #else
>     /* Old 32-bit ABI with possible 64-bit time_t support.  */
>   # ifdef __NR_clock_gettime64
>   #  ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
>     r = INLINE_VSYSCALL (clock_gettime64, 2, clock_id, tp);
>   #  endif
>     if (r == -1)
>       r = INLINE_SYSCALL_CALL (clock_gettime64, clock_id, tp);
>   # endif
>     if (r == -1)
>       {
>         /* Fallback code that uses 32-bit support.  */
>         struct timespec tp32;
>   # ifdef HAVE_CLOCK_GETTIME_VSYSCALL
>         r = INLINE_VSYSCALL (clock_gettime, 2, clock_id, &tp32);
>   # endif
>         if (r == -1)
>   r = INLINE_SYSCALL_CALL (clock_gettime, clock_id, &tp32);
>         if (r == 0)
>   *tp = valid_timespec_to_timespec64 (tp32);
>       }
>   #endif
>  
>     return r;
>   }
>
> This looks quite ugly to me.

It is with current constraints. The idea is that eventually
!__ASSUME_TIME64_SYSCALLS path will be phase out.

>
> If I read it correctly, it still does not cover for 32-bit the case of
> an old kernel without clock_gettime64 support.  Here, INLINE_VSYSCALL
> for clock_gettimeofday64 will fail (without a context switch), then
> INLINE_SYSCALL_CALL will fail, *with* a context switch, and only then,
> and only then the INLINE_VSYSCALL call for clock_gettimeofday will
> suceed.

It does cover, it is just not optimal for older kernels. This patch is
following the current semantic on the clock_gettime64. I see what you
are proposing as an additional optimization.

>
> Since this is used to implement clock_gettime:
>
>   #if __TIMESIZE != 64
>   int
>   __clock_gettime (clockid_t clock_id, struct timespec *tp)
>   {
>     int ret;
>     struct __timespec64 tp64;
>  
>     ret = __clock_gettime64 (clock_id, &tp64);
>  
>     if (ret == 0)
>       {
>         if (! in_time_t_range (tp64.tv_sec))
>           {
>             __set_errno (EOVERFLOW);
>             return -1;
>           }
>  
>         *tp = valid_timespec64_to_timespec (tp64);
>       }
>  
>     return ret;
>   }
>   #endif
>
> it will impact quite a lot of installations.  We know that
> clock_gettimeofday performance is critical to many users, eveon i386.

I agree and that's why I initially preferred to not implement the
old time32 on top of the time64 ones.  But it was decided and this
is the most straightforward way to progressively test the new
interfaces without require doing a full switch to time64.

>
> The main question is whether it is worth supporting clock_gettime64
> without vDSO support.  If it is not, at startup, the loader should
> select a function pointer for the clock_gettime64 implementation used by
> the clock_gettime64 wrapper:
>
>   (a) kernel-provided clock_gettime64 from the vDSO
>   (b) glibc clock_gettime64 implementation on top of clock_gettime vDSO
>   (c) glibc clock_gettime64 implementation on top of clock_gettime syscall
>
> Checking the presence of vDSO symbols is reasonably efficient because
> it's just a hash table lookup (especially if _dl_lookup_direct is used).
> We would have two indirect calls for the legacy vDSO case, but getting
> rid of that would mean to use an IFUNC for clock_gettime and
> clock_gettime64, driving up complexity again.
>
> Unfortunately, writing (b) and (c) may not be easy on architectures
> where INTERNAL_VSYSCALL_CALL uses a non-standard calling convention with
> inline assembly, due to lack of proper GCC support for it.
>
> If we need to support systems without clock_gettime64 in the vDSO, we
> have a problem because that requires system call probing, which is
> probably not something that we want to do at startup.

Linux is moving to make the vDSO infrastructure more generic and easy
so the architectures will require less boilerplate to enable it.  However
I do see that it might take some time to enable on all architectures and
there might be some kernel configuration that might explicit disable
clock_gettime64 vDSO.

But I think essentially what you are suggesting is an optimization to a
scenario that in practice should be unusual: a glibc build with a v5.1+
kernel headers, but deployed in a older kernel without time64 support.

This scenario could be a more common possibility to static build, so
an possible option is to use the a global flag atomically set in the
case of ENOSYS (as discussed in time64_t previously).  We could set
it by syscall interface or globally to assume or not kernel support
time64_t. But again I think this should be handled as an optimization,
rather than a requisite/blocker.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] linux: Fix vDSO macros build with time64 interfaces

Florian Weimer-5
* Adhemerval Zanella:

> But I think essentially what you are suggesting is an optimization to a
> scenario that in practice should be unusual: a glibc build with a v5.1+
> kernel headers, but deployed in a older kernel without time64 support.

I don't think that's quite right.  It will affect any future Fedora
release that is deployed on current container environments, irrespective
of container technology.  In the past, vendors were really slow to
rebase kernels.

Furthermore, I think we have tentative agreement that we want to move to
built-in system call tables to make it clearer what functionality we
support.  In particular, we viewed this as a requirement for rseq
support.  While it seems unlikely at this point that rseq support will
make it into the upcoming release, I still hope to contribute my syscall
tables patch next week.  (The patch is done, but the auto-updating of
the tables doesn't quite work yet the way Joseph would like it.)

It's also not just an optimization because the selection logic should be
generic and could be written once because it does not depend on the
function pointer.

But unfortunately, probing the way I suggested will not work.  It's
incompatible with existing seccomp filters running on newer kernels
because they will cause the syscall in the vDSO to fail with ENOSYS.  So
we still need a fallback path unfortunately.  If I'm right, this
invalidates a previous review comment of mine regarding the fallback
path after INLINE_VSYSCALL.

Thanks,
Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] linux: Fix vDSO macros build with time64 interfaces

Florian Weimer-5
In reply to this post by Florian Weimer-5
* Florian Weimer:

> * Adhemerval Zanella:
>
>> diff --git a/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c b/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>> index 7e772e05ce..07d38466e2 100644
>> --- a/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>> +++ b/sysdeps/unix/sysv/linux/aarch64/gettimeofday.c
>> @@ -22,10 +22,6 @@
>>  
>>  #include <time.h>
>>  #include <sysdep.h>
>> -
>> -#ifdef HAVE_GETTIMEOFDAY_VSYSCALL
>> -# define HAVE_VSYSCALL
>> -#endif
>>  #include <sysdep-vdso.h>
>>  
>>  /* Used as a fallback in the ifunc resolver if VDSO is not available
>> @@ -36,7 +32,9 @@ __gettimeofday_vsyscall (struct timeval *restrict tv, void *restrict tz)
>>    if (__glibc_unlikely (tz != 0))
>>      memset (tz, 0, sizeof *tz);
>>  
>> -  return INLINE_VSYSCALL (gettimeofday, 2, tv, tz);
>> +  if (INLINE_VSYSCALL (gettimeofday, 2, tv, tz) == 0)
>> +    return 0;
>> +  return INLINE_SYSCALL_CALL (gettimeofday, tv, tz);
>>  }
>
> Given that this is the fallback function why do we try INLINE_VSYSCALL
> first?

As I mentioned in my other message, seccomp filters make INLINE_VSYSCALL
without fallback tricky because the vDSO could use a filtered system
call, while the direct syscall path would still succeed because it uses
a system call known to and approved by the seccomp filter.

Thanks,
Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 07/12] elf: Move vDSO setup to rtld (BZ#24967)

Florian Weimer-5
In reply to this post by Adhemerval Zanella-2
* Adhemerval Zanella:

> It is more complex to maintain a specific initialization code for some
> arch and it is orthogonal to fix static dlopen. It also requires more
> patch review iterations to get some more piece in place to keep providing
> the ifunc optimization.
>
> So I see that if you think that the specific ifunc scenario, which is
> already broken, is a block for BZ#24967 I see it is better to remove the
> ifunc optimization and reinstate it once this ifunc scenario is fixed.

Sorry, I agree my review wasn't helpful.  I'm not sure if I can re-visit
these patches today, though.

Florian

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 07/12] elf: Move vDSO setup to rtld (BZ#24967)

Adhemerval Zanella-2


On 13/12/2019 11:54, Florian Weimer wrote:

> * Adhemerval Zanella:
>
>> It is more complex to maintain a specific initialization code for some
>> arch and it is orthogonal to fix static dlopen. It also requires more
>> patch review iterations to get some more piece in place to keep providing
>> the ifunc optimization.
>>
>> So I see that if you think that the specific ifunc scenario, which is
>> already broken, is a block for BZ#24967 I see it is better to remove the
>> ifunc optimization and reinstate it once this ifunc scenario is fixed.
>
> Sorry, I agree my review wasn't helpful.  I'm not sure if I can re-visit
> these patches today, though.

Fair enough, I just don't want to get stalled in a pre-requisite to fix
an already broken interface for an specific scenario to a small latency
gain.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] linux: Fix vDSO macros build with time64 interfaces

Adhemerval Zanella-2
In reply to this post by Florian Weimer-5


On 13/12/2019 11:49, Florian Weimer wrote:

> * Adhemerval Zanella:
>
>> But I think essentially what you are suggesting is an optimization to a
>> scenario that in practice should be unusual: a glibc build with a v5.1+
>> kernel headers, but deployed in a older kernel without time64 support.
>
> I don't think that's quite right.  It will affect any future Fedora
> release that is deployed on current container environments, irrespective
> of container technology.  In the past, vendors were really slow to
> rebase kernels.

If this scenario is indeed something usual for new glibc installations,
a generic build to enable both time64 and time support will eventually
require some probing to get kernel support.

>
> Furthermore, I think we have tentative agreement that we want to move to
> built-in system call tables to make it clearer what functionality we
> support.  In particular, we viewed this as a requirement for rseq
> support.  While it seems unlikely at this point that rseq support will
> make it into the upcoming release, I still hope to contribute my syscall
> tables patch next week.  (The patch is done, but the auto-updating of
> the tables doesn't quite work yet the way Joseph would like it.)
>
> It's also not just an optimization because the selection logic should be
> generic and could be written once because it does not depend on the
> function pointer.

Even with syscall tables being update with latest kernel releases, the
optimization I see that you are suggesting it to avoid probing on
every syscall.

>
> But unfortunately, probing the way I suggested will not work.  It's
> incompatible with existing seccomp filters running on newer kernels
> because they will cause the syscall in the vDSO to fail with ENOSYS.  So
> we still need a fallback path unfortunately.  If I'm right, this
> invalidates a previous review comment of mine regarding the fallback
> path after INLINE_VSYSCALL.

The fallback after the vDSO is essentially because some vDSO implementation
does not issue the syscall itself, but rather return ENOSYS. If I recall
correctly it was the case for mips for some 3.1x version, I would expected
that this is behaviour an outlier and usual vDSO support it to call the
syscall.

And the fallback also work if seccomp triggers an ENOSYS all well. So
what I think it might an option for !__ASSUME_TIME64_SYSCALLS with
an optimization to avoid probing is:


  int r == -1;
  static int time64_support = 1;
  if (atomic_load_relaxed (&time64_support) == 1)
    {
#ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
      /* It assumes that vDSO will always fallback to syscall
         for invalid timers.  */
      r = INLINE_VSYSCALL (clock_gettime64, 2, clock_id, tp);
#else
      r = INLINE_SYSCALL_CALL (clock_gettime64, clock_id, tp);
#endif
      if (r == ENOSYS)
        {
          atomic_store_relaxed (&time64_support, 0);
          r = -1;
        }
    }

  if (r == -1)
    {
      /* Fallback code that uses 32-bit support.  */
      struct timespec tp32;
# ifdef HAVE_CLOCK_GETTIME_VSYSCALL
      /* Some vDSO implementation might not call the syscall for
         invalid timers.  */
      r = INLINE_VSYSCALL (clock_gettime, 2, clock_id, &tp32);
# endif
      if (r == -1)
        r = INLINE_SYSCALL_CALL (clock_gettime, clock_id, &tp32);
      if (r == 0)
        *tp = valid_timespec_to_timespec64 (tp32);
    }

  return r;
     
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 07/12] elf: Move vDSO setup to rtld (BZ#24967)

Adhemerval Zanella-2
In reply to this post by Florian Weimer-5


On 13/12/2019 11:54, Florian Weimer wrote:

> * Adhemerval Zanella:
>
>> It is more complex to maintain a specific initialization code for some
>> arch and it is orthogonal to fix static dlopen. It also requires more
>> patch review iterations to get some more piece in place to keep providing
>> the ifunc optimization.
>>
>> So I see that if you think that the specific ifunc scenario, which is
>> already broken, is a block for BZ#24967 I see it is better to remove the
>> ifunc optimization and reinstate it once this ifunc scenario is fixed.
>
> Sorry, I agree my review wasn't helpful.  I'm not sure if I can re-visit
> these patches today, though.
>
> Florian
>

I am now working a v3 for this set and it is not clear to me if you think
it is a good idea or not to move the vDSO pointer to _rtld_global_ro. I
am aware you have been working on trying to decouple and get rid of the
_dl_var_init, and on my v3 the ifunc variants will not depend on the
_rtld_global_ro, but rather call the vDSO resolve function (as currently
done).

Also, vDSO calls should not be a problem for static dlopen as indicated
by BZ#20802. The vDSO pointer would be zero-initialized and the syscall
will be issued instead. I see that we might this working once we we
have a proper fix for BZ#20802 and I don't see this as a blocker.
12