[PATCH] Add ifunc memcpy and memmove for aarch64


[PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
This patch adds ifunc versions of memcpy and memmove for aarch64.  I
know this isn't appropriate for 2.25 but I wanted to submit it and get
it reviewed for 2.26.  The basic change is to include software
prefetching for large memcpy's on thunderx which can speed up those
routines by around 2X.  For memcpy's under 32K bytes I found that the
software prefetching did not help (and sometimes hurt).  I wasn't
really interested in speeding up memmove but since memcpy and memmove
are implemented in one file it seemed easier to make memmove an ifunc
along with memcpy rather than try and split them up.  memmove does get
a speedup when it uses the memcpy code.
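
As an illustration only (this is not the assembly from the patch), the
idea of prefetching only for large copies can be sketched in C.  The
32768-byte threshold comes from the message above and the 512-byte
prefetch distance from the prfm instruction in the patch; the function
name, the 64-byte chunk loop and the use of the GCC/Clang builtin
__builtin_prefetch are assumptions made for the sketch.

```c
#include <stddef.h>
#include <string.h>

#define PREFETCH_THRESHOLD 32768 /* below this, prefetching did not help */
#define PREFETCH_DISTANCE  512   /* bytes ahead of the current read position */
#define CHUNK              64    /* bytes copied per loop iteration */

/* Hypothetical helper: copy N bytes, issuing read prefetches only for
   large copies.  Small copies fall straight through to memcpy.  */
static void *
copy_with_prefetch_sketch (void *dst, const void *src, size_t n)
{
  if (n < PREFETCH_THRESHOLD)
    return memcpy (dst, src, n);

  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t off = 0;

  for (; off + CHUNK <= n; off += CHUNK)
    {
      /* Only prefetch addresses that are still inside the source buffer.
         rw = 0 (read), locality = 0 (streaming, no reuse expected).  */
      if (off + PREFETCH_DISTANCE < n)
        __builtin_prefetch (s + off + PREFETCH_DISTANCE, 0, 0);
      memcpy (d + off, s + off, CHUNK);
    }

  /* Copy the tail that did not fill a whole chunk (possibly 0 bytes).  */
  memcpy (d + off, s + off, n - off);
  return dst;
}
```

Whether such a loop is faster than plain memcpy depends on the
microarchitecture; the 2X figure above applies to the thunderx assembly
version, not to this sketch.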

The ifunc code depends on the mrs instruction, which is a privileged
instruction, but the 4.11 version of the Linux kernel will have
emulation for it (https://lkml.org/lkml/2017/1/10/816).  Since it is
emulated, I added code to save its value rather than read it every time
we want to execute an ifunc selection function.  I also saved a flag to
specify whether the platform is thunderx so that glibc does not have
to do multiple logical operations on the mrs value in each ifunc
selection function to determine whether it is on a thunderx platform.
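
A minimal sketch of that caching, for illustration: the field positions
follow the architectural layout of MIDR_EL1 (implementer in bits
[31:24], part number in bits [15:4]).  The Cavium implementer code 'C'
and the ThunderX part number 0x0a1 are assumptions here, since the
definition of IS_THUNDERX is not shown in this message, and the caller
passes the raw register value in so that the sketch runs on any machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 field extraction (architectural bit positions).  */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xff)
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & 0xfff)

/* Assumed ID values: 'C' (0x43) for Cavium, 0x0a1 for ThunderX.  */
#define IS_THUNDERX_SKETCH(midr) \
  (MIDR_IMPLEMENTER (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

/* Cached state, mirroring __midr and __is_thunderx in the patch.  */
static uint64_t cached_midr;
static bool cached_is_thunderx;

/* Record a MIDR value on the first call; later calls reuse the cached
   flag and ignore their argument.  */
static bool
cache_midr_sketch (uint64_t raw_midr)
{
  if (cached_midr == 0)
    {
      cached_midr = raw_midr;
      cached_is_thunderx = IS_THUNDERX_SKETCH (raw_midr);
    }
  return cached_is_thunderx;
}
```

With this shape, each selection function tests one boolean instead of
repeating the mask-and-compare operations on the register value.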

I have attached the bench-memcpy.out, bench-memcpy-large.out,
bench-memmove.out and bench-memmove-large.out files to show the
performance difference.  Most of the difference is seen in the large
versions, as the smaller ones only use prefetching on a couple of inputs.

Steve Ellcey
[hidden email]


2017-01-19  Steve Ellcey  <[hidden email]>

        * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
        (memmove): Use MEMMOVE for name.
        (memcpy): Use MEMCPY for name.  Add loop with prefetching
        under USE_THUNDERX macro.
        * sysdeps/aarch64/multiarch/Makefile: New file.
        * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
        * sysdeps/aarch64/multiarch/init-arch.h: Ditto.
        * sysdeps/aarch64/multiarch/memcpy.c: Ditto.
        * sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
        * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/configure.ac (arch_minimum_kernel):
        Set to 4.11.0 if building with multi_arch.
        * sysdeps/unix/sysv/linux/aarch64/configure: Regenerate.

ifunc.patch (21K)
bench-memcpy.out (34K)
bench-memcpy-large.out (2K)
bench-memmove.out (29K)
bench-memmove-large.out (4K)

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Adhemerval Zanella-2
Hi Steve,

On 19/01/2017 16:22, Steve Ellcey wrote:

> +extern uint64_t __midr attribute_hidden;
> +extern bool __is_thunderx attribute_hidden;
> +
> +#define INIT_ARCH() \
> +  { \
> +    if (__midr == 0) \
> +      { \
> + asm volatile ("mrs %0, midr_el1" : "=r"(__midr)); \
> + __is_thunderx = IS_THUNDERX(__midr); \
> +      } \
> +  }

I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
a better solution would be to issue the mrs instruction once at loader/program startup,
fill in an internal structure with the required information, and use it later during
ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.


> diff --git a/sysdeps/unix/sysv/linux/aarch64/configure.ac b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> index 211fa9c..684cb46 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/configure.ac
> +++ b/sysdeps/unix/sysv/linux/aarch64/configure.ac
> @@ -1,6 +1,11 @@
>  GLIBC_PROVIDES dnl See aclocal.m4 in the top level source directory.
>  # Local configure fragment for sysdeps/unix/sysv/linux/aarch64.
>  
> -arch_minimum_kernel=3.7.0
> +# For multi-arch support we need a kernel that emulates the mrs instruction.
> +if test x$multi_arch = xyes; then
> +    arch_minimum_kernel=4.11.0
> +else
> +    arch_minimum_kernel=3.7.0
> +fi

I do not think this suffices to prevent the multiarch version from being used on a
system with old installed kernel headers.  The check only takes effect if you explicitly
use --enable-multi-arch; however, multiarch is enabled by default in configure.ac
(configure.ac:877).  So building with old kernel headers will break the runtime.

We need to make sure that a glibc built against older kernel headers (or with
--enable-kernel=x.y.z) does not use the mrs instruction, and that a glibc built against
a newer kernel that may use mrs fails on loading with DL_SYSDEP_OSCHECK.

According to the documentation in the last patch iteration [1], the kernel provides
the HWCAP_CPUID bit in hwcap to indicate that it supports the mrs emulation.  So,
building on my previous suggestion, I would recommend:

  1. Remove any configure check or restriction.
  2. Add a cpu_features module, similar to x86, that sets a global state with
     the cpu information obtained from the kernel.  It will first check the
     HWCAP_CPUID bit in hwcap and, if it is set, issue the mrs instruction.  It
     will then populate the global state with the required cpu information.
  3. Use the cpu information to select the correct ifunc.

This has the further advantage of avoiding the extra complexity of different glibc
builds with different minimum required kernels.
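
The recommended steps can be sketched roughly as follows.  This is an
illustration under stated assumptions, not the eventual glibc code: the
struct and function names are hypothetical, HWCAP_CPUID is taken to be
bit 11 of AT_HWCAP as in the Linux arm64 uapi headers, the Cavium 'C' /
0x0a1 ThunderX IDs are assumed, and the register read is passed in as a
function pointer so the sketch can run on any machine (on aarch64 the
reader would wrap "mrs %0, midr_el1").

```c
#include <stdint.h>

#define HWCAP_CPUID_SKETCH (1UL << 11) /* assumed value of HWCAP_CPUID */

/* Global state filled once at startup (step 2).  */
struct cpu_features_sketch
{
  uint64_t midr_el1;
};

enum memcpy_impl_sketch { MEMCPY_GENERIC, MEMCPY_THUNDERX };

typedef uint64_t (*midr_reader_fn) (void);

/* Stand-in for the mrs instruction, returning a ThunderX-like value.  */
static uint64_t
fake_thunderx_midr (void)
{
  return 0x430f0a10ULL;
}

/* Step 2: read MIDR_EL1 only when the kernel advertises emulation;
   otherwise leave the field at 0 so the generic code is chosen.  */
static void
init_cpu_features_sketch (struct cpu_features_sketch *cf,
                          unsigned long hwcap, midr_reader_fn read_midr)
{
  if (hwcap & HWCAP_CPUID_SKETCH)
    cf->midr_el1 = read_midr ();
  else
    cf->midr_el1 = 0;
}

/* Step 3: pick an implementation from the cached information.  */
static enum memcpy_impl_sketch
select_memcpy_sketch (const struct cpu_features_sketch *cf)
{
  uint64_t midr = cf->midr_el1;
  int is_thunderx = ((midr >> 24) & 0xff) == 'C'
                    && ((midr >> 4) & 0xfff) == 0x0a1;
  return is_thunderx ? MEMCPY_THUNDERX : MEMCPY_GENERIC;
}
```

On a kernel without the emulation the hwcap bit is clear, the register
is never read, and the selector falls back to the generic routine, so no
configure-time kernel restriction is needed.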

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Joseph Myers
On Thu, 19 Jan 2017, Adhemerval Zanella wrote:

> We need to make sure that a glibc built against older kernel headers (or with
> --enable-kernel=x.y.z) does not use the mrs instruction, and that a glibc built against
> a newer kernel that may use mrs fails on loading with DL_SYSDEP_OSCHECK.

Agreed.  That is, I think that either the configured minimum kernel
version or the kernel support at runtime (or both, with the configured
minimum kernel allowing runtime tests to be disabled) should be what
determines whether these implementations can be used - rather than
enabling multi-arch changing the minimum kernel version.
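
One way to read this suggestion, sketched in C: if the configured
minimum kernel already guarantees the emulation, the runtime test can be
compiled out; otherwise the hwcap bit decides.  The macro names below
are hypothetical stand-ins for whatever glibc derives from
--enable-kernel, the 3.7.0 default mirrors the existing
arch_minimum_kernel, and HWCAP_CPUID is assumed to be bit 11.

```c
/* Pack a kernel version into one comparable integer.  */
#define KERNEL_VERSION_CODE(major, minor, patch) \
  (((major) << 16) | ((minor) << 8) | (patch))

/* Hypothetical: the minimum kernel this build was configured for.  */
#ifndef CONFIGURED_MIN_KERNEL
# define CONFIGURED_MIN_KERNEL KERNEL_VERSION_CODE (3, 7, 0)
#endif

#define HWCAP_CPUID_SKETCH (1UL << 11) /* assumed value of HWCAP_CPUID */

/* Return nonzero if reading MIDR_EL1 from user space is safe.  */
static int
mrs_emulation_usable_sketch (unsigned long hwcap)
{
#if CONFIGURED_MIN_KERNEL >= KERNEL_VERSION_CODE (4, 11, 0)
  /* Every supported kernel emulates mrs; no runtime test needed.  */
  (void) hwcap;
  return 1;
#else
  /* Older kernels are allowed, so ask the running kernel.  */
  return (hwcap & HWCAP_CPUID_SKETCH) != 0;
#endif
}
```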

--
Joseph S. Myers
[hidden email]

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
In reply to this post by Adhemerval Zanella-2
On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:


> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
> a better solution would be to issue the mrs instruction once at loader/program startup,
> fill in an internal structure with the required information, and use it later during
> ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.
>
> According to the documentation in the last patch iteration [1], the kernel provides
> the HWCAP_CPUID bit in hwcap to indicate that it supports the mrs emulation.  So,
> building on my previous suggestion, I would recommend:
>
>   1. Remove any configure check or restriction.
>   2. Add a cpu_features module, similar to x86, that sets a global state with
>      the cpu information obtained from the kernel.  It will first check the
>      HWCAP_CPUID bit in hwcap and, if it is set, issue the mrs instruction.  It
>      will then populate the global state with the required cpu information.
>   3. Use the cpu information to select the correct ifunc.
>
> This has the further advantage of avoiding the extra complexity of different glibc
> builds with different minimum required kernels.

Adhemerval,

I am looking at the cpu-features setup from x86 and trying to implement
that for aarch64 but there are some things I don't understand about the
code and I was hoping you (or someone else on the list) could help me.
I have attached the patch I have so far.  This code doesn't contain any
use of the cpu features code; it is just the code that tries to initialize
it on startup.  Right now it doesn't build and I am not sure what I am
missing.

Specifically I have these questions.

How is cpu-features-offsets.sym used and what do I need in this file?
I think this may be how _dl_aarch64_cpu_features is supposed to be
defined but I am not sure.

I obviously need something in init_cpu_features to check if mrs is
emulated in the kernel but I am not sure how to do that.  I know it
involves the HWCAPs but I am not sure how to access them, do I need a
sym file to get access to that too?  Something like
sysdeps/arm/rtld-global-offsets.sym?

Right now my build dies with:

<stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
<stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
<stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
<stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
cc1: all warnings being treated as errors
../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1

Steve Ellcey
[hidden email]

ifunc2.diff (13K)

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Florian Weimer-5
On 01/24/2017 12:33 AM, Steve Ellcey wrote:
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.

It allows the assembler to use the values of C constant expressions.
Commit 67aae64512cb42332f76a83e84ac2bc608ad4ad2 is an aarch64 example of
its use.

Florian

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Adhemerval Zanella-2
In reply to this post by Steve Ellcey-4


On 23/01/2017 21:33, Steve Ellcey wrote:

> On Thu, 2017-01-19 at 17:41 -0200, Adhemerval Zanella wrote:
>>  
>> I think that, to avoid potentially multiple kernel traps at load or PLT resolve time,
>> a better solution would be to issue the mrs instruction once at loader/program startup,
>> fill in an internal structure with the required information, and use it later during
>> ifunc resolution.  This is similar to the cpu-features/cacheinfo strategy for x86.
>>
>> According to the documentation in the last patch iteration [1], the kernel provides
>> the HWCAP_CPUID bit in hwcap to indicate that it supports the mrs emulation.  So,
>> building on my previous suggestion, I would recommend:
>>
>>   1. Remove any configure check or restriction.
>>   2. Add a cpu_features module, similar to x86, that sets a global state with
>>      the cpu information obtained from the kernel.  It will first check the
>>      HWCAP_CPUID bit in hwcap and, if it is set, issue the mrs instruction.  It
>>      will then populate the global state with the required cpu information.
>>   3. Use the cpu information to select the correct ifunc.
>>
>> This has the further advantage of avoiding the extra complexity of different glibc
>> builds with different minimum required kernels.
>
>
> Adhemerval,
>
> I am looking at the cpu-features setup from x86 and trying to implement
> that for aarch64 but there are some things I don't understand about the
> code and I was hoping you (or someone else on the list) could help me.
> I have attached the patch I have so far, this code doesn't contain any
> use of the cpu features code but is just the code that tries to initialize
> it on start up.  Right now it doesn't build and I am not sure what I am
> missing.
>
> Specifically I have these questions.
>
> How is cpu-features-offsets.sym used and what do I need in this file?
> I think this may be how _dl_aarch64_cpu_features is supposed to be
> defined but I am not sure.
>

The .sym files are a trick glibc uses to define struct or TLS offsets
for use in assembly implementations.  x86 uses them because it
originally implemented most of its ifunc resolvers directly in assembly
(back when compiler support was lacking).

Since you are implementing directly in C, these files are unnecessary.
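
To make the purpose concrete, the kind of C constant expression a .sym
entry evaluates looks like the sketch below.  The struct layout and the
constant names are hypothetical (only the midr_el1 field name appears in
this thread); the point is that offsetof and sizeof need the complete
struct definition, which matches the "incomplete type" error in the
build log above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the second field exists only for illustration.  */
struct cpu_features_sketch
{
  uint64_t midr_el1;
  uint64_t reserved;
};

/* Constants of the sort a .sym file would export to assembly.  C code
   can simply write cf->midr_el1 and does not need them.  */
enum
{
  CPU_FEATURES_MIDR_OFFSET = offsetof (struct cpu_features_sketch, midr_el1),
  CPU_FEATURES_SIZE = sizeof (struct cpu_features_sketch)
};
```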

> I obviously need something in init_cpu_features to check if mrs is
> emulated in the kernel but I am not sure how to do that.  I know it
> involves the HWCAPs but I am not sure how to access them, do I need a
> sym file to get access to that too?  Something like
> sysdeps/arm/rtld-global-offsets.sym?
>
> Right now my build dies with:
>
> <stdin>:2:102: error: implicit declaration of function ‘rtld_global_ro_offsetof’ [-Werror=implicit-function-declaration]
> <stdin>:2:127: error: ‘_dl_aarch64_cpu_features’ undeclared (first use in this function)
> <stdin>:2:127: note: each undeclared identifier is reported only once for each function it appears in
> <stdin>:3:82: error: invalid application of ‘sizeof’ to incomplete type ‘struct cpu_features’
> cc1: all warnings being treated as errors
> ../Makerules:266: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/cpu-features-offsets.h] Error 1
>
> Steve Ellcey
> [hidden email]
>

This branch in my personal repo [1] has a working draft version for
aarch64.  It contains two patches: one that implements cpu-features.c
for aarch64 and another that actually uses it to implement the
thunderx ifunc.

In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
make it Linux-specific only, because of hwcap.  I will try to clean this up
later.

[1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
On Tue, 2017-01-24 at 12:09 -0200, Adhemerval Zanella wrote:

> This branch in my personal repo [1] has a working draft version for
> aarch64.  It contains two patches: one that implements cpu-features.c
> for aarch64 and another that actually uses it to implement the
> thunderx ifunc.
>
> In the first patch I would like to remove sysdeps/aarch64/ldsodefs.h and
> make it Linux-specific only, because of hwcap.  I will try to clean this up
> later.
>
> [1] https://github.com/zatrazz/glibc/tree/master-aarch64-ifunc

Thanks Adhemerval,

That clears a lot of things up.  One thing I noticed in your tree is
that you only call init_cpu_features from __libc_start_main for the
static glibc.  On x86 they also define DL_PLATFORM_INIT to be a
routine that calls init_cpu_features for the dynamically loaded glibc.

I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
got a build error.  I am using the same prototype for dl_platform_init
that x86 has so I am not sure why I get this error.

Steve Ellcey
[hidden email]


diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 84b8aec..7f38a68 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
     _dl_reloc_bad_type (map, r_type, 1);
 }
 
+#define DL_PLATFORM_INIT dl_platform_init ()
+
+static inline void __attribute__ ((unused))
+dl_platform_init (void)
+{
+  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
+    /* Avoid an empty string which would disturb us.  */
+    GLRO(dl_platform) = NULL;
+
+#ifdef SHARED
+  /* init_cpu_features has been called early from __libc_start_main in
+     static executable.  */
+  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
+#endif
+}
+
 #endif



The error I get is:

In file included from dynamic-link.h:92:0,
                 from dl-conflict.c:59:
../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
 dl_platform_init (void)
 ^~~~~~~~~~~~~~~~
../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1


Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
Never mind.  I fixed this by moving the definition of dl_platform_init
up earlier in the file.  That, plus including cpu-features.c fixed the
build.
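
For readers hitting the same error, a sketch of the working arrangement:
the compiler output says the definition was being compiled "In function
'_dl_resolve_conflicts'", which suggests that the tail of dl-machine.h
is included inside a function body, where a static function definition
is rejected.  Assuming that reading is right, the helper has to be
defined at file scope, before any function that expands the macro.  The
names with a _sketch suffix and the counter are hypothetical.

```c
/* Counter used only to observe that the hook ran.  */
static int platform_init_calls;

/* Defined at file scope, ahead of any code that uses the macro.  */
static inline void __attribute__ ((unused))
dl_platform_init_sketch (void)
{
  ++platform_init_calls;
}

#define DL_PLATFORM_INIT dl_platform_init_sketch ()

/* Stand-in for the loader startup path that invokes the hook when the
   architecture defines one.  */
static void
loader_startup_sketch (void)
{
#ifdef DL_PLATFORM_INIT
  DL_PLATFORM_INIT;
#endif
}
```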

Steve


On Tue, 2017-01-24 at 11:34 -0800, Steve Ellcey wrote:


> I added this code to sysdeps/aarch64/dl-machine.h but when I added it I
> got a build error.  I am using the same prototype for dl_platform_init
> that x86 has so I am not sure why I get this error.
>
> Steve Ellcey
> [hidden email]
>
>
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..7f38a68 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -426,4 +426,20 @@ elf_machine_lazy_rel (struct link_map *map,
>      _dl_reloc_bad_type (map, r_type, 1);
>  }
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
>  #endif
>
>
>
> The error I get is:
>
> In file included from dynamic-link.h:92:0,
>                  from dl-conflict.c:59:
> ../sysdeps/aarch64/dl-machine.h: In function ‘_dl_resolve_conflicts’:
> ../sysdeps/aarch64/dl-machine.h:432:1: error: invalid storage class for function ‘dl_platform_init’
>  dl_platform_init (void)
>  ^~~~~~~~~~~~~~~~
> ../o-iterator.mk:9: recipe for target '/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o' failed
> make[2]: *** [/home/ubuntu/sellcey/glibc-ifunc-new/obj-glibc64/elf/dl-conflict.o] Error 1

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
In reply to this post by Adhemerval Zanella-2
Here is a new version of the aarch64 ifunc patch with the cpu-features
style of initialization on startup.  Adhemerval, since I took some code
from your branch I added your name to the ChangeLog.  In addition to
doing the mrs instruction on startup the main difference in this patch
from the last one is that it uses ifuncs in both the shared and archive
libc libraries.

Steve Ellcey
[hidden email]


2017-01-25  Steve Ellcey  <[hidden email]>
            Adhemerval Zanella  <[hidden email]>

        * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
        (DL_PLATFORM_INIT): New define.
        (dl_platform_init): New function.
        * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
        * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
        (memmove): Use MEMMOVE for name.
        (memcpy): Use MEMCPY for name.  Add loop with prefetching
        under USE_THUNDERX macro.
        * sysdeps/aarch64/multiarch/Makefile: New file.
        * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
        * sysdeps/aarch64/multiarch/init-arch.h: Ditto.
        * sysdeps/aarch64/multiarch/memcpy.c: Ditto.
        * sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
        * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
        * sysdeps/aarch64/multiarch/memmove.c: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.

ifunc.patch (29K)

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Adhemerval Zanella-2


On 25/01/2017 15:34, Steve Ellcey wrote:
> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup.  Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog.  In addition to
> doing the mrs instruction on startup the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
>
> Steve Ellcey
> [hidden email]

Hi Steve,

I think it is better to split this patchset in two: one for the multiarch foundation
for aarch64 and another for the thunderx memcpy implementation itself.

Besides that, I think the patch should be OK.

> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.

Missing a one-line description for this file.

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Siddhesh Poyarekar-8
In reply to this post by Steve Ellcey-4
On Wednesday 25 January 2017 11:04 PM, Steve Ellcey wrote:

> Here is a new version of the aarch64 ifunc patch with the cpu-features
> style of initialization on startup.  Adhemerval, since I took some code
> from your branch I added your name to the ChangeLog.  In addition to
> doing the mrs instruction on startup the main difference in this patch
> from the last one is that it uses ifuncs in both the shared and archive
> libc libraries.
>
> Steve Ellcey
> [hidden email]
>
>
> 2017-01-25  Steve Ellcey  <[hidden email]>
>             Adhemerval Zanella  <[hidden email]>
>
>         * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
>         (DL_PLATFORM_INIT): New define.
>         (dl_platform_init): New function.
>         * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
>         * sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
>         (memmove): Use MEMMOVE for name.
>         (memcpy): Use MEMCPY for name.  Add loop with prefetching
>         under USE_THUNDERX macro.
>         * sysdeps/aarch64/multiarch/Makefile: New file.
>         * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Ditto.
>         * sysdeps/aarch64/multiarch/init-arch.h: Ditto.
>         * sysdeps/aarch64/multiarch/memcpy.c: Ditto.
>         * sysdeps/aarch64/multiarch/memcpy_generic.S: Ditto.
>         * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Ditto.
>         * sysdeps/aarch64/multiarch/memmove.c: Ditto.
>         * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: Ditto.
>         * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
>         * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
>         * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.
>
>
> ifunc.patch
>
>
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..15d79a6 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -25,6 +25,7 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-irel.h>
> +#include <cpu-features.c>
>  
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
> @@ -225,6 +226,23 @@ _dl_start_user: \n\
>  #define ELF_MACHINE_NO_REL 1
>  #define ELF_MACHINE_NO_RELA 0
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
> +
>  static inline ElfW(Addr)
>  elf_machine_fixup_plt (struct link_map *map, lookup_t t,
>         const ElfW(Rela) *reloc,
> diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
> index f277074..ba4ada3 100644
> --- a/sysdeps/aarch64/ldsodefs.h
> +++ b/sysdeps/aarch64/ldsodefs.h
> @@ -20,6 +20,7 @@
>  #define _AARCH64_LDSODEFS_H 1
>  
>  #include <elf.h>
> +#include <cpu-features.h>
>  
>  struct La_aarch64_regs;
>  struct La_aarch64_retval;
> diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
> index 29af8b1..74444b4 100644
> --- a/sysdeps/aarch64/memcpy.S
> +++ b/sysdeps/aarch64/memcpy.S
> @@ -59,7 +59,14 @@
>     Overlapping large forward memmoves use a loop that copies backwards.
>  */
>  
> -ENTRY_ALIGN (memmove, 6)
> +#ifndef MEMMOVE
> +#  define MEMMOVE memmove
> +#endif
> +#ifndef MEMCPY
> +#  define MEMCPY memcpy
> +#endif
> +
> +ENTRY_ALIGN (MEMMOVE, 6)
>  
>   DELOUSE (0)
>   DELOUSE (1)
> @@ -71,9 +78,9 @@ ENTRY_ALIGN (memmove, 6)
>   b.lo L(move_long)
>  
>   /* Common case falls through into memcpy.  */
> -END (memmove)
> -libc_hidden_builtin_def (memmove)
> -ENTRY (memcpy)
> +END (MEMMOVE)
> +libc_hidden_builtin_def (MEMMOVE)
> +ENTRY (MEMCPY)
>  
>   DELOUSE (0)
>   DELOUSE (1)
> @@ -158,10 +165,22 @@ L(copy96):
>  
>   .p2align 4
>  L(copy_long):
> +
> +#ifdef USE_THUNDERX
> +
> + /* On thunderx, large memcpy's are helped by software prefetching.
> +   This loop is identical to the one below it but with prefetching
> +   instructions included.  For loops that are less than 32768 bytes,
> +   the prefetching does not help and slows the code down so we only
> +   use the prefetching loop for the largest memcpys.  */

I think it would be cleaner to put the full generic and thunderx
implementations in separate files instead of trying to do this macro
dance because it keeps micro-architecture details separate.  Assembly
code is hard to maintain as it is without adding conditional compilation
using macros.

I also second Adhemerval's suggestion to separate the patch to add the
framework from the one to add the thunderx ifunc.  It makes for easier
cherry picking and git-blaming.

Siddhesh

> +
> + cmp count, #32768
> + b.lo L(copy_long_without_prefetch)
>   and tmp1, dstin, 15
>   bic dst, dstin, 15
>   ldp D_l, D_h, [src]
>   sub src, src, tmp1
> + prfm pldl1strm, [src, 384]
>   add count, count, tmp1 /* Count is now 16 too large.  */
>   ldp A_l, A_h, [src, 16]
>   stp D_l, D_h, [dstin]
> @@ -169,7 +188,10 @@ L(copy_long):
>   ldp C_l, C_h, [src, 48]
>   ldp D_l, D_h, [src, 64]!
>   subs count, count, 128 + 16 /* Test and readjust count.  */
> - b.ls 2f
> +
> +L(prefetch_loop64):
> + tbz src, #6, 1f
> + prfm pldl1strm, [src, 512]
>  1:
>   stp A_l, A_h, [dst, 16]
>   ldp A_l, A_h, [src, 16]
> @@ -180,12 +202,40 @@ L(copy_long):
>   stp D_l, D_h, [dst, 64]!
>   ldp D_l, D_h, [src, 64]!
>   subs count, count, 64
> - b.hi 1b
> + b.hi L(prefetch_loop64)
> + b L(last64)
> +
> +L(copy_long_without_prefetch):
> +#endif
> +
> + and tmp1, dstin, 15
> + bic dst, dstin, 15
> + ldp D_l, D_h, [src]
> + sub src, src, tmp1
> + add count, count, tmp1 /* Count is now 16 too large.  */
> + ldp A_l, A_h, [src, 16]
> + stp D_l, D_h, [dstin]
> + ldp B_l, B_h, [src, 32]
> + ldp C_l, C_h, [src, 48]
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 128 + 16 /* Test and readjust count.  */
> + b.ls L(last64)
> +L(loop64):
> + stp A_l, A_h, [dst, 16]
> + ldp A_l, A_h, [src, 16]
> + stp B_l, B_h, [dst, 32]
> + ldp B_l, B_h, [src, 32]
> + stp C_l, C_h, [dst, 48]
> + ldp C_l, C_h, [src, 48]
> + stp D_l, D_h, [dst, 64]!
> + ldp D_l, D_h, [src, 64]!
> + subs count, count, 64
> + b.hi L(loop64)
>  
>   /* Write the last full set of 64 bytes.  The remainder is at most 64
>     bytes, so it is safe to always copy 64 bytes from the end even if
>     there is just 1 byte left.  */
> -2:
> +L(last64):
>   ldp E_l, E_h, [srcend, -64]
>   stp A_l, A_h, [dst, 16]
>   ldp A_l, A_h, [srcend, -48]
> @@ -256,5 +306,5 @@ L(move_long):
>   stp C_l, C_h, [dstin]
>  3: ret
>  
> -END (memcpy)
> -libc_hidden_builtin_def (memcpy)
> +END (MEMCPY)
> +libc_hidden_builtin_def (MEMCPY)
> diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
> index e69de29..78d52c7 100644
> --- a/sysdeps/aarch64/multiarch/Makefile
> +++ b/sysdeps/aarch64/multiarch/Makefile
> @@ -0,0 +1,3 @@
> +ifeq ($(subdir),string)
> +sysdep_routines += memcpy_generic memcpy_thunderx
> +endif
> diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> index e69de29..c4f23df 100644
> --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
> @@ -0,0 +1,51 @@
> +/* Enumerate available IFUNC implementations of a function.  AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <string.h>
> +#include <wchar.h>
> +#include <ldsodefs.h>
> +#include <ifunc-impl-list.h>
> +#include <init-arch.h>
> +#include <stdio.h>
> +
> +/* Maximum number of IFUNC implementations.  */
> +#define MAX_IFUNC 2
> +
> +size_t
> +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
> + size_t max)
> +{
> +  assert (max >= MAX_IFUNC);
> +
> +  size_t i = 0;
> +
> +  INIT_ARCH ();
> +
> +  /* Support sysdeps/aarch64/multiarch/memcpy.c and memmove.c.  */
> +  IFUNC_IMPL (i, name, memcpy,
> +      IFUNC_IMPL_ADD (array, i, memcpy, IS_THUNDERX (midr),
> +      __memcpy_thunderx)
> +      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
> +  IFUNC_IMPL (i, name, memmove,
> +      IFUNC_IMPL_ADD (array, i, memmove, IS_THUNDERX (midr),
> +      __memmove_thunderx)
> +      IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic))
> +
> +  return i;
> +}
> diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h
> index e69de29..eafbf77 100644
> --- a/sysdeps/aarch64/multiarch/init-arch.h
> +++ b/sysdeps/aarch64/multiarch/init-arch.h
> @@ -0,0 +1,22 @@
> +/* This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <ldsodefs.h>
> +
> +#define INIT_ARCH() \
> +  uint64_t __attribute__((unused)) midr = \
> +    GLRO(dl_aarch64_cpu_features).midr_el1;
> diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
> index e69de29..4e3f251 100644
> --- a/sysdeps/aarch64/multiarch/memcpy.c
> +++ b/sysdeps/aarch64/multiarch/memcpy.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memcpy. AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +
> +#if IS_IN (libc)
> +/* Redefine memcpy so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below.  */
> +# undef memcpy
> +# define memcpy __redirect_memcpy
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memcpy) __libc_memcpy;
> +
> +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
> +extern __typeof (__redirect_memcpy) __memcpy_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memcpy,
> +            IS_THUNDERX (midr) ? __memcpy_thunderx : __memcpy_generic);
> +
> +#undef memcpy
> +strong_alias (__libc_memcpy, memcpy);
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memcpy_generic.S b/sysdeps/aarch64/multiarch/memcpy_generic.S
> index e69de29..50e1a1c 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_generic.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_generic.S
> @@ -0,0 +1,42 @@
> +/* A Generic Optimized memcpy implementation for AARCH64.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* The actual memcpy and memmove code is in ../memcpy.S.  If we are
> +   building libc this file defines __memcpy_generic and __memmove_generic.
> +   Otherwise the include of ../memcpy.S will define the normal __memcpy
> +   and __memmove entry points.  */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_generic
> +#define MEMMOVE __memmove_generic
> +
> +/* Do not hide the generic versions of memcpy and memmove, we use them
> +   internally.  */
> +#undef libc_hidden_builtin_def
> +#define libc_hidden_builtin_def(name)
> +
> +/* It doesn't make sense to send libc-internal memcpy calls through a PLT. */
> + .globl __GI_memcpy; __GI_memcpy = __memcpy_generic
> + .globl __GI_memmove; __GI_memmove = __memmove_generic
> +
> +#endif
> +
> +#include "../memcpy.S"
> diff --git a/sysdeps/aarch64/multiarch/memcpy_thunderx.S b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> index e69de29..ee971c8 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_thunderx.S
> @@ -0,0 +1,32 @@
> +/* A Thunderx Optimized memcpy implementation for AARCH64.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* The actual thunderx optimized code is in ../memcpy.S under the USE_THUNDERX
> +   ifdef.  If we are not building libc then we do not build anything when
> +   compiling this file and __memcpy is defined by memcpy_generic.S.  */
> +
> +#include <sysdep.h>
> +
> +#if IS_IN (libc)
> +
> +#define MEMCPY __memcpy_thunderx
> +#define MEMMOVE __memmove_thunderx
> +#define USE_THUNDERX
> +#include "../memcpy.S"
> +
> +#endif
> diff --git a/sysdeps/aarch64/multiarch/memmove.c b/sysdeps/aarch64/multiarch/memmove.c
> index e69de29..8d7a146 100644
> --- a/sysdeps/aarch64/multiarch/memmove.c
> +++ b/sysdeps/aarch64/multiarch/memmove.c
> @@ -0,0 +1,39 @@
> +/* Multiple versions of memmove. AARCH64 version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Define multiple versions only for the definition in libc.  */
> +
> +#if IS_IN (libc)
> +/* Redefine memmove so that the compiler won't complain about the type
> +   mismatch with the IFUNC selector in strong_alias, below.  */
> +# undef memmove
> +# define memmove __redirect_memmove
> +# include <string.h>
> +# include <init-arch.h>
> +
> +extern __typeof (__redirect_memmove) __libc_memmove;
> +
> +extern __typeof (__redirect_memmove) __memmove_generic attribute_hidden;
> +extern __typeof (__redirect_memmove) __memmove_thunderx attribute_hidden;
> +
> +libc_ifunc (__libc_memmove,
> +            IS_THUNDERX (midr) ? __memmove_thunderx : __memmove_generic);
> +
> +#undef memmove
> +strong_alias (__libc_memmove, memmove);
> +#endif
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> index e69de29..8e4b514 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> @@ -0,0 +1,38 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <cpu-features.h>
> +
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID (1 << 11)
> +#endif
> +
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;
> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> index e69de29..c92b650 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> @@ -0,0 +1,49 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CPU_FEATURES_AARCH64_H
> +#define _CPU_FEATURES_AARCH64_H
> +
> +#include <stdint.h>
> +
> +#define MIDR_PARTNUM_SHIFT 4
> +#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
> +#define MIDR_PARTNUM(midr) \
> + (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
> +#define MIDR_ARCHITECTURE_SHIFT 16
> +#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_ARCHITECTURE(midr) \
> + (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_VARIANT_SHIFT 20
> +#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
> +#define MIDR_VARIANT(midr) \
> + (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
> +#define MIDR_IMPLEMENTOR_SHIFT 24
> +#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
> +#define MIDR_IMPLEMENTOR(midr) \
> + (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
> +
> +#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
> +   && MIDR_PARTNUM(midr) == 0x0a1)
> +
> +struct cpu_features
> +{
> +  uint64_t midr_el1;
> +};
> +
> +#endif /* _CPU_FEATURES_AARCH64_H  */
> diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> index e69de29..438046a 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> @@ -0,0 +1,60 @@
> +/* Data for AArch64 version of processor capability information.
> +   Linux version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* If anything should be added here check whether the size of each string
> +   is still ok with the given array size.
> +
> +   All the #ifdefs in the definitions are quite irritating but
> +   necessary if we want to avoid duplicating the information.  There
> +   are three different modes:
> +
> +   - PROCINFO_DECL is defined.  This means we are only interested in
> +     declarations.
> +
> +   - PROCINFO_DECL is not defined:
> +
> +     + if SHARED is defined the file is included in an array
> +       initializer.  The .element = { ... } syntax is needed.
> +
> +     + if SHARED is not defined a normal array initialization is
> +       needed.
> +  */
> +
> +#ifndef PROCINFO_CLASS
> +# define PROCINFO_CLASS
> +#endif
> +
> +#if !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL && defined SHARED
> +  ._dl_aarch64_cpu_features
> +# else
> +PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
> +# endif
> +# ifndef PROCINFO_DECL
> += { }
> +# endif
> +# if !defined SHARED || defined PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> +
> +#undef PROCINFO_DECL
> +#undef PROCINFO_CLASS
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +   int argc, char **argv,
> +   __typeof (main) init,
> +   void (*fini) (void),
> +   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +     stack_end);
> +}
> +#endif
>

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Wilco Dijkstra-2
In reply to this post by Steve Ellcey-4
Siddhesh wrote:
> I think it would be cleaner to put the full generic and thunderx
> implementations in separate files instead of trying to do this macro
> dance because it keeps micro-architecture details separate.  Assembly
> code is hard to maintain as it is without adding conditional compilation
> using macros.

I agree we want to avoid using conditional compilation as much as possible.
On the other hand duplication is a bad idea too, I've seen too many cases where
bugs were only fixed in one of the N duplicates.

However I'm actually wondering whether we need an ifunc for this case.
For large copies from L2 I think adding a prefetch should be benign even on
cores that don't need it, so if the benchmarks confirm this we should consider
updating the generic memcpy.
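
To make the idea concrete, here is a minimal C sketch (not the aarch64 assembly from the patch) of a copy loop that issues a software prefetch ahead of the loads only for large copies.  The names copy_with_prefetch, PREFETCH_THRESHOLD, PREFETCH_DISTANCE and BLOCK are made up for illustration.  The 32K threshold is the figure Steve reported for thunderx; the distance and block size are placeholders that would need benchmarking on each core.  __builtin_prefetch is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning knobs, not taken from the patch.  */
#define PREFETCH_THRESHOLD (32 * 1024)  /* Below this, do not prefetch.  */
#define PREFETCH_DISTANCE  256          /* Bytes ahead of the current load.  */
#define BLOCK              64           /* Bytes copied per iteration.  */

/* Copy N bytes from SRC to DST (the buffers must not overlap),
   prefetching the source ahead of the loop when the copy is large.  */
void *
copy_with_prefetch (void *dst, const void *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (n < PREFETCH_THRESHOLD)
    return memcpy (dst, src, n);

  while (n >= BLOCK)
    {
      /* Only hint at addresses that are still inside the source.  */
      if (n > PREFETCH_DISTANCE)
        __builtin_prefetch (s + PREFETCH_DISTANCE, 0, 0);
      memcpy (d, s, BLOCK);
      d += BLOCK;
      s += BLOCK;
      n -= BLOCK;
    }
  memcpy (d, s, n);  /* Tail of fewer than BLOCK bytes.  */
  return dst;
}
```

If the prefetch really is benign on other cores, a loop of this shape could live in the generic memcpy and no selector would be needed; that is the point the benchmarks would have to settle.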

> I also second Adhemerval's suggestion to separate the patch to add the
> framework from the one to add the thunderx ifunc.  It makes for easier
> cherry picking and git-blaming.

Agreed.

Wilco

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Siddhesh Poyarekar-8
On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
> I agree we want to avoid using conditional compilation as much as possible.
> On the other hand duplication is a bad idea too, I've seen too many cases where
> bugs were only fixed in one of the N duplicates.

Sure, but in that case the de-duplication must be done by
identifying a logical code block and making that into a macro to
override, not by arbitrarily injecting hunks of code.  In this case,
alternate implementations of copy_long should be sufficient: #define
COPY_LONG in both memcpy_generic and memcpy_thunderx and have the parent
(memcpy.S) use that macro.  In fact, that might even end up making the
code a bit nicer to read.
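
As a rough illustration of that structure, here is a C analogue of the macro hook (the real thing would be assembly, with memcpy.S as the shared parent).  DEFINE_MEMCPY stands in for the parent file, and the two copy_long_* functions stand in for the per-variant COPY_LONG bodies.  All names, and the LONG_THRESHOLD value, are hypothetical and only meant to show the shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder cut-off between the shared short path and the hook.  */
#define LONG_THRESHOLD 96

/* Generic long-copy step: a plain copy.  */
static void
copy_long_plain (void *dst, const void *src, size_t n)
{
  assert (n > LONG_THRESHOLD);  /* The hook only sees long copies.  */
  memcpy (dst, src, n);
}

/* Variant long-copy step: same copy, preceded by a prefetch hint.  */
static void
copy_long_prefetch (void *dst, const void *src, size_t n)
{
  assert (n > LONG_THRESHOLD);
  __builtin_prefetch (src, 0, 0);
  memcpy (dst, src, n);
}

/* Stand-in for the parent file: everything except the long-copy step
   is written once, and COPY_LONG is the only point of variation.  */
#define DEFINE_MEMCPY(NAME, COPY_LONG)                          \
  void *                                                        \
  NAME (void *dst, const void *src, size_t n)                   \
  {                                                             \
    if (n <= LONG_THRESHOLD)                                    \
      return memcpy (dst, src, n);  /* Shared short path.  */   \
    COPY_LONG (dst, src, n);        /* Per-variant hook.  */    \
    return dst;                                                 \
  }

DEFINE_MEMCPY (memcpy_generic_sketch, copy_long_plain)
DEFINE_MEMCPY (memcpy_thunderx_sketch, copy_long_prefetch)
```

A bug fix in the shared short path then reaches both variants at once, which addresses the duplication concern, while the micro-architecture specific part stays in one clearly named block.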

> However I'm actually wondering whether we need an ifunc for this case.
> For large copies from L2 I think adding a prefetch should be benign even on
> cores that don't need it, so if the benchmarks confirm this we should consider
> updating the generic memcpy.

That is a call that ARM maintainers can take and is also another reason
to separate the IFUNC infrastructure code from the thunderx change.

Siddhesh

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Adhemerval Zanella-2


On 07/02/2017 11:01, Siddhesh Poyarekar wrote:

> On Tuesday 07 February 2017 06:12 PM, Wilco Dijkstra wrote:
>> I agree we want to avoid using conditional compilation as much as possible.
>> On the other hand duplication is a bad idea too, I've seen too many cases where
>> bugs were only fixed in one of the N duplicates.
>
> Sure, but then in that case the de-duplication must be done by
> identifying a logical code block and make that into a macro to override
> and not just arbitrarily inject hunks of code.  So in this case it could
> be alternate implementations of copy_long that is sufficient so #define
> COPY_LONG in both memcpy_generic and memcpy_thunderx and have the parent
> (memcpy.S) use that macro.  In fact, that might even end up making the
> code a bit nicer to read.
>
>> However I'm actually wondering whether we need an ifunc for this case.
>> For large copies from L2 I think adding a prefetch should be benign even on
>> cores that don't need it, so if the benchmarks confirm this we should consider
>> updating the generic memcpy.
>
> That is a call that ARM maintainers can take and is also another reason
> to separate the IFUNC infrastructure code from the thunderx change.

I checked only the memcpy change on an APM X-Gene 1, and the results seem to
show improvements on aligned input, at least for sizes shorter than 4MB.  I
would like to check on more armv8 chips, but it does seem to be a nice
improvement over the generic implementation.

Attachments: bench-memcpy-large.out (1K), bench-memcpy-large.patched (1K), memcpy_aarch64.patch (1K)

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
OK, here is the basic IFUNC enablement for aarch64 without the
memcpy/memmove changes that use it.  I verified that it builds and
causes no regressions on aarch64.  As mentioned in the original email
this code depends on the mrs instruction which is privileged, but the
4.11 kernel will have emulation for it
(https://lkml.org/lkml/2017/1/10/816).
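
For anyone following along, the way the saved MIDR_EL1 value feeds the selection can be sketched in plain C as below.  The field positions, the 'C' implementer code and the 0x0a1 part number are the ones used in cpu-features.h in the patch.  The enum, the function name select_memcpy_variant and the sample MIDR values are made up for illustration.  The sketch shifts before masking and uses UINT64_C constants, since an int expression such as 0xff << 24 does not fit in a signed 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* MIDR_EL1 fields used by the selector: part number in bits [15:4],
   implementer in bits [31:24].  */
#define MIDR_PARTNUM(midr)     (((midr) >> 4) & UINT64_C (0xfff))
#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & UINT64_C (0xff))

#define IS_THUNDERX(midr) \
  (MIDR_IMPLEMENTOR (midr) == 'C' && MIDR_PARTNUM (midr) == 0x0a1)

/* The implementer code is compared against an ASCII character.  */
static_assert ('C' == 0x43, "implementer code assumes ASCII");

/* Hypothetical stand-in for the two function pointers that the
   libc_ifunc selector in the patch chooses between.  */
enum memcpy_variant
{
  MEMCPY_GENERIC,
  MEMCPY_THUNDERX
};

/* Same decision as the selector: thunderx code only on a matching
   MIDR, generic code otherwise (including midr == 0, which is what
   init_cpu_features stores when HWCAP_CPUID is not set).  */
enum memcpy_variant
select_memcpy_variant (uint64_t midr)
{
  return IS_THUNDERX (midr) ? MEMCPY_THUNDERX : MEMCPY_GENERIC;
}
```

With an illustrative value such as 0x431f0a10 (implementer 0x43, part 0x0a1) the thunderx variant is chosen; with 0, or with a value whose implementer byte is 0x41, the generic one is.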

OK to checkin this part?

Steve Ellcey
[hidden email]


2017-02-07  Steve Ellcey  <[hidden email]>
            Adhemerval Zanella  <[hidden email]>

        * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
        (DL_PLATFORM_INIT): New define.
        (dl_platform_init): New function.
        * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
        * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.

Attachment: ifunc.patch (11K)

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Siddhesh Poyarekar-8
On Wednesday 08 February 2017 04:50 AM, Steve Ellcey wrote:
> OK, here is the basic IFUNC enablement for aarch64 without the
> memcpy/memmove changes that use it.  I verified that it builds and
> causes no regressions on aarch64.  As mentioned in the original email
> this code depends on the mrs instruction which is privileged, but the
> 4.11 kernel will have emulation for it
> (https://lkml.org/lkml/2017/1/10/816).
>
> OK to checkin this part?

Looks OK with a couple of nits below.

>
> Steve Ellcey
> [hidden email]
>
>
> 2017-02-07  Steve Ellcey  <[hidden email]>
>    Adhemerval Zanella  <[hidden email]>
>
> * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
> (DL_PLATFORM_INIT): New define.
> (dl_platform_init): New function.
> * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
> * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Ditto.
> * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Ditto.

I was told years ago that we prefer 'Likewise' to 'Ditto' :)

>
>
> ifunc.patch
>
>
> diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
> index 84b8aec..15d79a6 100644
> --- a/sysdeps/aarch64/dl-machine.h
> +++ b/sysdeps/aarch64/dl-machine.h
> @@ -25,6 +25,7 @@
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
>  #include <dl-irel.h>
> +#include <cpu-features.c>
>  
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
> @@ -225,6 +226,23 @@ _dl_start_user: \n\
>  #define ELF_MACHINE_NO_REL 1
>  #define ELF_MACHINE_NO_RELA 0
>  
> +#define DL_PLATFORM_INIT dl_platform_init ()
> +
> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
> +
> +
>  static inline ElfW(Addr)
>  elf_machine_fixup_plt (struct link_map *map, lookup_t t,
>         const ElfW(Rela) *reloc,
> diff --git a/sysdeps/aarch64/ldsodefs.h b/sysdeps/aarch64/ldsodefs.h
> index f277074..ba4ada3 100644
> --- a/sysdeps/aarch64/ldsodefs.h
> +++ b/sysdeps/aarch64/ldsodefs.h
> @@ -20,6 +20,7 @@
>  #define _AARCH64_LDSODEFS_H 1
>  
>  #include <elf.h>
> +#include <cpu-features.h>
>  
>  struct La_aarch64_regs;
>  struct La_aarch64_retval;
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> index e69de29..8e4b514 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c
> @@ -0,0 +1,38 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <cpu-features.h>
> +
> +#ifndef HWCAP_CPUID
> +# define HWCAP_CPUID (1 << 11)
> +#endif
> +
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;
> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
> diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> index e69de29..c92b650 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h
> @@ -0,0 +1,49 @@
> +/* Initialize CPU feature data.  AArch64 version.
> +   This file is part of the GNU C Library.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _CPU_FEATURES_AARCH64_H
> +#define _CPU_FEATURES_AARCH64_H
> +
> +#include <stdint.h>
> +
> +#define MIDR_PARTNUM_SHIFT 4
> +#define MIDR_PARTNUM_MASK (0xfff << MIDR_PARTNUM_SHIFT)
> +#define MIDR_PARTNUM(midr) \
> + (((midr) & MIDR_PARTNUM_MASK) >> MIDR_PARTNUM_SHIFT)
> +#define MIDR_ARCHITECTURE_SHIFT 16
> +#define MIDR_ARCHITECTURE_MASK (0xf << MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_ARCHITECTURE(midr) \
> + (((midr) & MIDR_ARCHITECTURE_MASK) >> MIDR_ARCHITECTURE_SHIFT)
> +#define MIDR_VARIANT_SHIFT 20
> +#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT)
> +#define MIDR_VARIANT(midr) \
> + (((midr) & MIDR_VARIANT_MASK) >> MIDR_VARIANT_SHIFT)
> +#define MIDR_IMPLEMENTOR_SHIFT 24
> +#define MIDR_IMPLEMENTOR_MASK (0xff << MIDR_IMPLEMENTOR_SHIFT)
> +#define MIDR_IMPLEMENTOR(midr) \
> + (((midr) & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT)
> +
> +#define IS_THUNDERX(midr) (MIDR_IMPLEMENTOR(midr) == 'C' \
> +   && MIDR_PARTNUM(midr) == 0x0a1)
> +
> +struct cpu_features
> +{
> +  uint64_t midr_el1;
> +};
> +
> +#endif /* _CPU_FEATURES_AARCH64_H  */
> diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> index e69de29..438046a 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c
> @@ -0,0 +1,60 @@
> +/* Data for AArch64 version of processor capability information.
> +   Linux version.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* If anything should be added here check whether the size of each string
> +   is still ok with the given array size.
> +
> +   All the #ifdefs in the definitions are quite irritating but
> +   necessary if we want to avoid duplicating the information.  There
> +   are three different modes:
> +
> +   - PROCINFO_DECL is defined.  This means we are only interested in
> +     declarations.
> +
> +   - PROCINFO_DECL is not defined:
> +
> +     + if SHARED is defined the file is included in an array
> +       initializer.  The .element = { ... } syntax is needed.
> +
> +     + if SHARED is not defined a normal array initialization is
> +       needed.
> +  */
> +
> +#ifndef PROCINFO_CLASS
> +# define PROCINFO_CLASS
> +#endif
> +
> +#if !IS_IN (ldconfig)
> +# if !defined PROCINFO_DECL && defined SHARED
> +  ._dl_aarch64_cpu_features
> +# else
> +PROCINFO_CLASS struct cpu_features _dl_aarch64_cpu_features
> +# endif
> +# ifndef PROCINFO_DECL
> += { }
> +# endif
> +# if !defined SHARED || defined PROCINFO_DECL
> +;
> +# else
> +,
> +# endif
> +#endif
> +
> +#undef PROCINFO_DECL
> +#undef PROCINFO_CLASS
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc-start.c b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> index e69de29..c98aff1 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc-start.c
> @@ -0,0 +1,40 @@

You've forgotten to add a one line description for this file.

> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +   int argc, char **argv,
> +   __typeof (main) init,
> +   void (*fini) (void),
> +   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +     stack_end);
> +}
> +#endif
>

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Siddhesh Poyarekar-8
On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
> Looks OK with a couple of nits below.

Oh and I suppose you need an ack from the ARM maintainers as well before
you push.

Siddhesh

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Szabolcs Nagy-2
On 08/02/17 05:46, Siddhesh Poyarekar wrote:
> On Wednesday 08 February 2017 11:15 AM, Siddhesh Poyarekar wrote:
>> Looks OK with a couple of nits below.
>
> Oh and I suppose you need an ack from the ARM maintainers as well before
> you push.
>

arm maintainer != aarch64 maintainer.


Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Steve Ellcey-4
In reply to this post by Siddhesh Poyarekar-8
On Wed, 2017-02-08 at 11:15 +0530, Siddhesh Poyarekar wrote:

> Looks OK with a couple of nits below.

Here is a de-nitted version with the ChangeLog using 'Likewise'
instead of 'Ditto' and with a one line description at the top
of libc-start.c.

Steve Ellcey
[hidden email]



2017-02-08  Steve Ellcey  <[hidden email]>
            Adhemerval Zanella  <[hidden email]>

        * sysdeps/aarch64/dl-machine.h: Include cpu-features.c.
        (DL_PLATFORM_INIT): New define.
        (dl_platform_init): New function.
        * sysdeps/aarch64/ldsodefs.h: Include cpu-features.h.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.c: New file.
        * sysdeps/unix/sysv/linux/aarch64/cpu-features.h: Likewise.
        * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.c: Likewise.
        * sysdeps/unix/sysv/linux/aarch64/libc-start.c: Likewise.

ifunc.patch (12K) Download Attachment

Re: [PATCH] Add ifunc memcpy and memmove for aarch64

Szabolcs Nagy-2
On 09/02/17 00:02, Steve Ellcey wrote:

> +static inline void __attribute__ ((unused))
> +dl_platform_init (void)
> +{
> +  if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
> +    /* Avoid an empty string which would disturb us.  */
> +    GLRO(dl_platform) = NULL;
> +
> +#ifdef SHARED
> +  /* init_cpu_features has been called early from __libc_start_main in
> +     static executable.  */
> +  init_cpu_features (&GLRO(dl_aarch64_cpu_features));
> +#endif
> +}
...
> +static inline void
> +init_cpu_features (struct cpu_features *cpu_features)
> +{
> +  if (GLRO(dl_hwcap) & HWCAP_CPUID)
> +    {
> +      register uint64_t id = 0;
> +      asm volatile ("mrs %0, midr_el1" : "=r"(id));
> +      cpu_features->midr_el1 = id;

this traps into the kernel at every process startup, since
the mrs read of midr_el1 is emulated by the kernel.

since this is called very early (dynamic linking case
above, static linking case below), i wonder if there
could be a way for the user to request midr_el1 == 0
unconditionally, avoiding the overhead and making
sure the most generic implementation is used.

is there something like that on other targets?

> +    }
> +  else
> +    {
> +      cpu_features->midr_el1 = 0;
> +    }
> +}
...
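
The override asked about above (forcing midr_el1 == 0 without touching the register) might look roughly like the following. This is a minimal sketch and not part of the patch: the `override` string, the `midr_reader` callback and `fake_midr_reader` are hypothetical names introduced for illustration. The callback stands in for the `mrs` instruction so the sketch runs on any host, and `have_cpuid` stands in for the `GLRO(dl_hwcap) & HWCAP_CPUID` test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cpu_features
{
  uint64_t midr_el1;
};

/* Hypothetical stand-in for the "mrs %0, midr_el1" read, so that the
   sketch does not need an aarch64 host.  */
typedef uint64_t (*midr_reader) (void);

/* Counts how often the (fake) register read was performed.  */
static int fake_midr_reads;

static uint64_t
fake_midr_reader (void)
{
  fake_midr_reads++;
  return 0x431f0a10;	/* Illustrative value only.  */
}

/* Assumption: OVERRIDE is a user-supplied string (for example taken
   from the environment); the value "1" requests the generic code.  */
static void
init_cpu_features_sketch (struct cpu_features *cpu_features, int have_cpuid,
			  const char *override, midr_reader read_midr)
{
  assert (cpu_features != NULL);

  if (override != NULL && strcmp (override, "1") == 0)
    {
      /* Skip the register read (and the kernel trap) entirely.  */
      cpu_features->midr_el1 = 0;
      return;
    }

  if (have_cpuid)
    cpu_features->midr_el1 = read_midr ();
  else
    cpu_features->midr_el1 = 0;
}
```

With the override set, the reader callback is never invoked, so the selection functions would see midr_el1 == 0 and fall back to the generic routines. How the string would actually reach the dynamic loader is left open here; the thread does not define such an interface.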

> +#ifdef SHARED
> +# include <csu/libc-start.c>
> +# else
> +/* The main work is done in the generic function.  */
> +# define LIBC_START_DISABLE_INLINE
> +# define LIBC_START_MAIN generic_start_main
> +# include <csu/libc-start.c>
> +# include <cpu-features.c>
> +
> +extern struct cpu_features _dl_aarch64_cpu_features;
> +
> +int
> +__libc_start_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
> +   int argc, char **argv,
> +   __typeof (main) init,
> +   void (*fini) (void),
> +   void (*rtld_fini) (void), void *stack_end)
> +{
> +  init_cpu_features (&_dl_aarch64_cpu_features);
> +  return generic_start_main (main, argc, argv, init, fini, rtld_fini,
> +     stack_end);
> +}
> +#endif
>
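
For readers following the thread, the midr_el1 value saved by init_cpu_features is what the ifunc selectors would later inspect to decide whether they are on thunderx. The sketch below shows one way such a check could be written. It is an assumption-laden illustration, not code from the patch: `midr_field` and `is_thunderx_sketch` are hypothetical names, and the implementer code 0x43 and part number 0x0a1 are the values I understand to identify ThunderX, so check them against the vendor documentation before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a bit field from a MIDR_EL1 value.  */
static uint64_t
midr_field (uint64_t midr, unsigned int shift, uint64_t mask)
{
  assert (shift < 64);
  return (midr >> shift) & mask;
}

/* MIDR_EL1 layout: implementer in bits [31:24], part number in
   bits [15:4].  The constants below are assumed ThunderX values.  */
static int
is_thunderx_sketch (uint64_t midr)
{
  return midr_field (midr, 24, 0xff) == 0x43
	 && midr_field (midr, 4, 0xfff) == 0x0a1;
}
```

A saved value of 0, which the patch stores when HWCAP_CPUID is not set, fails this check, so the generic routines would be selected in that case.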
