[PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

classic Classic list List threaded Threaded
111 messages Options
1234 ... 6
Reply | Threaded
Open this post in threaded view
|

[PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Carlos Eduardo Seo

The proposed patch adds a new feature for powerpc. In order to get faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB. This enables users to write versioned code based on the HWCAP bits without going through the overhead of reading them from the auxiliary vector.

A new API is published in ppc.h for get/set the bits in the aforementioned memory area (mainly for gcc to use to create builtins).

Testcases for the API functions were also created.

Tested on ppc32, ppc64 and ppc64le.

Okay to commit?

Thanks,

--
Carlos Eduardo Seo
Software Engineer - Linux on Power Toolchain
[hidden email]


hwcap-tcb.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Joseph Myers
This patch is missing documentation updates to platform.texi.

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Adhemerval Zanella-2
In reply to this post by Carlos Eduardo Seo
Hi

On 08-06-2015 18:03, Carlos Eduardo Seo wrote:

>
> The proposed patch adds a new feature for powerpc. In order to get faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB. This enables users to write versioned code based on the HWCAP bits without going through the overhead of reading them from the auxiliary vector.
>
> A new API is published in ppc.h for get/set the bits in the aforementioned memory area (mainly for gcc to use to create builtins).
>
> Testcases for the API functions were also created.
>
> Tested on ppc32, ppc64 and ppc64le.
>
> Okay to commit?
>
> Thanks,
>

Besides the documentation missing pointed by Joseph, some comments below.

> @@ -203,6 +214,32 @@ register void *__thread_register __asm__
>  # define THREAD_SET_TM_CAPABLE(value) \
>      (THREAD_GET_TM_CAPABLE () = (value))
>  
> +/* hwcap & hwcap2 fields in TCB head.  */
> +# define THREAD_GET_HWCAP() \
> +    (((tcbhead_t *) ((char *) __thread_register      \
> +     - TLS_TCB_OFFSET))[-1].hwcap)
> +# define THREAD_SET_HWCAP(value) \
> +    if (value & PPC_FEATURE_ARCH_2_06)      \
> +      value |= PPC_FEATURE_ARCH_2_05 |      \
> +       PPC_FEATURE_POWER5_PLUS |      \
> +       PPC_FEATURE_POWER5 |      \
> +       PPC_FEATURE_POWER4;      \
> +    else if (value & PPC_FEATURE_ARCH_2_05)      \
> +      value |= PPC_FEATURE_POWER5_PLUS |      \
> +             PPC_FEATURE_POWER5 |      \
> +             PPC_FEATURE_POWER4;      \
> +    else if (value & PPC_FEATURE_POWER5_PLUS)      \
> +      value |= PPC_FEATURE_POWER5 |      \
> +             PPC_FEATURE_POWER4;      \
> +    else if (value & PPC_FEATURE_POWER5)      \
> +      value |= PPC_FEATURE_POWER4;      \

This very logic is already presented at other powerpc32 sysdep file [1].
Instead of duplicate the logic, I think it is better to move it in a common
file.

[1] sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h

> Index: glibc-working/sysdeps/powerpc/sys/platform/ppc.h
> ===================================================================
> --- glibc-working.orig/sysdeps/powerpc/sys/platform/ppc.h
> +++ glibc-working/sysdeps/powerpc/sys/platform/ppc.h
> @@ -23,6 +23,86 @@
>  #include <stdint.h>
>  #include <bits/ppc.h>
>  
> +
> +/* Get the hwcap/hwcap2 information from the TCB. Offsets taken
> +   from tcb-offsets.h.  */
> +static inline uint32_t
> +__ppc_get_hwcap (void)
> +{
> +
> +  uint32_t __tcb_hwcap;
> +

> Index: glibc-working/sysdeps/powerpc/sys/platform/ppc.h
> ===================================================================
> --- glibc-working.orig/sysdeps/powerpc/sys/platform/ppc.h
> +++ glibc-working/sysdeps/powerpc/sys/platform/ppc.h
> @@ -23,6 +23,86 @@
>  #include <stdint.h>
>  #include <bits/ppc.h>
>  
> +
> +/* Get the hwcap/hwcap2 information from the TCB. Offsets taken
> +   from tcb-offsets.h.  */
> +static inline uint32_t
> +__ppc_get_hwcap (void)
> +{
> +
> +  uint32_t __tcb_hwcap;
> +
> +#ifdef __powerpc64__
> +  register unsigned long __tp __asm__ ("r13");
> +  __asm__ volatile ("lwz %0,-28772(%1)\n"
> +    : "=r" (__tcb_hwcap)
> +    : "r" (__tp));
> +#else
> +  register unsigned long __tp __asm__ ("r2");
> +  __asm__ volatile ("lwz %0,-28724(%1)\n"
> +    : "=r" (__tcb_hwcap)
> +    : "r" (__tp));
> +#endif
> +
> +  return __tcb_hwcap;
> +}

There is no need to use underline names inside inline functions.  I would also
change to something more simple like:

#ifdef __powerpc64__
# define __TPREG     "r13"
# define __HWCAP1OFF -28772
#else
# define __TPREG     "r2"
# define __HWCAP1OFF -28724
#else

static inline uint32_t
__ppc_get_hwcap (void)
{
  uint32_t tcb_hwcap;
  register unsigned long tp __asm__ (__TPREG);
  __asm__ ("lwz %0, %1(%2)\n"
           : "=r" (tcb_hwcap)
           : "i" (__HWCAPOFF), "r" (tp));
  return tcp_hwcap;
}

I also think the volatile in asm is not required (there is no need to refrain
compiler to possible optimize this load inside the inline function itself).

> Index: glibc-working/sysdeps/powerpc/test-get_hwcap.c
> ===================================================================
> --- /dev/null
> +++ glibc-working/sysdeps/powerpc/test-get_hwcap.c

The test are not wrong, but you could make only on test for this functionality,
instead of splitting the set and get in different ones.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Joseph Myers
On Tue, 9 Jun 2015, Adhemerval Zanella wrote:

> There is no need to use underline names inside inline functions.  I would also

Yes there is, when in installed headers - installed headers should only
take a non-reserved name from the namespace of macros the user might
define before including the header if that name is actually intended to be
part of the API for that header.

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Szabolcs Nagy-2
In reply to this post by Carlos Eduardo Seo


On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> The proposed patch adds a new feature for powerpc. In order to get
> faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> This enables users to write versioned code based on the HWCAP bits
> without going through the overhead of reading them from the auxiliary
> vector.

i assume this is for multi-versioning.

i dont see how the compiler can generate code to access the
hwcap bits currently (without making assumptions about libc
interfaces).

> A new API is published in ppc.h for get/set the bits in the
> aforementioned memory area (mainly for gcc to use to create builtins).

how can the compiler use ppc.h? will it replicate the
offset logic instead?

if hwcap is useful abi between compiler and libc
then why is this done in a powerpc specific way?

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Carlos Eduardo Seo
In reply to this post by Adhemerval Zanella-2

On 06/09/2015 11:22 AM, Adhemerval Zanella wrote:

>
>> @@ -203,6 +214,32 @@ register void *__thread_register __asm__
>>  # define THREAD_SET_TM_CAPABLE(value) \
>>      (THREAD_GET_TM_CAPABLE () = (value))
>>  
>> +/* hwcap & hwcap2 fields in TCB head.  */
>> +# define THREAD_GET_HWCAP() \
>> +    (((tcbhead_t *) ((char *) __thread_register      \
>> +     - TLS_TCB_OFFSET))[-1].hwcap)
>> +# define THREAD_SET_HWCAP(value) \
>> +    if (value & PPC_FEATURE_ARCH_2_06)      \
>> +      value |= PPC_FEATURE_ARCH_2_05 |      \
>> +       PPC_FEATURE_POWER5_PLUS |      \
>> +       PPC_FEATURE_POWER5 |      \
>> +       PPC_FEATURE_POWER4;      \
>> +    else if (value & PPC_FEATURE_ARCH_2_05)      \
>> +      value |= PPC_FEATURE_POWER5_PLUS |      \
>> +             PPC_FEATURE_POWER5 |      \
>> +             PPC_FEATURE_POWER4;      \
>> +    else if (value & PPC_FEATURE_POWER5_PLUS)      \
>> +      value |= PPC_FEATURE_POWER5 |      \
>> +             PPC_FEATURE_POWER4;      \
>> +    else if (value & PPC_FEATURE_POWER5)      \
>> +      value |= PPC_FEATURE_POWER4;      \
>
> This very logic is already presented at other powerpc32 sysdep file [1].
> Instead of duplicate the logic, I think it is better to move it in a common
> file.
>
> [1] sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
>

So, do you suggest a cleanup patch first to move this to a common file, then a rewrite of this patch on top of that? If so, in which header should I put that?

Thanks,

Carlos Eduardo Seo
Software Engineer - Linux on Power Toolchain
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Steven Munroe-2
In reply to this post by Szabolcs Nagy-2
On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
>
> On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > The proposed patch adds a new feature for powerpc. In order to get
> > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > This enables users to write versioned code based on the HWCAP bits
> > without going through the overhead of reading them from the auxiliary
> > vector.

> i assume this is for multi-versioning.

The intent is for the compiler to implement the equivalent of
__builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
is RISC so we use the HWCAP. The trick to access the HWCAP[2]
efficiently as getauxv and scanning the auxv is too slow for inline
optimizations.

> i dont see how the compiler can generate code to access the
> hwcap bits currently (without making assumptions about libc
> interfaces).
>
These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.

The TCB offsets are already fixed and can not change from release to
release.


> > A new API is published in ppc.h for get/set the bits in the
> > aforementioned memory area (mainly for gcc to use to create builtins).
>
> how can the compiler use ppc.h? will it replicate the
> offset logic instead?
>
See above

> if hwcap is useful abi between compiler and libc
> then why is this done in a powerpc specific way?
>

Other platform are free use this technique.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Adhemerval Zanella-2
In reply to this post by Joseph Myers


On 09-06-2015 11:26, Joseph Myers wrote:
> On Tue, 9 Jun 2015, Adhemerval Zanella wrote:
>
>> There is no need to use underline names inside inline functions.  I would also
>
> Yes there is, when in installed headers - installed headers should only
> take a non-reserved name from the namespace of macros the user might
> define before including the header if that name is actually intended to be
> part of the API for that header.
>

Does this also apply for the the variable defined inside the function?
My example still uses '__' for the defines used across header.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Joseph Myers
On Tue, 9 Jun 2015, Adhemerval Zanella wrote:

>
>
> On 09-06-2015 11:26, Joseph Myers wrote:
> > On Tue, 9 Jun 2015, Adhemerval Zanella wrote:
> >
> >> There is no need to use underline names inside inline functions.  I would also
> >
> > Yes there is, when in installed headers - installed headers should only
> > take a non-reserved name from the namespace of macros the user might
> > define before including the header if that name is actually intended to be
> > part of the API for that header.
> >
>
> Does this also apply for the the variable defined inside the function?

Yes.  Users should be able to define macros called "tp" or "tcb_hwcap"
before including the header, without those macros having any effect on the
header, unless those names are documented interfaces.

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Ondřej Bílka
In reply to this post by Steven Munroe-2
On Tue, Jun 09, 2015 at 10:06:33AM -0500, Steven Munroe wrote:

> On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> >
> > On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > > The proposed patch adds a new feature for powerpc. In order to get
> > > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > > This enables users to write versioned code based on the HWCAP bits
> > > without going through the overhead of reading them from the auxiliary
> > > vector.
>
> > i assume this is for multi-versioning.
>
> The intent is for the compiler to implement the equivalent of
> __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> is RISC so we use the HWCAP. The trick to access the HWCAP[2]
> efficiently as getauxv and scanning the auxv is too slow for inline
> optimizations.
>
> > i dont see how the compiler can generate code to access the
> > hwcap bits currently (without making assumptions about libc
> > interfaces).
> >
> These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.
>
> The TCB offsets are already fixed and can not change from release to
> release.
>
I don't have problem with this but why do you add tls, how can different
threads have different ones when kernel could move them between cores.

So instead we just add to libc api following two variables below. These would
be initialized by linker as we will probably use them internally.

extern int __hwcap, __hwcap2;
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Szabolcs Nagy-2
In reply to this post by Steven Munroe-2


On 09/06/15 16:06, Steven Munroe wrote:
> On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
>> i assume this is for multi-versioning.
>
> The intent is for the compiler to implement the equivalent of
> __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> is RISC so we use the HWCAP. The trick to access the HWCAP[2]
> efficiently as getauxv and scanning the auxv is too slow for inline
> optimizations.

i think getauxv is not usable by the compiler anyway,
it's not a standard api.

>> i dont see how the compiler can generate code to access the
>> hwcap bits currently (without making assumptions about libc
>> interfaces).
>>
> These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.
>
> The TCB offsets are already fixed and can not change from release to
> release.

hard coded arch specific tcb offsets make sure that
targets need different tcb layout which means more
target specific maintainance instead of common c code.

>> if hwcap is useful abi between compiler and libc
>> then why is this done in a powerpc specific way?
>
> Other platform are free use this technique.

i think this is not a sustainable approach for
compiler abi extensions.

(it means juggling with magic offsets on the order
of compilers * libcs * targets).

unfortunately accessing the ssp canary is already
broken this way, i'm not sure what's a better abi,
but it's probably worth thinking about one before
the tcb code gets too messy.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Steven Munroe-2
In reply to this post by Ondřej Bílka
On Tue, 2015-06-09 at 17:42 +0200, Ondřej Bílka wrote:

> On Tue, Jun 09, 2015 at 10:06:33AM -0500, Steven Munroe wrote:
> > On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> > >
> > > On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > > > The proposed patch adds a new feature for powerpc. In order to get
> > > > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > > > This enables users to write versioned code based on the HWCAP bits
> > > > without going through the overhead of reading them from the auxiliary
> > > > vector.
> >
> > > i assume this is for multi-versioning.
> >
> > The intent is for the compiler to implement the equivalent of
> > __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> > is RISC so we use the HWCAP. The trick to access the HWCAP[2]
> > efficiently as getauxv and scanning the auxv is too slow for inline
> > optimizations.
> >
> > > i dont see how the compiler can generate code to access the
> > > hwcap bits currently (without making assumptions about libc
> > > interfaces).
> > >
> > These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.
> >
> > The TCB offsets are already fixed and can not change from release to
> > release.
> >
> I don't have problem with this but why do you add tls, how can different
> threads have different ones when kernel could move them between cores.
>
> So instead we just add to libc api following two variables below. These would
> be initialized by linker as we will probably use them internally.
>
> extern int __hwcap, __hwcap2;
>
The Power ABI's address the TCB off a dedicated GPR (R2 or R13). This
guarantees one instruction load from TCB.

A Static variable would require a an indirect load via the TOC/GOT
(which can be megabytes for a large program/library). I really really
want the avoid that.

The point is to make fast decisions about which code the execute.
STT_GNU_IFUNC is just too complication for most application programmers
to use.

Now if the GLIBC community wants to provide a durable API for static
access to the HWCAP. I have not problem with that, but it does not solve
this problem.



Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Steven Munroe-2
In reply to this post by Szabolcs Nagy-2
On Tue, 2015-06-09 at 16:48 +0100, Szabolcs Nagy wrote:

>
> On 09/06/15 16:06, Steven Munroe wrote:
> > On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> >> i assume this is for multi-versioning.
> >
> > The intent is for the compiler to implement the equivalent of
> > __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> > is RISC so we use the HWCAP. The trick to access the HWCAP[2]
> > efficiently as getauxv and scanning the auxv is too slow for inline
> > optimizations.
>
> i think getauxv is not usable by the compiler anyway,
> it's not a standard api.
>
> >> i dont see how the compiler can generate code to access the
> >> hwcap bits currently (without making assumptions about libc
> >> interfaces).
> >>
> > These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.
> >
> > The TCB offsets are already fixed and can not change from release to
> > release.
>
> hard coded arch specific tcb offsets make sure that
> targets need different tcb layout which means more
> target specific maintainance instead of common c code.
>
> >> if hwcap is useful abi between compiler and libc
> >> then why is this done in a powerpc specific way?
> >
> > Other platform are free use this technique.
>
> i think this is not a sustainable approach for
> compiler abi extensions.
>
> (it means juggling with magic offsets on the order
> of compilers * libcs * targets).
>
> unfortunately accessing the ssp canary is already
> broken this way, i'm not sure what's a better abi,
> but it's probably worth thinking about one before
> the tcb code gets too messy.
>

I have thought about it.

Based on my detailed knowledge of the PowerISA and PowerPC ABIs this the
simplest and fastest solution.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Rich Felker-2
In reply to this post by Carlos Eduardo Seo
On Mon, Jun 08, 2015 at 06:03:16PM -0300, Carlos Eduardo Seo wrote:

>
> The proposed patch adds a new feature for powerpc. In order to get
> faster access to the HWCAP/HWCAP2 bits, we now store them in the
> TCB. This enables users to write versioned code based on the HWCAP
> bits without going through the overhead of reading them from the
> auxiliary vector.
>
> A new API is published in ppc.h for get/set the bits in the
> aforementioned memory area (mainly for gcc to use to create
> builtins).

Do you have any justification (actual performance figures for a
real-world usage case) for adding ABI constraints like this? This is
not something that should be done lightly. My understanding is that
hwcap bits are normally used in initializing functions pointers (or
equivalent things like ifunc resolvers), not again and again at
runtime, so I'm having a hard time seeing how this could help even if
it does make the individual hwcap accesses measurably faster.

It would also be nice to see some justification for the magic number
offsets. Will they be stable under changes to the TCB structure or
will preserving them require tip-toeing around them?

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Rich Felker-2
In reply to this post by Steven Munroe-2
On Tue, Jun 09, 2015 at 11:01:24AM -0500, Steven Munroe wrote:

> On Tue, 2015-06-09 at 17:42 +0200, Ondřej Bílka wrote:
> > On Tue, Jun 09, 2015 at 10:06:33AM -0500, Steven Munroe wrote:
> > > On Tue, 2015-06-09 at 15:47 +0100, Szabolcs Nagy wrote:
> > > >
> > > > On 08/06/15 22:03, Carlos Eduardo Seo wrote:
> > > > > The proposed patch adds a new feature for powerpc. In order to get
> > > > > faster access to the HWCAP/HWCAP2 bits, we now store them in the TCB.
> > > > > This enables users to write versioned code based on the HWCAP bits
> > > > > without going through the overhead of reading them from the auxiliary
> > > > > vector.
> > >
> > > > i assume this is for multi-versioning.
> > >
> > > The intent is for the compiler to implement the equivalent of
> > > __builtin_cpu_supports("feature"). X86 has the cpuid instruction, POWER
> > > is RISC so we use the HWCAP. The trick to access the HWCAP[2]
> > > efficiently as getauxv and scanning the auxv is too slow for inline
> > > optimizations.
> > >
> > > > i dont see how the compiler can generate code to access the
> > > > hwcap bits currently (without making assumptions about libc
> > > > interfaces).
> > > >
> > > These offset will become a durable part the PowerPC 64-bit ELF V2 ABI.
> > >
> > > The TCB offsets are already fixed and can not change from release to
> > > release.
> > >
> > I don't have problem with this but why do you add tls, how can different
> > threads have different ones when kernel could move them between cores.
> >
> > So instead we just add to libc api following two variables below. These would
> > be initialized by linker as we will probably use them internally.
> >
> > extern int __hwcap, __hwcap2;
> >
> The Power ABI's address the TCB off a dedicated GPR (R2 or R13). This
> guarantees one instruction load from TCB.
>
> A Static variable would require a an indirect load via the TOC/GOT

I do not see this as a justification. There are a lot more pressing
things with respect to performance that could be micro-optimized by
adding TCB ABI for them, but it's not done because it's the wrong
solution.

> (which can be megabytes for a large program/library). I really really
> want the avoid that.

The size of the GOT is utterly irrelevant to the performance reading
an element from it, so I don't see why you brought this up.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Rich Felker-2
In reply to this post by Szabolcs Nagy-2
On Tue, Jun 09, 2015 at 04:48:10PM +0100, Szabolcs Nagy wrote:

> >> if hwcap is useful abi between compiler and libc
> >> then why is this done in a powerpc specific way?
> >
> > Other platform are free use this technique.
>
> i think this is not a sustainable approach for
> compiler abi extensions.
>
> (it means juggling with magic offsets on the order
> of compilers * libcs * targets).
>
> unfortunately accessing the ssp canary is already
> broken this way, i'm not sure what's a better abi,
> but it's probably worth thinking about one before
> the tcb code gets too messy.

For the canary I think it makes sense, even though it's ugly -- the
compiler has to generate a reference in every single function (for
'all' mode, or just most non-trivial functions in 'strong' mode).
That's much different from a feature (hwcap) that should only be used
at init-time and where, even if programmers did abuse it and use it
over and over at runtime, it's only going to be a small constant
overhead in a presumably medium to large sized function, and the cost
is only the need to setup the GOT register and load from the GOT,
anyway.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Steven Munroe-2
On Tue, 2015-06-09 at 12:50 -0400, Rich Felker wrote:

> On Tue, Jun 09, 2015 at 04:48:10PM +0100, Szabolcs Nagy wrote:
> > >> if hwcap is useful abi between compiler and libc
> > >> then why is this done in a powerpc specific way?
> > >
> > > Other platform are free use this technique.
> >
> > i think this is not a sustainable approach for
> > compiler abi extensions.
> >
> > (it means juggling with magic offsets on the order
> > of compilers * libcs * targets).
> >
> > unfortunately accessing the ssp canary is already
> > broken this way, i'm not sure what's a better abi,
> > but it's probably worth thinking about one before
> > the tcb code gets too messy.
>
> For the canary I think it makes sense, even though it's ugly -- the
> compiler has to generate a reference in every single function (for
> 'all' mode, or just most non-trivial functions in 'strong' mode).
> That's much different from a feature (hwcap) that should only be used
> at init-time and where, even if programmers did abuse it and use it
> over and over at runtime, it's only going to be a small constant
> overhead in a presumably medium to large sized function, and the cost
> is only the need to setup the GOT register and load from the GOT,
> anyway.

You are entitled to you own opinion but you are not accounting for the
aggressive out of order execution the POWER processors and specifics of
the PowerISA. In the time it take to load indirect via the TOC (4 cycles
minimum) compare/branch we could have executed 12-16 useful
instructions.

Any indirection exposes the sequences to hazards (like cache miss) which
only make things worse.

As stated before I have thought about this and understand the options in
the context of the PowerISA, POWER micro-architecture, and the PowerPC
ABIs. This information is publicly available (if a little hard to find)
but I doubt you have taken the time to study it in detail, if at all.

I suspect you base your opinion on other architectures and hardware
implementations that do not apply to this situation.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Rich Felker-2
On Tue, Jun 09, 2015 at 12:37:04PM -0500, Steven Munroe wrote:

> On Tue, 2015-06-09 at 12:50 -0400, Rich Felker wrote:
> > On Tue, Jun 09, 2015 at 04:48:10PM +0100, Szabolcs Nagy wrote:
> > > >> if hwcap is useful abi between compiler and libc
> > > >> then why is this done in a powerpc specific way?
> > > >
> > > > Other platform are free use this technique.
> > >
> > > i think this is not a sustainable approach for
> > > compiler abi extensions.
> > >
> > > (it means juggling with magic offsets on the order
> > > of compilers * libcs * targets).
> > >
> > > unfortunately accessing the ssp canary is already
> > > broken this way, i'm not sure what's a better abi,
> > > but it's probably worth thinking about one before
> > > the tcb code gets too messy.
> >
> > For the canary I think it makes sense, even though it's ugly -- the
> > compiler has to generate a reference in every single function (for
> > 'all' mode, or just most non-trivial functions in 'strong' mode).
> > That's much different from a feature (hwcap) that should only be used
> > at init-time and where, even if programmers did abuse it and use it
> > over and over at runtime, it's only going to be a small constant
> > overhead in a presumably medium to large sized function, and the cost
> > is only the need to setup the GOT register and load from the GOT,
> > anyway.
>
> You are entitled to you own opinion but you are not accounting for the
> aggressive out of order execution the POWER processors and specifics of
> the PowerISA. In the time it take to load indirect via the TOC (4 cycles
> minimum) compare/branch we could have executed 12-16 useful
> instructions.
>
> Any indirection exposes the sequences to hazards (like cache miss) which
> only make things worse.
>
> As stated before I have thought about this and understand the options in
> the context of the PowerISA, POWER micro-architecture, and the PowerPC
> ABIs. This information is publicly available (if a little hard to find)
> but I doubt you have taken the time to study it in detail, if at all.
>
> I suspect you base your opinion on other architectures and hardware
> implementations that do not apply to this situation.

That's nice but all theoretical. I've seen countless such theoretical
claims from people who are coming from a standpoint of the vendor
manuals for the ISA they're working with, and more often than not,
they don't translate into measurable benefits. (I've been guilty of
this myself too, going to great lengths to tweak x86 codegen or even
write the asm by hand, only to find the resulting code to run the
exact same speed.) Creating a permanent ABI is an extremely high cost,
and unless you can justify the cost with actual measurements and a
reason to believe those measurements have anything to do with
real-world usage needs, I believe it's an unjustified cost.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Adhemerval Zanella-2
In reply to this post by Rich Felker-2


On 09-06-2015 13:38, Rich Felker wrote:

> On Mon, Jun 08, 2015 at 06:03:16PM -0300, Carlos Eduardo Seo wrote:
>>
>> The proposed patch adds a new feature for powerpc. In order to get
>> faster access to the HWCAP/HWCAP2 bits, we now store them in the
>> TCB. This enables users to write versioned code based on the HWCAP
>> bits without going through the overhead of reading them from the
>> auxiliary vector.
>>
>> A new API is published in ppc.h for get/set the bits in the
>> aforementioned memory area (mainly for gcc to use to create
>> builtins).
>
> Do you have any justification (actual performance figures for a
> real-world usage case) for adding ABI constraints like this? This is
> not something that should be done lightly. My understanding is that
> hwcap bits are normally used in initializing functions pointers (or
> equivalent things like ifunc resolvers), not again and again at
> runtime, so I'm having a hard time seeing how this could help even if
> it does make the individual hwcap accesses measurably faster.

I believe the idea is to provide a fast way to emulate a functionality
similar to __builtin_cpu_supports for powerpc.  For x86, this builtin
will create 'cpuid' instruction, but since powerpc lacks a similar one
it should rely on hardware capability information provided by kernel.

And using TCB is the fastest way to provide such functionality. By
exporting the symbol as a normal variable (extern int hwcap), it will
require a R_PPC64_ADDR64 relocation plus two load accesses and some
arithmetic (TOC materialization and load plus the variable load)

>
> It would also be nice to see some justification for the magic number
> offsets. Will they be stable under changes to the TCB structure or
> will preserving them require tip-toeing around them?

It requires not change TCB fields over releases and adding newer on top
(to not change previous offset).  And it has been done for a while,
since the ssp canary.

>
> Rich
>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] powerpc: New feature - HWCAP/HWCAP2 bits in the TCB

Florian Weimer-5
In reply to this post by Steven Munroe-2
On 06/09/2015 06:01 PM, Steven Munroe wrote:

> A Static variable would require a an indirect load via the TOC/GOT
> (which can be megabytes for a large program/library). I really really
> want the avoid that.

Could you encode the information in the address itself?  Then the
indirection goes away.

--
Florian Weimer / Red Hat Product Security
1234 ... 6