[PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.

Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.

Adhemerval Zanella-2


On 05/11/2019 12:49, Stefan Liebler wrote:

> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I strongly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler versions.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
>
> Currently the long double builtins generate an extra, unneeded stack frame compared to the inline assembly. But this needs to be fixed in GCC.
>
> E.g. if built for s390 (31-bit), where the fidbra & co. instructions are not available, the builtins generate a call to libc, which would end in an infinite loop.  I will run some tests on s390, starting with the current minimum GCC 6.2, to be sure that the instructions are used.  I have never built glibc with other compilers like clang.  Is there a special need to check this behavior?

I think Google maintains some branches with clang support (google/grte/*),
but there is no known effort to sync these with master.  So I see no need
to focus on non-GCC compilers for now.

>
> In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series.
> This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.
>
> Afterwards the same can be done for the remaining functions.
>
> I will create a separate header file, e.g. sysdeps/generic/math-use-builtins.h, in the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0,
> and each architecture can provide its own file with other settings.
> For each function XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
> How about this?
>

I think it is a fair start, with the adjustments pointed out by Joseph.
I will look into the wordsize-64 refactor to avoid duplicating the
implementations.
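
To make the proposal above concrete, here is a minimal sketch of how a
USE_NEARBYINT_BUILTIN switch might select between the compiler builtin and
common code. It is not the code from the patch series: the function name
sketch_nearbyint is hypothetical, the macro is defined inline instead of in
a math-use-builtins.h header, and the fallback is a simplified placeholder
for the generic implementation.

```c
/* Hypothetical stand-in for the proposed sysdeps/generic/math-use-builtins.h:
   the generic header would set every switch to 0, and an architecture such
   as s390 with z196 support would provide its own copy setting it to 1.  */
#ifndef USE_NEARBYINT_BUILTIN
# define USE_NEARBYINT_BUILTIN 0
#endif

/* sketch_nearbyint is an illustrative name, not the glibc symbol.  */
double
sketch_nearbyint (double x)
{
#if USE_NEARBYINT_BUILTIN
  /* Only valid when the compiler expands the builtin inline.  If it emits
     a call to nearbyint instead, a libm built this way would call itself.  */
  return __builtin_nearbyint (x);
#else
  /* Simplified placeholder for the generic code: adding and subtracting
     2^52 rounds to an integer in the current rounding mode.  Unlike the
     real nearbyint, this does not suppress the inexact exception.  */
  const double two52 = 4503599627370496.0;
  if (x != x || x == 0.0 || x >= two52 || x <= -two52)
    return x;  /* NaN, zero, infinity, or already integral.  */

  /* volatile keeps the compiler from folding the two operations away.  */
  volatile double t;
  if (x > 0.0)
    {
      t = x + two52;
      t = t - two52;
    }
  else
    {
      t = x - two52;
      t = t + two52;
    }

  double r = t;
  /* Keep the sign of x when the result rounds to zero.  */
  if (r == 0.0)
    return x < 0.0 ? -0.0 : 0.0;
  return r;
#endif
}
```

Building with -DUSE_NEARBYINT_BUILTIN=1 selects the builtin path; whether
that expands to a single instruction depends on the compiler and the
-march setting, which is why the switch would be set per architecture.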

Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.

Stefan Liebler-2
In reply to this post by Adhemerval Zanella-2
On 11/4/19 7:22 PM, Adhemerval Zanella wrote:

>
>
> On 04/11/2019 12:27, Stefan Liebler wrote:
>> If compiled with z196 zarch support, the load-fp-integer instruction
>> is used to implement nearbyint, nearbyintf, nearbyintl.
>> Otherwise the common-code implementation is used.
>
>> +
>> +double
>> +__nearbyint (double x)
>> +{
>> +  double y;
>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>> +  return y;
>> +}
>> +libm_alias_double (__nearbyint, nearbyint)
>
> At least with recent gcc __builtin_nearbyint generates the expected fidbra
> instruction for -march=z196.  I wonder if we could start to simplify some
> math symbols implementation where new architectures/extensions provide
> direct implementation by a direct mapping implemented by compiler builtins.
>
> I would expect to:
>
>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>       since I strongly doubt these micro-optimizations really pay off with
>       recent architectures and compiler versions.
>
>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>
>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>      
>       [...]
>       double
>       __nearbyint (double x)
>       {
>       #if __USE_NEARBYINT_BUILTIN
>         return __builtin_nearbyint (x);
>       #else
>         /* Use generic implementation.  */
>       #endif
>       }
>
>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>
> It would allow to simplify some architectures, aarch64 for instance.
>
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 03/17] S390: Use load-fp-integer instruction for floor functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 14/17] S390: Use libc_fe* macros in fe* functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 04/17] S390: Use load-fp-integer instruction for ceil functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 10/17] S390: Use convert-to-fixed instruction for lround functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 15/17] S390: Implement math-barriers math_opt_barrier and math_force_eval.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 02/17] S390: Use load-fp-integer instruction for rint functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 17/17] S390: Use sysdeps/ieee754/dbl-64/wordsize-64 on s390x.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 12/17] S390: Use copy-sign instruction for copysign functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 16/17] S390: Implement roundtoint and converttoint and define TOINT_INTRINSICS.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 11/17] S390: Use convert-to-fixed instruction for llround functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 09/17] S390: Use convert-to-fixed instruction for llrint functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 05/17] S390: Use load-fp-integer instruction for trunc functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 08/17] S390: Use convert-to-fixed instruction for lrint functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 13/17] S390: Implement libc_fe* macros.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 06/17] S390: Use load-fp-integer instruction for round functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
This patch is superseded by the patch series that always uses the
wordsize-64 version and allows the GCC builtins to be used in the
common-code implementation:
"[PATCH 00/13] Use GCC builtins for some math functions if desired."
https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html

Bye,
Stefan


Re: [PATCH 07/17] S390: Use load-fp-integer instruction for roundeven functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
PING.
This patch is still needed. If there are no further comments, I'll
commit this s390-only patch in a few days.

Bye,
Stefan


Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.

Adhemerval Zanella-2
In reply to this post by Stefan Liebler-2


On 02/12/2019 11:56, Stefan Liebler wrote:

> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I strongly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler versions.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
> This patch is superseded by the patch series that always uses the wordsize-64 version and allows the GCC builtins to be used in the common-code implementation:
> "[PATCH 00/13] Use GCC builtins for some math functions if desired."
> https://www.sourceware.org/ml/libc-alpha/2019-12/msg00029.html
>
> Bye,
> Stefan
>

Thanks, I will review this set.

Re: [PATCH 07/17] S390: Use load-fp-integer instruction for roundeven functions.

Stefan Liebler-2
In reply to this post by Stefan Liebler-2
On 12/2/19 4:04 PM, Stefan Liebler wrote:
> PING.
> This patch is still needed. If there are no further comments, I'll
> commit this s390-only patch in a few days.
>
> Bye,
> Stefan
>
For information: I've just committed the remaining s390-only patches.

Bye,
Stefan
