[PATCH] Speed up libm on MIPS

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Speed up libm on MIPS

Steve Ellcey -2
This patch defines various inline routines and macros used by the math
library in order to speed up libm on MIPS.  It does not affect
soft-float builds but for hard-float builds 'make bench' shows a
speed-up.  With an o32 little-endian glibc, sin() went from 27792.6
iter/s to 31293.6 iter/s.  On n32 it went from 32955.2 to 36179.7 and on
n64 from 33074.7 to 36242. exp() went from 45742.4 to 56511.2 on o32 and
pow() went from 19008.8 to 20508.7.  I have attached the original and
new bench.out files for o32, n32, and n64 ABIs in case you want to see
more of the data.  These are all little-endian hard-float runs.

I ran 'make check' and 'make bench' using the o32, n32, and n64 ABIs
with big and little endian and with hard and soft float to verify there
were no failures.  I did run into an unrelated problem that is being
fixed (https://sourceware.org/ml/libc-alpha/2013-09/msg00601.html) but
there were no other failures except the expected ones for MIPS.

OK for checkin?

Steve Ellcey
[hidden email]


2013-09-18  Steve Ellcey  <[hidden email]>

        * sysdeps/mips/math_private.h (libc_feholdexcept_mips): New function.
        (libc_feholdexcept): New macro.
        (libc_feholdexceptf): New macro.
        (libc_feholdexceptl): New macro.
        (libc_fesetround_mips): New function.
        (libc_fesetround): New macro.
        (libc_fesetroundf): New macro.
        (libc_fesetroundl): New macro.
        (libc_feholdexcept_setround_mips): New function.
        (libc_feholdexcept_setround): New macro.
        (libc_feholdexcept_setroundf): New macro.
        (libc_feholdexcept_setroundl): New macro.
        (libc_fesetenv_mips): New function.
        (libc_fesetenv): New macro.
        (libc_fesetenvf): New macro.
        (libc_fesetenvl): New macro.
        (libc_feupdateenv_mips): New function.
        (libc_feupdateenv): New macro.
        (libc_feupdateenvf): New macro.
        (libc_feupdateenvl): New macro.


mips-libm.patch (3K) Download Attachment
bench.out.n32.new (1K) Download Attachment
bench.out.n32.orig (1K) Download Attachment
bench.out.n64.new (1K) Download Attachment
bench.out.n64.orig (1K) Download Attachment
bench.out.o32.new (1K) Download Attachment
bench.out.o32.orig (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Carlos O'Donell-6
On 09/19/2013 06:56 PM, Steve Ellcey wrote:

> This patch defines various inline routines and macros used by the math
> library in order to speed up libm on MIPS.  It does not affect
> soft-float builds but for hard-float builds 'make bench' shows a
> speed-up.  With an o32 little-endian glibc, sin() went from 27792.6
> iter/s to 31293.6 iter/s.  On n32 it went from 32955.2 to 36179.7 and on
> n64 from 33074.7 to 36242. exp() went from 45742.4 to 56511.2 on o32 and
> pow() went from 19008.8 to 20508.7.  I have attached the original and
> new bench.out files for o32, n32, and n64 ABIs in case you want to see
> more of the data.  These are all little-endian hard-float runs.
>
> I ran 'make check' and 'make bench' using the o32, n32, and n64 ABIs
> with big and little endian and with hard and soft float to verify there
> were no failures.  I did run into an unrelated problem that is being
> fixed (https://sourceware.org/ml/libc-alpha/2013-09/msg00601.html) but
> there were no other failures except the expected ones for MIPS.

Thank you for running `make bench' and posting the results. I appreciate
you going out of your way to make the measurements and post them.

> OK for checkin?

Does MIPS have a slow fpu save/restore?

Does using HAVE_RM_CTX speed anything up for you?

For example see 2506109403de69bd454de27835d42e6eb6ec3abc

Two nits below.

> Steve Ellcey
> [hidden email]
>
>
> 2013-09-18  Steve Ellcey  <[hidden email]>
>
> * sysdeps/mips/math_private.h (libc_feholdexcept_mips): New function.
> (libc_feholdexcept): New macro.
> (libc_feholdexceptf): New macro.
> (libc_feholdexceptl): New macro.
> (libc_fesetround_mips): New function.
> (libc_fesetround): New macro.
> (libc_fesetroundf): New macro.
> (libc_fesetroundl): New macro.
> (libc_feholdexcept_setround_mips): New function.
> (libc_feholdexcept_setround): New macro.
> (libc_feholdexcept_setroundf): New macro.
> (libc_feholdexcept_setroundl): New macro.
> (libc_fesetenv_mips): New function.
> (libc_fesetenv): New macro.
> (libc_fesetenvf): New macro.
> (libc_fesetenvl): New macro.
> (libc_feupdateenv_mips): New function.
> (libc_feupdateenv): New macro.
> (libc_feupdateenvf): New macro.
> (libc_feupdateenvl): New macro.
>
>
> mips-libm.patch
>
>
> diff --git a/ports/sysdeps/mips/math_private.h b/ports/sysdeps/mips/math_private.h
> index 6b99957..0ac18fd 100644
> --- a/ports/sysdeps/mips/math_private.h
> +++ b/ports/sysdeps/mips/math_private.h
> @@ -26,6 +26,119 @@
>  # define HIGH_ORDER_BIT_IS_SET_FOR_SNAN
>  #endif
>  
> +/* Inline functions to speed up the math library implementation.  The
> +   default versions of these routines are in generic/math_private.h
> +   and call fesetround, feholdexcept, etc.  These routines use inlined
> +   code instead.  */
> +
> +#ifdef __mips_hard_float
> +
> +#include <fenv.h>
> +#include <fenv_libc.h>
> +#include <fpu_control.h>
> +
> +static __always_inline void
> +libc_feholdexcept_mips (fenv_t *envp)
> +{
> +  fpu_control_t cw;
> +
> +  /* Save the current state.  */
> +  _FPU_GETCW (cw);
> +  envp->__fp_control_register = cw;
> +
> +  /* Clear all exception enable bits and flags.  */
> +  cw &= ~(_FPU_MASK_V|_FPU_MASK_Z|_FPU_MASK_O|_FPU_MASK_U|_FPU_MASK_I|FE_ALL_EXCEPT);
> +  _FPU_SETCW (cw);
> +}
> +#define libc_feholdexcept libc_feholdexcept_mips
> +#define libc_feholdexceptf libc_feholdexcept_mips
> +#define libc_feholdexceptl libc_feholdexcept_mips
> +
> +static __always_inline void
> +libc_fesetround_mips (int round)
> +{
> +  fpu_control_t cw;
> +
> +  /* Get current state.  */
> +  _FPU_GETCW (cw);
> +
> +  /* Set rounding bits.  */
> +  cw &= ~0x3;

What's the magic ~0x3? Should it be a new macro?

> +  cw |= round;
> +
> +  /* Set new state.  */
> +  _FPU_SETCW (cw);
> +}
> +#define libc_fesetround libc_fesetround_mips
> +#define libc_fesetroundf libc_fesetround_mips
> +#define libc_fesetroundl libc_fesetround_mips
> +
> +static __always_inline void
> +libc_feholdexcept_setround_mips (fenv_t *envp, int round)
> +{
> +  fpu_control_t cw;
> +
> +  /* Save the current state.  */
> +  _FPU_GETCW (cw);
> +  envp->__fp_control_register = cw;
> +
> +  /* Clear all exception enable bits and flags.  */
> +  cw &= ~(_FPU_MASK_V|_FPU_MASK_Z|_FPU_MASK_O|_FPU_MASK_U|_FPU_MASK_I|FE_ALL_EXCEPT);
> +
> +  /* Set rounding bits.  */
> +  cw &= ~0x3;

Likewise?

> +  cw |= round;
> +
> +  /* Set new state.  */
> +  _FPU_SETCW (cw);
> +}
> +#define libc_feholdexcept_setround libc_feholdexcept_setround_mips
> +#define libc_feholdexcept_setroundf libc_feholdexcept_setround_mips
> +#define libc_feholdexcept_setroundl libc_feholdexcept_setround_mips
> +
> +static __always_inline void
> +libc_fesetenv_mips (fenv_t *envp)
> +{
> +  fpu_control_t cw;
> +
> +  /* Read first current state to flush fpu pipeline.  */
> +  _FPU_GETCW (cw);
> +
> +  if (envp == FE_DFL_ENV)
> +    _FPU_SETCW (_FPU_DEFAULT);
> +  else if (envp == FE_NOMASK_ENV)
> +    _FPU_SETCW (_FPU_IEEE);
> +  else
> +    _FPU_SETCW (envp->__fp_control_register);
> +}
> +#define libc_fesetenv libc_fesetenv_mips
> +#define libc_fesetenvf libc_fesetenv_mips
> +#define libc_fesetenvl libc_fesetenv_mips
> +
> +static __always_inline void
> +libc_feupdateenv_mips (fenv_t *envp)
> +{
> +  int temp;
> +
> +  /* Save current exceptions.  */
> +  _FPU_GETCW (temp);
> +
> +  /* Set flag bits (which are accumulative), and *also* set the
> +     cause bits. The setting of the cause bits is what actually causes

Two spaces after period.

> +     the hardware to generate the exception, if the corresponding enable
> +     bit is set as well.  */
> +  temp &= FE_ALL_EXCEPT;
> +  temp |= envp->__fp_control_register | (temp << CAUSE_SHIFT);
> +
> +  /* Set new state.  */
> +  _FPU_SETCW (temp);
> +}
> +#define libc_feupdateenv libc_feupdateenv_mips
> +#define libc_feupdateenvf libc_feupdateenv_mips
> +#define libc_feupdateenvl libc_feupdateenv_mips
> +
> +#endif
> +
>  #include_next <math_private.h>
>  
>  #endif

Otherwise looks good to me.

Cheers,
Carlos.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Richard Henderson
In reply to this post by Steve Ellcey -2
On 09/19/2013 03:56 PM, Steve Ellcey wrote:

> +libc_fesetenv_mips (fenv_t *envp)
> +{
> +  fpu_control_t cw;
> +
> +  /* Read first current state to flush fpu pipeline.  */
> +  _FPU_GETCW (cw);
> +
> +  if (envp == FE_DFL_ENV)
> +    _FPU_SETCW (_FPU_DEFAULT);
> +  else if (envp == FE_NOMASK_ENV)
> +    _FPU_SETCW (_FPU_IEEE);
> +  else
> +    _FPU_SETCW (envp->__fp_control_register);
> +}

You shouldn't need the two default env checks, since this private
interface will always be used in pairs.


r~
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Joseph Myers
In reply to this post by Steve Ellcey -2
On Thu, 19 Sep 2013, Steve Ellcey wrote:

> This patch defines various inline routines and macros used by the math
> library in order to speed up libm on MIPS.  It does not affect

There are at least two other obvious opportunities to speed up MIPS libm
(though I don't know if the benchmarks cover them at present).  The
hardware sqrt instruction isn't being used - bug 15632 - and
sysdeps/ieee754/dbl-64/wordsize-64 should be used for n64 (all functions)
and n32 (all functions except for those where the aliasing of "long" and
"long long" functions would be wrong for n32 - llround and lround - see
the wiki todo list).

> diff --git a/ports/sysdeps/mips/math_private.h b/ports/sysdeps/mips/math_private.h
> index 6b99957..0ac18fd 100644
> --- a/ports/sysdeps/mips/math_private.h
> +++ b/ports/sysdeps/mips/math_private.h
> @@ -26,6 +26,119 @@
>  # define HIGH_ORDER_BIT_IS_SET_FOR_SNAN
>  #endif
>  
> +/* Inline functions to speed up the math library implementation.  The
> +   default versions of these routines are in generic/math_private.h
> +   and call fesetround, feholdexcept, etc.  These routines use inlined
> +   code instead.  */
> +
> +#ifdef __mips_hard_float
> +
> +#include <fenv.h>

Use "# include" etc. indentation inside this #ifdef.

OK with that fixed and the other fixes other people have noted.

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Steve Ellcey -2
In reply to this post by Carlos O'Donell-6
On Thu, 2013-09-19 at 23:32 -0400, Carlos O'Donell wrote:

> > +
> > +  /* Set rounding bits.  */
> > +  cw &= ~0x3;
>
> What's the magic ~0x3? Should it be a new macro?

0x3 is a mask to access the two bits in the FPU control register that
define the rounding mode.  It probably should be a macro and it seems
like it should go into fpu_control.h where it could be used by the
'real' fegetround and fesetround as well these new routines.  What do
you think of this patch as a precursor to my original patch so that I
can change all the code to use _FPU_RC_MASK instead of 0x3?

Steve Ellcey
[hidden email]



2013-09-19  Steve Ellcey  <[hidden email]>

        * sysdeps/mips/fpu_control.h (_FPU_RC_MASK): New.
        * sysdeps/mips/fpu/fegetround.c (fegetround): Use _FPU_RC_MASK.
        * sysdeps/mips/fpu/fesetround.c (fesetround): Use _FPU_RC_MASK.



diff --git a/ports/sysdeps/mips/fpu/fegetround.c b/ports/sysdeps/mips/fpu/fegetround.c
index 61217a7..17cd3e9 100644
--- a/ports/sysdeps/mips/fpu/fegetround.c
+++ b/ports/sysdeps/mips/fpu/fegetround.c
@@ -28,5 +28,5 @@ fegetround (void)
   /* Get control word.  */
   _FPU_GETCW (cw);
 
-  return cw & 0x3;
+  return cw & _FPU_RC_MASK;
 }
diff --git a/ports/sysdeps/mips/fpu/fesetround.c b/ports/sysdeps/mips/fpu/fesetround.c
index 7c25f43..c6fdd66 100644
--- a/ports/sysdeps/mips/fpu/fesetround.c
+++ b/ports/sysdeps/mips/fpu/fesetround.c
@@ -25,7 +25,7 @@ fesetround (int round)
 {
   fpu_control_t cw;
 
-  if ((round & ~0x3) != 0)
+  if ((round & ~_FPU_RC_MASK) != 0)
     /* ROUND is no valid rounding mode.  */
     return 1;
 
@@ -33,7 +33,7 @@ fesetround (int round)
   _FPU_GETCW (cw);
 
   /* Set rounding bits.  */
-  cw &= ~0x3;
+  cw &= ~_FPU_RC_MASK;
   cw |= round;
   /* Set new state.  */
   _FPU_SETCW (cw);
diff --git a/ports/sysdeps/mips/fpu_control.h b/ports/sysdeps/mips/fpu_control.h
index 4046962..f26b736 100644
--- a/ports/sysdeps/mips/fpu_control.h
+++ b/ports/sysdeps/mips/fpu_control.h
@@ -90,6 +90,8 @@ extern fpu_control_t __fpu_control;
 #define _FPU_RC_ZERO    0x1
 #define _FPU_RC_UP      0x2
 #define _FPU_RC_DOWN    0x3
+/* mask for rounding control */
+#define _FPU_RC_MASK    0x3
 
 #define _FPU_RESERVED 0xfe840000  /* Reserved bits in cw, incl NAN2008.  */
 



Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Carlos O'Donell-6
On 09/20/2013 12:51 PM, Steve Ellcey wrote:

> On Thu, 2013-09-19 at 23:32 -0400, Carlos O'Donell wrote:
>
>>> +
>>> +  /* Set rounding bits.  */
>>> +  cw &= ~0x3;
>>
>> What's the magic ~0x3? Should it be a new macro?
>
> 0x3 is a mask to access the two bits in the FPU control register that
> define the rounding mode.  It probably should be a macro and it seems
> like it should go into fpu_control.h where it could be used by the
> 'real' fegetround and fesetround as well these new routines.  What do
> you think of this patch as a precursor to my original patch so that I
> can change all the code to use _FPU_RC_MASK instead of 0x3?

Perfect. Exactly the kind of thing that makes it easier to read,
understand, and maintain this code.
 

> Steve Ellcey
> [hidden email]
>
>
>
> 2013-09-19  Steve Ellcey  <[hidden email]>
>
> * sysdeps/mips/fpu_control.h (_FPU_RC_MASK): New.
> * sysdeps/mips/fpu/fegetround.c (fegetround): Use _FPU_RC_MASK.
> * sysdeps/mips/fpu/fesetround.c (fesetround): Use _FPU_RC_MASK.

Looks good to me. You should just check this in as an obvious cleanup.

Cheers,
Carlos.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Maciej W. Rozycki-4
On Fri, 20 Sep 2013, Carlos O'Donell wrote:

> > 2013-09-19  Steve Ellcey  <[hidden email]>
> >
> > * sysdeps/mips/fpu_control.h (_FPU_RC_MASK): New.
> > * sysdeps/mips/fpu/fegetround.c (fegetround): Use _FPU_RC_MASK.
> > * sysdeps/mips/fpu/fesetround.c (fesetround): Use _FPU_RC_MASK.
>
> Looks good to me. You should just check this in as an obvious cleanup.

 Except with a small formatting fix to comply with the GNU Coding
Standards:

On Fri, 20 Sep 2013, Steve Ellcey wrote:

> diff --git a/ports/sysdeps/mips/fpu_control.h b/ports/sysdeps/mips/fpu_control.h
> index 4046962..f26b736 100644
> --- a/ports/sysdeps/mips/fpu_control.h
> +++ b/ports/sysdeps/mips/fpu_control.h
> @@ -90,6 +90,8 @@ extern fpu_control_t __fpu_control;
>  #define _FPU_RC_ZERO    0x1
>  #define _FPU_RC_UP      0x2
>  #define _FPU_RC_DOWN    0x3
> +/* mask for rounding control */
> +#define _FPU_RC_MASK    0x3

-- the comment is expected to say:

/* Mask for rounding control.  */

i.e. start with a capital letter, end with a full stop and put two spaces
exactly after the full stop.

 According to the GNU Coding Standards all comments have to be proper
sentences with two spaces following every punctuation mark ending a
sentence.  For further details please refer to:

http://www.gnu.org/prep/standards/standards.html#Comments

  Maciej
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Carlos O'Donell-6
On 09/21/2013 02:47 PM, Maciej W. Rozycki wrote:

> On Fri, 20 Sep 2013, Carlos O'Donell wrote:
>
>>> 2013-09-19  Steve Ellcey  <[hidden email]>
>>>
>>> * sysdeps/mips/fpu_control.h (_FPU_RC_MASK): New.
>>> * sysdeps/mips/fpu/fegetround.c (fegetround): Use _FPU_RC_MASK.
>>> * sysdeps/mips/fpu/fesetround.c (fesetround): Use _FPU_RC_MASK.
>>
>> Looks good to me. You should just check this in as an obvious cleanup.
>
>  Except with a small formatting fix to comply with the GNU Coding
> http://www.gnu.org/prep/standards/standards.html#Comments

That's right, thanks for catching that :-)

Cheers,
Carlos.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Speed up libm on MIPS

Steve Ellcey -2
On Sun, 2013-09-22 at 13:40 -0400, Carlos O'Donell wrote:

> On 09/21/2013 02:47 PM, Maciej W. Rozycki wrote:
> > On Fri, 20 Sep 2013, Carlos O'Donell wrote:
> >
> >>> 2013-09-19  Steve Ellcey  <[hidden email]>
> >>>
> >>> * sysdeps/mips/fpu_control.h (_FPU_RC_MASK): New.
> >>> * sysdeps/mips/fpu/fegetround.c (fegetround): Use _FPU_RC_MASK.
> >>> * sysdeps/mips/fpu/fesetround.c (fesetround): Use _FPU_RC_MASK.
> >>
> >> Looks good to me. You should just check this in as an obvious cleanup.
> >
> >  Except with a small formatting fix to comply with the GNU Coding
> > http://www.gnu.org/prep/standards/standards.html#Comments
>
> That's right, thanks for catching that :-)
>
> Cheers,
> Carlos.

I was originally trying to match the existing comments, but I went ahead
and did a separate patch to make the existing comments more standard and
then checked in this patch after updating its comment to match the
standard.

Steve Ellcey
[hidden email]