[PATCH 0/4] various FPSCR optimizations

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 0/4] various FPSCR optimizations

Paul A. Clarke
From: "Paul A. Clarke" <[hidden email]>

Patches 1 & 2 are resends with changes, addressing comments from Paul Murphy.
Patches 3 & 4 are new.

Paul A. Clarke (4):
  [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses
  [powerpc] SET_RESTORE_ROUND improvements
  [powerpc] fesetenv: optimize FPSCR access
  [powerpc] fegetenv_status: simplify instruction generation

 sysdeps/powerpc/fpu/fedisblxcpt.c  | 14 ++++++++------
 sysdeps/powerpc/fpu/feenablxcpt.c  | 15 ++++++++-------
 sysdeps/powerpc/fpu/fenv_libc.h    | 24 ++++++++++--------------
 sysdeps/powerpc/fpu/fenv_private.h | 38 ++++++++++++++++++++++++++++++++++++--
 sysdeps/powerpc/fpu/fesetenv.c     | 12 ++++--------
 sysdeps/powerpc/fpu/fesetmode.c    | 15 +++++----------
 6 files changed, 71 insertions(+), 47 deletions(-)

--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH 1/4] [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses

Paul A. Clarke
From: "Paul A. Clarke" <[hidden email]>

Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR.  The net is better performance.

In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR.  This local variable can be used instead of again reading
the FPSCR.

Also, that value of the FPSCR which is read the second time is validated
against the requested enables.  Since the write can't fail, this validation
step is unnecessary, and can be removed.  Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set.  For example, FE_INVALID_SQRT can be
requested, but cannot actually be set.  This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.

Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.

2019-08-20  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): New.
        (FPSCR_FPRF_MASK): New. (FPSCR_STATUS_MASK): New.
        * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use lighter-
        weight access to FPSCR; remove unnecessary second FPSCR read and
        validate.
        * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
        * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Use lighter-weight
        access to FPSCR; Use macros in fenv_libc.h in favor of local.
---
v2:
- Address issue raised by Paul Murphy.  If the specified set of exceptions
  cannot be enabled (or disabled), then the function will return failure.
- The current version of the code will enable (or disable) what it can
  _and_ return failure.  This version will just return failure.

 sysdeps/powerpc/fpu/fedisblxcpt.c | 14 ++++++++------
 sysdeps/powerpc/fpu/feenablxcpt.c | 15 ++++++++-------
 sysdeps/powerpc/fpu/fenv_libc.h   | 10 +++++++++-
 sysdeps/powerpc/fpu/fesetmode.c   | 15 +++++----------
 4 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index 5cc8799..a2b7add 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -26,23 +26,25 @@ fedisableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l &= ~ fenv_exceptions_to_reg (excepts);
+  fe.l &= ~new;
 
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new == 0 && result != 0)
     (void)__fe_mask_env ();
 
-  if ((new & excepts) != 0)
-    result = -1;
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index 3b64398..c06a7fd 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -26,24 +26,25 @@ feenableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_register ();
+  fe.fenv = curr.fenv = fegetenv_status ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
     excepts = (excepts | FE_INVALID) & ~ FE_ALL_INVALID;
 
+  new = fenv_exceptions_to_reg (excepts);
+
+  if (fenv_reg_to_exceptions (new) != excepts)
+    return -1;
+
   /* Sets the new exception mask.  */
-  fe.l |= fenv_exceptions_to_reg (excepts);
+  fe.l |= new;
 
   if (fe.l != curr.l)
-    fesetenv_register (fe.fenv);
+    fesetenv_mode (fe.fenv);
 
-  new = __fegetexcept ();
   if (new != 0 && result == 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((new & excepts) != excepts)
-    result = -1;
-
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 853239f..8ba4832 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -70,6 +70,11 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __builtin_mtfsf (0xff, d); \
  } while(0)
 
+/* Set the last 2 nibbles of the FPSCR, which contain the
+   exception enables and the rounding mode.
+   'fegetenv_status' retrieves these bits by reading the FPSCR.  */
+#define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
+
 /* This very handy macro:
    - Sets the rounding mode to 'round to nearest';
    - Sets the processor into IEEE mode; and
@@ -206,8 +211,11 @@ enum {
   (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
 #define FPSCR_BASIC_EXCEPTIONS_MASK \
   (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
-
+#define FPSCR_FPRF_MASK \
+  (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
+   FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
 #define FPSCR_CONTROL_MASK (FPSCR_ENABLES_MASK|FPSCR_NI_MASK|FPSCR_RN_MASK)
+#define FPSCR_STATUS_MASK (FPSCR_FR_MASK|FPSCR_FI_MASK|FPSCR_FPRF_MASK)
 
 /* The bits in the FENV(1) ABI for exceptions correspond one-to-one with bits
    in the FPSCR, albeit shifted to different but corresponding locations.
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 4f4f71a..e92559b 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -19,11 +19,6 @@
 #include <fenv_libc.h>
 #include <fpu_control.h>
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM \
-       | _FPU_MASK_XM | _FPU_MASK_IM)
-
-#define FPU_STATUS 0xbffff700ULL
-
 int
 fesetmode (const femode_t *modep)
 {
@@ -32,18 +27,18 @@ fesetmode (const femode_t *modep)
   /* Logic regarding enabled exceptions as in fesetenv.  */
 
   new.fenv = *modep;
-  old.fenv = fegetenv_register ();
-  new.l = (new.l & ~FPU_STATUS) | (old.l & FPU_STATUS);
+  old.fenv = fegetenv_status ();
+  new.l = (new.l & ~FPSCR_STATUS_MASK) | (old.l & FPSCR_STATUS_MASK);
 
   if (old.l == new.l)
     return 0;
 
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
+  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
     (void) __fe_nomask_env_priv ();
 
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
+  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
     (void) __fe_mask_env ();
 
-  fesetenv_register (new.fenv);
+  fesetenv_mode (new.fenv);
   return 0;
 }
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH 2/4] [powerpc] SET_RESTORE_ROUND improvements

Paul A. Clarke
In reply to this post by Paul A. Clarke
From: "Paul A. Clarke" <[hidden email]>

SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.

For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
   a. If transitioning from a state where some exceptions enabled,
      enter "ignore exceptions / non-stop" mode.
   b. Write new value to FPSCR.
   c. Put a mark on the wall indicating the FPSCR was changed.

(1) uses the 'mffs' instruction.  On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.

For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retreive saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
   - Summary bits and exception indicators = current OR saved.
   - Rounding mode and enables = saved.
   - Status bits = current.
4. If transitioning from some exceptions enabled to none,
   enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
   enter "catch exceptions" mode.
6. Write new value to FPSCR.

The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them.  This is certainly not expected.

So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR.  Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.

We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.

2019-08-20  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
        Utilize lightweight FPSCR read if possible, set fewer FPSCR bits
        if possible.
        (libc_feresetround_ppc):  Replace call to __libc_femergeenv_ppc
        with simpler required steps, set fewer FPSCR bits if possible.
        (libc_feresetround_noex_ppc_ctx):  New.
        (libc_feresetround_noex_ctx):  New.
        (libc_feresetround_noexf_ctx):  New.
        (libc_feresetround_noexl_ctx):  New.
---
v2:
- Address issue raised by Paul Murphy.  The first version of the patch was
  broken with respect to the "no exceptions" (NOEX) versions of the macros.

 sysdeps/powerpc/fpu/fenv_private.h | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index 8c126f9..5ebe6cd 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -132,7 +132,17 @@ libc_fesetenv_ppc (const fenv_t *envp)
 static __always_inline void
 libc_feresetround_ppc (fenv_t *envp)
 {
-  __libc_femergeenv_ppc (envp, _FPU_MASK_TRAPS_RN, _FPU_MASK_FRAC_INEX_RET_CC);
+  fenv_union_t new = { .fenv = *envp };
+
+  /* If the old env has no enabled exceptions and the new env has any enabled
+     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
+     hardware into "precise mode" and may cause the FPU to run slower on some
+     hardware.  */
+  if ((new.l & _FPU_ALL_TRAPS) != 0)
+    (void) __fe_nomask_env_priv ();
+
+  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
+  fesetenv_mode (new.fenv);
 }
 
 static __always_inline int
@@ -176,9 +186,30 @@ libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
 {
   fenv_union_t old, new;
 
+  old.fenv = fegetenv_status ();
+
+  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
+
+  ctx->env = old.fenv;
+  if (__glibc_unlikely (new.l != old.l))
+    {
+      if ((old.l & _FPU_ALL_TRAPS) != 0)
+ (void) __fe_mask_env ();
+      fesetenv_mode (new.fenv);
+      ctx->updated_status = true;
+    }
+  else
+    ctx->updated_status = false;
+}
+
+static __always_inline void
+libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
+{
+  fenv_union_t old, new;
+
   old.fenv = fegetenv_register ();
 
-  new.l = (old.l & _FPU_MASK_TRAPS_RN) | r;
+  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
 
   ctx->env = old.fenv;
   if (__glibc_unlikely (new.l != old.l))
@@ -218,6 +249,9 @@ libc_feresetround_ppc_ctx (struct rm_ctx *ctx)
 #define libc_feholdsetround_ctx          libc_feholdsetround_ppc_ctx
 #define libc_feholdsetroundf_ctx         libc_feholdsetround_ppc_ctx
 #define libc_feholdsetroundl_ctx         libc_feholdsetround_ppc_ctx
+#define libc_feholdsetround_noex_ctx     libc_feholdsetround_noex_ppc_ctx
+#define libc_feholdsetround_noexf_ctx    libc_feholdsetround_noex_ppc_ctx
+#define libc_feholdsetround_noexl_ctx    libc_feholdsetround_noex_ppc_ctx
 #define libc_feresetround_ctx            libc_feresetround_ppc_ctx
 #define libc_feresetroundf_ctx           libc_feresetround_ppc_ctx
 #define libc_feresetroundl_ctx           libc_feresetround_ppc_ctx
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH 3/4] [powerpc] fesetenv: optimize FPSCR access

Paul A. Clarke
In reply to this post by Paul A. Clarke
From: "Paul A. Clarke" <[hidden email]>

fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state.  All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.

Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.

Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').

2019-08-20  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv):  Utilize lightweight
        FPSCR read.
        (_FPU_MASK_ALL):  Delete.
---
 sysdeps/powerpc/fpu/fesetenv.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
index 009a4f0..5ca15c7 100644
--- a/sysdeps/powerpc/fpu/fesetenv.c
+++ b/sysdeps/powerpc/fpu/fesetenv.c
@@ -19,8 +19,6 @@
 #include <fenv_libc.h>
 #include <fpu_control.h>
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM)
-
 int
 __fesetenv (const fenv_t *envp)
 {
@@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp)
 
   /* get the currently set exceptions.  */
   new.fenv = *envp;
-  old.fenv = fegetenv_register ();
-  if (old.l == new.l)
-    return 0;
+  old.fenv = fegetenv_status ();
 
   /* If the old env has no enabled exceptions and the new env has any enabled
      exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
      hardware into "precise mode" and may cause the FPU to run slower on some
      hardware.  */
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
+  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
     (void) __fe_nomask_env_priv ();
 
   /* If the old env had any enabled exceptions and the new env has no enabled
      exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
      FPU to run faster because it always takes the default action and can not
      generate SIGFPE. */
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
+  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
     (void)__fe_mask_env ();
 
-  fesetenv_register (*envp);
+  fesetenv_register (new.fenv);
 
   /* Success.  */
   return 0;
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH 4/4] [powerpc] fegetenv_status: simplify instruction generation

Paul A. Clarke
In reply to this post by Paul A. Clarke
From: "Paul A. Clarke" <[hidden email]>

fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.

Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'.  'mffs' is a superset
of 'mffsl'.

So, just generate 'mffsl'.

2019-08-20  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300):  Delete.
        (fegetenv_status):  Generate 'mffsl' unconditionally.
---
 sysdeps/powerpc/fpu/fenv_libc.h | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 8ba4832..186612b 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -37,7 +37,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 /* Equivalent to fegetenv_register, but only returns bits for
    status, exception enables, and mode.  */
 
-#define fegetenv_status_ISA300() \
+#define fegetenv_status() \
   ({register double __fr; \
     __asm__ __volatile__ ( \
       ".machine push; .machine \"power9\"; mffsl %0; .machine pop" \
@@ -45,18 +45,6 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __fr; \
   })
 
-#ifdef _ARCH_PWR9
-# define fegetenv_status() fegetenv_status_ISA300()
-#elif defined __BUILTIN_CPU_SUPPORTS__
-# define fegetenv_status() \
-  (__glibc_likely (__builtin_cpu_supports ("arch_3_00")) \
-   ? fegetenv_status_ISA300() \
-   : fegetenv_register() \
-  )
-#else
-# define fegetenv_status() fegetenv_register ()
-#endif
-
 /* Equivalent to fesetenv, but takes a fenv_t instead of a pointer.  */
 #define fesetenv_register(env) \
  do { \
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 4/4] [powerpc] fegetenv_status: simplify instruction generation

Paul E Murphy


On 8/20/19 4:19 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> fegetenv_status() wants to use the lighter weight instruction 'mffsl'
> for reading the Floating-Point Status and Control Register (FPSCR).
> It currently will use it directly if compiled '-mcpu=power9', and will
> perform a runtime check (cpu_supports("arch_3_00")) otherwise.
>
> Nicely, it turns out that the 'mffsl' instruction will decode to
> 'mffs' on architectures older than "arch_3_00" because the additional
> bits set for 'mffsl' are "don't care" for 'mffs'.  'mffs' is a superset
> of 'mffsl'.

That is a pretty neat trick. I would recommend inlining the above
comment. Otherwise, LGTM.

> -#define fegetenv_status_ISA300() \
> +#define fegetenv_status() \
>     ({register double __fr; \
>       __asm__ __volatile__ ( \
>         ".machine push; .machine \"power9\"; mffsl %0; .machine pop" \
> @@ -45,18 +45,6 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
>       __fr; \
>     })
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/4] [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses

Paul E Murphy
In reply to this post by Paul A. Clarke


On 8/20/19 4:19 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
> "mode" (exception enable and rounding mode) bits of the Floating Point Status
> Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
> to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
> used to write just those bits back to the FPSCR.  The net is better performance.
>
> In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
> they determine that it doesn't need to be written because it is not changing.
> In either case, the local variable holds the current values of the enable
> bits in the FPSCR.  This local variable can be used instead of again reading
> the FPSCR.
>
> Also, that value of the FPSCR which is read the second time is validated
> against the requested enables.  Since the write can't fail, this validation
> step is unnecessary, and can be removed.  Instead, the exceptions to be
> enabled (or disabled) are transformed into available bits in the FPSCR,
> then validated after being transformed back, to ensure that all requested
> bits are actually being set.  For example, FE_INVALID_SQRT can be
> requested, but cannot actually be set.  This bit is not mapped during the
> transformations, so a test for that bit being set before and after
> transformations will show the bit would not be set, and the function will
> return -1 for failure.
>
> Finally, convert the local macros in fesetmode.c to more generally useful
> macros in fenv_libc.h.
>
> 2019-08-20  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): New.
> (FPSCR_FPRF_MASK): New. (FPSCR_STATUS_MASK): New.
> * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use lighter-
> weight access to FPSCR; remove unnecessary second FPSCR read and
> validate.
> * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
> * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Use lighter-weight
> access to FPSCR; Use macros in fenv_libc.h in favor of local.
> ---
> v2:
> - Address issue raised by Paul Murphy.  If the specified set of exceptions
>    cannot be enabled (or disabled), then the function will return failure.
> - The current version of the code will enable (or disable) what it can
>    _and_ return failure.  This version will just return failure.

Thanks. LGTM.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 2/4] [powerpc] SET_RESTORE_ROUND improvements

Paul E Murphy
In reply to this post by Paul A. Clarke


On 8/20/19 4:19 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
> libc_feresetround_ppc_ctx to bracket a block of code where the floating point
> rounding mode must be set to a certain value.
>
> For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
> 1. Read/save FPSCR.
> 2. Create new value for FPSCR with new rounding mode and enables cleared.
> 3. If new value is different than current value,
>     a. If transitioning from a state where some exceptions enabled,
>        enter "ignore exceptions / non-stop" mode.
>     b. Write new value to FPSCR.
>     c. Put a mark on the wall indicating the FPSCR was changed.
>
> (1) uses the 'mffs' instruction.  On POWER9, the lighter weight 'mffsl'
> instruction can be used, but it doesn't return all of the bits in the FPSCR.
> fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
> used instead of fegetenv_register.
> (3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
> instead use 'mtfsf 0b00000011' to write just the enables and the mode,
> because some of the rest of the bits are not valid if 'mffsl' was used.
> fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
> otherwise.
>
> For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
> calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
> parameters such that it performs:
> 1. Retreive saved value of FPSCR, saved in prologue above.
> 2. Read FPSCR.
> 3. Create new value of FPSCR where:
>     - Summary bits and exception indicators = current OR saved.
>     - Rounding mode and enables = saved.
>     - Status bits = current.
> 4. If transitioning from some exceptions enabled to none,
>     enter "ignore exceptions / non-stop" mode.
> 5. If transitioning from no exceptions enabled to some,
>     enter "catch exceptions" mode.
> 6. Write new value to FPSCR.
>
> The summary bits are hardwired to the exception indicators, so there is no
> need to restore any saved summary bits.
> The exception indicator bits, which are sticky and remain set unless
> explicitly cleared, would only need to be restored if the code block
> might explicitly clear any of them.  This is certainly not expected.
>
> So, the only bits that need to be restored are the enables and the mode.
> If it is the case that only those bits are to be restored, there is no need to
> read the FPSCR.  Steps (2) and (3) are unnecessary, and step (6) only needs to
> write the bits being restored.
>
> We know we are transitioning out of "ignore exceptions" mode, so step (4) is
> unnecessary, and in step (6), we only need to check the state we are
> entering.
>
> 2019-08-20  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
> Utilize lightweight FPSCR read if possible, set fewer FPSCR bits
> if possible.
> (libc_feresetround_ppc):  Replace call to __libc_femergeenv_ppc
> with simpler required steps, set fewer FPSCR bits if possible.
> (libc_feresetround_noex_ppc_ctx):  New.
> (libc_feresetround_noex_ctx):  New.
> (libc_feresetround_noexf_ctx):  New.
> (libc_feresetround_noexl_ctx):  New.
> ---
> v2:
> - Address issue raised by Paul Murphy.  The first version of the patch was
>    broken with respect to the "no exceptions" (NOEX) versions of the macros.
>
>   sysdeps/powerpc/fpu/fenv_private.h | 38 ++++++++++++++++++++++++++++++++++++--
>   1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
> index 8c126f9..5ebe6cd 100644
> --- a/sysdeps/powerpc/fpu/fenv_private.h
> +++ b/sysdeps/powerpc/fpu/fenv_private.h
> @@ -132,7 +132,17 @@ libc_fesetenv_ppc (const fenv_t *envp)
>   static __always_inline void
>   libc_feresetround_ppc (fenv_t *envp)
>   {
> -  __libc_femergeenv_ppc (envp, _FPU_MASK_TRAPS_RN, _FPU_MASK_FRAC_INEX_RET_CC);
> +  fenv_union_t new = { .fenv = *envp };
> +
> +  /* If the old env has no enabled exceptions and the new env has any enabled
> +     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
> +     hardware into "precise mode" and may cause the FPU to run slower on some
> +     hardware.  */
> +  if ((new.l & _FPU_ALL_TRAPS) != 0)
> +    (void) __fe_nomask_env_priv ();
> +
> +  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
> +  fesetenv_mode (new.fenv);
>   }
>  
>   static __always_inline int
> @@ -176,9 +186,30 @@ libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
>   {
>     fenv_union_t old, new;
>  
> +  old.fenv = fegetenv_status ();
> +
> +  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
> +
> +  ctx->env = old.fenv;
> +  if (__glibc_unlikely (new.l != old.l))
> +    {
> +      if ((old.l & _FPU_ALL_TRAPS) != 0)
> + (void) __fe_mask_env ();
> +      fesetenv_mode (new.fenv);
> +      ctx->updated_status = true;
> +    }
> +  else
> +    ctx->updated_status = false;
> +}
> +
> +static __always_inline void
> +libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
> +{
> +  fenv_union_t old, new;
> +
>     old.fenv = fegetenv_register ();
>  
> -  new.l = (old.l & _FPU_MASK_TRAPS_RN) | r;
> +  new.l = (old.l & ~(FPSCR_ENABLES_MASK|FPSCR_RN_MASK)) | r;
>  
>     ctx->env = old.fenv;
>     if (__glibc_unlikely (new.l != old.l))
> @@ -218,6 +249,9 @@ libc_feresetround_ppc_ctx (struct rm_ctx *ctx)
>   #define libc_feholdsetround_ctx          libc_feholdsetround_ppc_ctx
>   #define libc_feholdsetroundf_ctx         libc_feholdsetround_ppc_ctx
>   #define libc_feholdsetroundl_ctx         libc_feholdsetround_ppc_ctx
> +#define libc_feholdsetround_noex_ctx     libc_feholdsetround_noex_ppc_ctx
> +#define libc_feholdsetround_noexf_ctx    libc_feholdsetround_noex_ppc_ctx
> +#define libc_feholdsetround_noexl_ctx    libc_feholdsetround_noex_ppc_ctx
>   #define libc_feresetround_ctx            libc_feresetround_ppc_ctx
>   #define libc_feresetroundf_ctx           libc_feresetround_ppc_ctx
>   #define libc_feresetroundl_ctx           libc_feresetround_ppc_ctx
>

LGTM. Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 3/4] [powerpc] fesetenv: optimize FPSCR access

Paul E Murphy
In reply to this post by Paul A. Clarke


On 8/20/19 4:19 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> fesetenv() reads the current value of the Floating-Point Status and Control
> Register (FPSCR) to determine the difference between the current state of
> exception enables and the newly requested state.  All of these bits are also
> returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
> Use that instead.
>
> Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
> FPU_ENABLES_MASK from fenv_libc.h.
>
> Finally, use a local variable ('new') in favor of a pointer dereference
> ('*envp').
>
> 2019-08-20  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv):  Utilize lightweight
> FPSCR read.
> (_FPU_MASK_ALL):  Delete.
> ---
>   sysdeps/powerpc/fpu/fesetenv.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
> index 009a4f0..5ca15c7 100644
> --- a/sysdeps/powerpc/fpu/fesetenv.c
> +++ b/sysdeps/powerpc/fpu/fesetenv.c
> @@ -19,8 +19,6 @@
>   #include <fenv_libc.h>
>   #include <fpu_control.h>
>  
> -#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM)
> -
>   int
>   __fesetenv (const fenv_t *envp)
>   {
> @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp)
>  
>     /* get the currently set exceptions.  */
>     new.fenv = *envp;
> -  old.fenv = fegetenv_register ();
> -  if (old.l == new.l)
> -    return 0;
> +  old.fenv = fegetenv_status ();
>  
>     /* If the old env has no enabled exceptions and the new env has any enabled
>        exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
>        hardware into "precise mode" and may cause the FPU to run slower on some
>        hardware.  */
> -  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
> +  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
>       (void) __fe_nomask_env_priv (); >
>     /* If the old env had any enabled exceptions and the new env has no enabled
>        exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
>        FPU to run faster because it always takes the default action and can not
>        generate SIGFPE. */
> -  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
> +  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
>       (void)__fe_mask_env ();
If you need to make another similar change, I might recommend
consolidating the code to toggle the MSR bits.

>  
> -  fesetenv_register (*envp);
> +  fesetenv_register (new.fenv);
>  
>     /* Success.  */
>     return 0;
>

LGTM, thanks.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 3/4] [powerpc] fesetenv: optimize FPSCR access

Paul A. Clarke
On 8/21/19 5:13 PM, Paul E Murphy wrote:
> On 8/20/19 4:19 PM, Paul A. Clarke wrote:

>> diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
>> index 009a4f0..5ca15c7 100644
>> --- a/sysdeps/powerpc/fpu/fesetenv.c
>> +++ b/sysdeps/powerpc/fpu/fesetenv.c
>> @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp)

>>       /* If the old env has no enabled exceptions and the new env has any enabled
>>        exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
>>        hardware into "precise mode" and may cause the FPU to run slower on some
>>        hardware.  */
>> -  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
>> +  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
>>       (void) __fe_nomask_env_priv (); >
>>     /* If the old env had any enabled exceptions and the new env has no enabled
>>        exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
>>        FPU to run faster because it always takes the default action and can not
>>        generate SIGFPE. */
>> -  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
>> +  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
>>       (void)__fe_mask_env ();

> If you need to make another similar change, I might recommend consolidating the code to toggle the MSR bits.

Could you elaborate on what consolidation you are anticipating?

PC
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 3/4] [powerpc] fesetenv: optimize FPSCR access

Paul E Murphy


On 8/22/19 11:55 AM, Paul Clarke wrote:

> On 8/21/19 5:13 PM, Paul E Murphy wrote:
>> On 8/20/19 4:19 PM, Paul A. Clarke wrote:
>
>>> diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
>>> index 009a4f0..5ca15c7 100644
>>> --- a/sysdeps/powerpc/fpu/fesetenv.c
>>> +++ b/sysdeps/powerpc/fpu/fesetenv.c
>>> @@ -28,25 +26,23 @@ __fesetenv (const fenv_t *envp)
>
>>>        /* If the old env has no enabled exceptions and the new env has any enabled
>>>         exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
>>>         hardware into "precise mode" and may cause the FPU to run slower on some
>>>         hardware.  */
>>> -  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
>>> +  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
>>>        (void) __fe_nomask_env_priv (); >
>>>      /* If the old env had any enabled exceptions and the new env has no enabled
>>>         exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
>>>         FPU to run faster because it always takes the default action and can not
>>>         generate SIGFPE. */
>>> -  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
>>> +  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
>>>        (void)__fe_mask_env ();
>
>> If you need to make another similar change, I might recommend consolidating the code to toggle the MSR bits.
>
> Could you elaborate on what consolidation you are anticipating?

Code similar to the above exists in at least four places (as of
f615e3fced) to toggle the MSR bits if the number of enabled exceptions
changes from 0.

git grep "([a-z]*[.]l & _FPU_[A-Z_]*) .= . && ([a-z]*.l & _FPU_[A-Z_]*)"

Maybe those can be moved into a single macro/inline function in
fenv_private.h if more fixups need to be made?

>
> PC
>