[PATCH v2 0/6] Various FPSCR-related changes

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH v2 0/6] Various FPSCR-related changes

Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

I collected a set of patches, some already posted but until now not
in a series.  Thus, the "v2".  I changed the order a bit, and there
are a couple of new changes included.  Only one has been previously
reviewed, marked with "Reviewed-by" therein.

Paul A. Clarke (6):
  [powerpc] fenv_private.h clean up
  [powerpc] No need to enter "Ignore Exceptions Mode"
  [powerpc] libc_feupdateenv_test: optimize FPSCR access
  [powerpc] libc_feholdsetround_noex_ppc_ctx: optimize FPSCR write
  [powerpc] __fesetround_inline optimizations
  [powerpc] Rename fegetenv_status to fegetenv_control

 sysdeps/powerpc/fpu/fedisblxcpt.c  |   5 +-
 sysdeps/powerpc/fpu/feenablxcpt.c  |   5 +-
 sysdeps/powerpc/fpu/fegetexcept.c  |   2 +-
 sysdeps/powerpc/fpu/fegetmode.c    |   2 +-
 sysdeps/powerpc/fpu/feholdexcpt.c  |   7 +--
 sysdeps/powerpc/fpu/fenv_libc.h    |  56 +++++++++++++++++---
 sysdeps/powerpc/fpu/fenv_private.h | 103 +++++++++----------------------------
 sysdeps/powerpc/fpu/fesetenv.c     |  19 ++-----
 sysdeps/powerpc/fpu/fesetmode.c    |   9 ++--
 sysdeps/powerpc/fpu/feupdateenv.c  |  17 +-----
 10 files changed, 91 insertions(+), 134 deletions(-)

--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 1/6] [powerpc] fenv_private.h clean up

Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

fenv_private.h includes unused functions, magic macro constants, and
some replicated common code fragments.

Remove unused functions, replace magic constants with constants from
fenv_libc.h, and refactor replicated code.

Suggested-by: Paul E. Murphy <[hidden email]>

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h:
        (__TEST_AND_ENTER_NON_STOP): New.
        (__TEST_AND_EXIT_NON_STOP): New.
        * sysdeps/powerpc/fpu/fenv_private.h
        (_FPU_ALL_TRAPS): Delete, replace with FPSCR_ENABLES_MASK.
        (_FPU_MASK_RN): Delete.
        (_FPU_MASK_NOT_RN_NI): Delete.
        (_FPU_MASK_TRAPS_RN): Delete, replace with ~FPSCR_CONTROL_MASK.
        (_FPU_MASK_FRAC_INEX_RET_CC): Delete, replace with ~FPSCR_STATUS_MASK.
        (__libc_feholdbits_ppc): Delete, move code into
        libc_feholdexcept_setround_ppc.
        (libc_feholdexcept_ppc): Delete.
        (libc_fesetround_ppc): Delete.
        (libc_fetestexcept_ppc): Delete.
        (libc_feholdsetround_ppc): Delete.
        (__libc_femergeenv_ppc): Use __TEST_AND_ENTER/EXIT_NON_STOP.
        (libc_feholdsetround_noex_ppc_ctx): Likewise.
        (libc_feupdateenv_test_ppc): Use FPSCR defines.
        * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Use
        __TEST_AND_ENTER_NON_STOP.
        * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
        * sysdeps/powerpc/fpu/feholdexcpt.c (__feholdexcept): Likewise.
        * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Likewise.
        * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
        * sysdeps/powerpc/fpu/feupdateenv.c (__feupdateenv): Likewise.
        (_FPU_MASK_ALL): Delete.
---
v2:
- Use new __TEST_AND_ENTER/EXIT_NON_STOP macros everywhere.
- Remove more local _FPU macros.

 sysdeps/powerpc/fpu/fedisblxcpt.c  |  3 +-
 sysdeps/powerpc/fpu/feenablxcpt.c  |  3 +-
 sysdeps/powerpc/fpu/feholdexcpt.c  |  7 +---
 sysdeps/powerpc/fpu/fenv_libc.h    | 20 +++++++++
 sysdeps/powerpc/fpu/fenv_private.h | 83 +++++---------------------------------
 sysdeps/powerpc/fpu/fesetenv.c     | 15 +------
 sysdeps/powerpc/fpu/fesetmode.c    |  7 +---
 sysdeps/powerpc/fpu/feupdateenv.c  | 17 +-------
 8 files changed, 38 insertions(+), 117 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index d0f7fe6..0d9bf00 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -43,8 +43,7 @@ fedisableexcept (int excepts)
   if (fe.l != curr.l)
     fesetenv_mode (fe.fenv);
 
-  if (new == 0 && result != 0)
-    (void)__fe_mask_env ();
+  __TEST_AND_ENTER_NON_STOP (-1ULL, fe.l);
 
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index fc96b24..cf670b8 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -43,8 +43,7 @@ feenableexcept (int excepts)
   if (fe.l != curr.l)
     fesetenv_mode (fe.fenv);
 
-  if (new != 0 && result == 0)
-    (void) __fe_nomask_env_priv ();
+  __TEST_AND_EXIT_NON_STOP (0ULL, fe.l);
 
   return result;
 }
diff --git a/sysdeps/powerpc/fpu/feholdexcpt.c b/sysdeps/powerpc/fpu/feholdexcpt.c
index 2939d64..bcd09f6 100644
--- a/sysdeps/powerpc/fpu/feholdexcpt.c
+++ b/sysdeps/powerpc/fpu/feholdexcpt.c
@@ -18,7 +18,6 @@
 
 #include <fenv_libc.h>
 #include <fpu_control.h>
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM)
 
 int
 __feholdexcept (fenv_t *envp)
@@ -35,11 +34,7 @@ __feholdexcept (fenv_t *envp)
   if (new.l == old.l)
     return 0;
 
-  /* If the old env had any enabled exceptions, then mask SIGFPE in the
-     MSR FE0/FE1 bits.  This may allow the FPU to run faster because it
-     always takes the default action and can not generate SIGFPE. */
-  if ((old.l & _FPU_MASK_ALL) != 0)
-    (void)__fe_mask_env ();
+  __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
 
   /* Put the new state in effect.  */
   fesetenv_register (new.fenv);
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 59c3d57..bc2684e 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -27,6 +27,26 @@ extern const fenv_t *__fe_nomask_env_priv (void);
 
 extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 
+/* If the old env had any enabled exceptions and the new env has no enabled
+   exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
+   FPU to run faster because it always takes the default action and can not
+   generate SIGFPE.  */
+#define __TEST_AND_ENTER_NON_STOP(old, new) \
+  do { \
+    if (((old) & FPSCR_ENABLES_MASK) != 0 && ((new) & FPSCR_ENABLES_MASK) == 0) \
+      (void) __fe_mask_env (); \
+  } while (0)
+
+/* If the old env has no enabled exceptions and the new env has any enabled
+   exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
+   hardware into "precise mode" and may cause the FPU to run slower on some
+   hardware.  */
+#define __TEST_AND_EXIT_NON_STOP(old, new) \
+  do { \
+    if (((old) & FPSCR_ENABLES_MASK) == 0 && ((new) & FPSCR_ENABLES_MASK) != 0) \
+      (void) __fe_nomask_env_priv (); \
+  } while (0)
+
 /* The sticky bits in the FPSCR indicating exceptions have occurred.  */
 #define FPSCR_STICKY_BITS ((FE_ALL_EXCEPT | FE_ALL_INVALID) & ~FE_INVALID)
 
diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index 92a3e92..30cbf30 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -23,73 +23,20 @@
 #include <fenv_libc.h>
 #include <fpu_control.h>
 
-/* Mask for the exception enable bits.  */
-#define _FPU_ALL_TRAPS (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM \
-                      | _FPU_MASK_XM | _FPU_MASK_IM)
-
-/* Mask the rounding mode bits.  */
-#define _FPU_MASK_RN 0xfffffffffffffffcLL
-
-/* Mask everything but the rounding modes and non-IEEE arithmetic flags.  */
-#define _FPU_MASK_NOT_RN_NI 0xffffffff00000807LL
-
-/* Mask restore rounding mode and exception enabled.  */
-#define _FPU_MASK_TRAPS_RN 0xffffffffffffff00LL
-
-/* Mask FP result flags, preserve fraction rounded/inexact bits.  */
-#define _FPU_MASK_FRAC_INEX_RET_CC 0xfffffffffff80fffLL
-
 static __always_inline void
-__libc_feholdbits_ppc (fenv_t *envp, unsigned long long mask,
- unsigned long long bits)
+libc_feholdexcept_setround_ppc (fenv_t *envp, int r)
 {
   fenv_union_t old, new;
 
   old.fenv = *envp = fegetenv_register ();
 
-  new.l = (old.l & mask) | bits;
-
-  /* If the old env had any enabled exceptions, then mask SIGFPE in the
-     MSR FE0/FE1 bits.  This may allow the FPU to run faster because it
-     always takes the default action and can not generate SIGFPE.  */
-  if ((old.l & _FPU_ALL_TRAPS) != 0)
-    (void) __fe_mask_env ();
+  __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
 
+  /* Clear everything and set the rounding mode.  */
+  new.l = r;
   fesetenv_register (new.fenv);
 }
 
-static __always_inline void
-libc_feholdexcept_ppc (fenv_t *envp)
-{
-  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI, 0LL);
-}
-
-static __always_inline void
-libc_feholdexcept_setround_ppc (fenv_t *envp, int r)
-{
-  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI & _FPU_MASK_RN, r);
-}
-
-static __always_inline void
-libc_fesetround_ppc (int r)
-{
-  __fesetround_inline (r);
-}
-
-static __always_inline int
-libc_fetestexcept_ppc (int e)
-{
-  fenv_union_t u;
-  u.fenv = fegetenv_register ();
-  return u.l & e;
-}
-
-static __always_inline void
-libc_feholdsetround_ppc (fenv_t *e, int r)
-{
-  __libc_feholdbits_ppc (e, _FPU_MASK_TRAPS_RN, r);
-}
-
 static __always_inline unsigned long long
 __libc_femergeenv_ppc (const fenv_t *envp, unsigned long long old_mask,
  unsigned long long new_mask)
@@ -102,19 +49,8 @@ __libc_femergeenv_ppc (const fenv_t *envp, unsigned long long old_mask,
   /* Merge bits while masking unwanted bits from new and old env.  */
   new.l = (old.l & old_mask) | (new.l & new_mask);
 
-  /* If the old env has no enabled exceptions and the new env has any enabled
-     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
-     hardware into "precise mode" and may cause the FPU to run slower on some
-     hardware.  */
-  if ((old.l & _FPU_ALL_TRAPS) == 0 && (new.l & _FPU_ALL_TRAPS) != 0)
-    (void) __fe_nomask_env_priv ();
-
-  /* If the old env had any enabled exceptions and the new env has no enabled
-     exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
-     FPU to run faster because it always takes the default action and can not
-     generate SIGFPE.  */
-  if ((old.l & _FPU_ALL_TRAPS) != 0 && (new.l & _FPU_ALL_TRAPS) == 0)
-    (void) __fe_mask_env ();
+  __TEST_AND_EXIT_NON_STOP (old.l, new.l);
+  __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
   /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
   fesetenv_register (new.fenv);
@@ -139,8 +75,8 @@ libc_feresetround_ppc (fenv_t *envp)
 static __always_inline int
 libc_feupdateenv_test_ppc (fenv_t *envp, int ex)
 {
-  return __libc_femergeenv_ppc (envp, _FPU_MASK_TRAPS_RN,
- _FPU_MASK_FRAC_INEX_RET_CC) & ex;
+  return __libc_femergeenv_ppc (envp, ~FPSCR_CONTROL_MASK,
+ ~FPSCR_STATUS_MASK) & ex;
 }
 
 static __always_inline void
@@ -193,8 +129,7 @@ libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
   ctx->env = old.fenv;
   if (__glibc_unlikely (new.l != old.l))
     {
-      if ((old.l & _FPU_ALL_TRAPS) != 0)
- (void) __fe_mask_env ();
+      __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
       fesetenv_register (new.fenv);
       ctx->updated_status = true;
     }
diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
index 96f8d99..949d916 100644
--- a/sysdeps/powerpc/fpu/fesetenv.c
+++ b/sysdeps/powerpc/fpu/fesetenv.c
@@ -28,19 +28,8 @@ __fesetenv (const fenv_t *envp)
   new.fenv = *envp;
   old.fenv = fegetenv_status ();
 
-  /* If the old env has no enabled exceptions and the new env has any enabled
-     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put the
-     hardware into "precise mode" and may cause the FPU to run slower on some
-     hardware.  */
-  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
-    (void) __fe_nomask_env_priv ();
-
-  /* If the old env had any enabled exceptions and the new env has no enabled
-     exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
-     FPU to run faster because it always takes the default action and can not
-     generate SIGFPE. */
-  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
-    (void)__fe_mask_env ();
+  __TEST_AND_EXIT_NON_STOP (old.l, new.l);
+  __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
   fesetenv_register (new.fenv);
 
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 92fc15c..90d86a9 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -33,11 +33,8 @@ fesetmode (const femode_t *modep)
   if (old.l == new.l)
     return 0;
 
-  if ((old.l & FPSCR_ENABLES_MASK) == 0 && (new.l & FPSCR_ENABLES_MASK) != 0)
-    (void) __fe_nomask_env_priv ();
-
-  if ((old.l & FPSCR_ENABLES_MASK) != 0 && (new.l & FPSCR_ENABLES_MASK) == 0)
-    (void) __fe_mask_env ();
+  __TEST_AND_EXIT_NON_STOP (old.l, new.l);
+  __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
   fesetenv_mode (new.fenv);
   return 0;
diff --git a/sysdeps/powerpc/fpu/feupdateenv.c b/sysdeps/powerpc/fpu/feupdateenv.c
index 931de60..d5c7394 100644
--- a/sysdeps/powerpc/fpu/feupdateenv.c
+++ b/sysdeps/powerpc/fpu/feupdateenv.c
@@ -20,8 +20,6 @@
 #include <fenv_libc.h>
 #include <fpu_control.h>
 
-#define _FPU_MASK_ALL (_FPU_MASK_ZM | _FPU_MASK_OM | _FPU_MASK_UM | _FPU_MASK_XM | _FPU_MASK_IM)
-
 int
 __feupdateenv (const fenv_t *envp)
 {
@@ -36,19 +34,8 @@ __feupdateenv (const fenv_t *envp)
      unchanged.  */
   new.l = (old.l & 0xffffffff1fffff00LL) | (new.l & 0x1ff80fff);
 
-  /* If the old env has no enabled exceptions and the new env has any enabled
-     exceptions, then unmask SIGFPE in the MSR FE0/FE1 bits.  This will put
-     the hardware into "precise mode" and may cause the FPU to run slower on
-     some hardware.  */
-  if ((old.l & _FPU_MASK_ALL) == 0 && (new.l & _FPU_MASK_ALL) != 0)
-    (void) __fe_nomask_env_priv ();
-
-  /* If the old env had any enabled exceptions and the new env has no enabled
-     exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
-     FPU to run faster because it always takes the default action and can not
-     generate SIGFPE. */
-  if ((old.l & _FPU_MASK_ALL) != 0 && (new.l & _FPU_MASK_ALL) == 0)
-    (void)__fe_mask_env ();
+  __TEST_AND_EXIT_NON_STOP (old.l, new.l);
+  __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
   /* Atomically enable and raise (if appropriate) exceptions set in `new'. */
   fesetenv_register (new.fenv);
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 2/6] [powerpc] No need to enter "Ignore Exceptions Mode"

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

Since at least POWER8, there is no performance advantage to entering
"Ignore Exceptions Mode", and doing so conditionally requires the
conditional logic as well as a system call.  Make it a no-op.

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h:
        (__ENTER_NON_STOP): New.
        (__EXIT_NON_STOP): New.
        (__TEST_AND_ENTER_NON_STOP): Use __ENTER_NON_STOP.
        (__TEST_AND_EXIT_NON_STOP): Use __EXIT_NON_STOP.
---
v2: This is a new patch in the series.

 sysdeps/powerpc/fpu/fenv_libc.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index bc2684e..549defa 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -27,6 +27,14 @@ extern const fenv_t *__fe_nomask_env_priv (void);
 
 extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 
+#ifdef _ARCH_PWR8
+/* There is no performance advantage to non-stop mode.  */
+#define __ENTER_NON_STOP() do {} while (0)
+#else
+#define __ENTER_NON_STOP() do { (void) __fe_mask_env (); } while (0)
+#endif
+#define __EXIT_NON_STOP() do { (void) __fe_nomask_env_priv (); } while (0)
+
 /* If the old env had any enabled exceptions and the new env has no enabled
    exceptions, then mask SIGFPE in the MSR FE0/FE1 bits.  This may allow the
    FPU to run faster because it always takes the default action and can not
@@ -34,7 +42,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 #define __TEST_AND_ENTER_NON_STOP(old, new) \
   do { \
     if (((old) & FPSCR_ENABLES_MASK) != 0 && ((new) & FPSCR_ENABLES_MASK) == 0) \
-      (void) __fe_mask_env (); \
+      __ENTER_NON_STOP (); \
   } while (0)
 
 /* If the old env has no enabled exceptions and the new env has any enabled
@@ -44,7 +52,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 #define __TEST_AND_EXIT_NON_STOP(old, new) \
   do { \
     if (((old) & FPSCR_ENABLES_MASK) == 0 && ((new) & FPSCR_ENABLES_MASK) != 0) \
-      (void) __fe_nomask_env_priv (); \
+      __EXIT_NON_STOP (); \
   } while (0)
 
 /* The sticky bits in the FPSCR indicating exceptions have occurred.  */
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 3/6] [powerpc] libc_feupdateenv_test: optimize FPSCR access

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
restore the rounding mode and exception enables, preserve exception flags,
and test whether given exception(s) were generated.

If the exception flags haven't changed, then it is sufficient and a bit
more efficient to just restore the rounding mode and enables, rather than
writing the full Floating-Point Status and Control Register (FPSCR).

Reviewed-by: Paul E. Murphy <[hidden email]>

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (FPSCR_EXCEPTIONS_MASK):  New.
        * sysdeps/powerpc/fpu/fenv_private.h (__libc_femergeenv_ppc):  Optimize
        to write FPSCR control only, if exceptions have not changed.
---
v2:
- No changes, but respun after removing _FPU macros at the suggestion
  of Paul Murphy, to make it a bit easier to review.

 sysdeps/powerpc/fpu/fenv_libc.h    |  4 ++++
 sysdeps/powerpc/fpu/fenv_private.h | 16 ++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 549defa..53de1c8 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -264,6 +264,10 @@ enum {
   (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
 #define FPSCR_BASIC_EXCEPTIONS_MASK \
   (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
+#define FPSCR_EXCEPTIONS_MASK (FPSCR_BASIC_EXCEPTIONS_MASK| \
+  FPSCR_VXSNAN_MASK|FPSCR_VXISI_MASK|FPSCR_VXIDI_MASK|FPSCR_VXZDZ_MASK| \
+  FPSCR_VXIMZ_MASK|FPSCR_VXVC_MASK|FPSCR_VXSOFT_MASK|FPSCR_VXSQRT_MASK| \
+  FPSCR_VXCVI_MASK)
 #define FPSCR_FPRF_MASK \
   (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
    FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index 30cbf30..9496026 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -52,8 +52,20 @@ __libc_femergeenv_ppc (const fenv_t *envp, unsigned long long old_mask,
   __TEST_AND_EXIT_NON_STOP (old.l, new.l);
   __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
-  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
-  fesetenv_register (new.fenv);
+  /* If requesting to keep status, replace control, and merge exceptions,
+     and exceptions haven't changed, we can just set new control instead
+     of the whole FPSCR.  */
+  if ((old_mask & (FPSCR_CONTROL_MASK|FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK))
+      == (FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK) &&
+      (new_mask & (FPSCR_CONTROL_MASK|FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK))
+      == (FPSCR_CONTROL_MASK|FPSCR_EXCEPTIONS_MASK) &&
+      (old.l & FPSCR_EXCEPTIONS_MASK) == (new.l & FPSCR_EXCEPTIONS_MASK))
+  {
+    fesetenv_mode (new.fenv);
+  }
+  else
+    /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
+    fesetenv_register (new.fenv);
 
   return old.l;
 }
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 4/6] [powerpc] libc_feholdsetround_noex_ppc_ctx: optimize FPSCR write

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

libc_feholdsetround_noex_ppc_ctx currently does, basically:
1. Read FPSCR, save to context.
2. Create new FPSCR value: clear enables and set new rounding mode.
3. Write new value to FPSCR.

Since other bits just pass through, there is no need to write them.

Instead, write just the changed values (enables and rounding mode),
which can be a bit more efficient.

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_private.h
        (libc_feholdsetround_noex_ppc_ctx): Call fesetenv_mode instead
        of fesetenv_register.
---
v2: No change.

 sysdeps/powerpc/fpu/fenv_private.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index 9496026..ade0bfa 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -142,7 +142,7 @@ libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
   if (__glibc_unlikely (new.l != old.l))
     {
       __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
-      fesetenv_register (new.fenv);
+      fesetenv_mode (new.fenv);
       ctx->updated_status = true;
     }
   else
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 5/6] [powerpc] __fesetround_inline optimizations

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

On POWER9, use more efficient means to update the 2-bit rounding mode
via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
or one 'mtfsfi' instruction that modifies 4 bits).

Suggested-by: Paul E. Murphy  <[hidden email]>

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline): Use
        'mffscrn' instruction on POWER9.
        (__fesetround_inline_nocheck): Likewise.
---
v2: No change.

 sysdeps/powerpc/fpu/fenv_libc.h | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 53de1c8..3b91340 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -156,7 +156,12 @@ typedef union
 static inline int
 __fesetround_inline (int round)
 {
-  if ((unsigned int) round < 2)
+#ifdef _ARCH_PWR9
+  __fe_mffscrn (round);
+#else
+  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
+    __fe_mffscrn (round);
+  else if ((unsigned int) round < 2)
     {
        asm volatile ("mtfsb0 30");
        if ((unsigned int) round == 0)
@@ -172,7 +177,7 @@ __fesetround_inline (int round)
        else
          asm volatile ("mtfsb1 31");
     }
-
+#endif
   return 0;
 }
 
@@ -181,7 +186,14 @@ __fesetround_inline (int round)
 static inline void
 __fesetround_inline_nocheck (const int round)
 {
-  asm volatile ("mtfsfi 7,%0" : : "i" (round));
+#ifdef _ARCH_PWR9
+  __fe_mffscrn (round);
+#else
+  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
+    __fe_mffscrn (round);
+  else
+    asm volatile ("mtfsfi 7,%0" : : "i" (round));
+#endif
 }
 
 #define FPSCR_MASK(bit) (1 << (31 - (bit)))
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 6/6] [powerpc] Rename fegetenv_status to fegetenv_control

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

fegetenv_status is used variously to retrieve the FPSCR exception enable
bits, rounding mode bits, or both.  These are referred to as the control
bits in the POWER ISA.  FPSCR status bits are also returned by the
'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
uses of fegetenv_status.  Change the name to be reflective of its
current and expected use.

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status): Rename to
        fegetenv_control.
        * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Accommodate
        rename of fegetenv_status to fegetenv_control.
        * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Likewise.
        * sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
        * sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Likewise.
        * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Likewise.
        * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
---
v2: This is a new patch in the series.

 sysdeps/powerpc/fpu/fedisblxcpt.c | 2 +-
 sysdeps/powerpc/fpu/feenablxcpt.c | 2 +-
 sysdeps/powerpc/fpu/fegetexcept.c | 2 +-
 sysdeps/powerpc/fpu/fegetmode.c   | 2 +-
 sysdeps/powerpc/fpu/fenv_libc.h   | 6 +++---
 sysdeps/powerpc/fpu/fesetenv.c    | 2 +-
 sysdeps/powerpc/fpu/fesetmode.c   | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index 0d9bf00..870cfc8 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -26,7 +26,7 @@ fedisableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_status ();
+  fe.fenv = curr.fenv = fegetenv_control ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index cf670b8..43f6482 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -26,7 +26,7 @@ feenableexcept (int excepts)
   int result, new;
 
   /* Get current exception mask to return.  */
-  fe.fenv = curr.fenv = fegetenv_status ();
+  fe.fenv = curr.fenv = fegetenv_control ();
   result = fenv_reg_to_exceptions (fe.l);
 
   if ((excepts & FE_ALL_INVALID) == FE_ALL_INVALID)
diff --git a/sysdeps/powerpc/fpu/fegetexcept.c b/sysdeps/powerpc/fpu/fegetexcept.c
index bd27a80..179e3c4 100644
--- a/sysdeps/powerpc/fpu/fegetexcept.c
+++ b/sysdeps/powerpc/fpu/fegetexcept.c
@@ -24,7 +24,7 @@ __fegetexcept (void)
 {
   fenv_union_t fe;
 
-  fe.fenv = fegetenv_status ();
+  fe.fenv = fegetenv_control ();
 
   return fenv_reg_to_exceptions (fe.l);
 }
diff --git a/sysdeps/powerpc/fpu/fegetmode.c b/sysdeps/powerpc/fpu/fegetmode.c
index 0e0a01c..65c5ebe 100644
--- a/sysdeps/powerpc/fpu/fegetmode.c
+++ b/sysdeps/powerpc/fpu/fegetmode.c
@@ -21,6 +21,6 @@
 int
 fegetmode (femode_t *modep)
 {
-  *modep = fegetenv_status ();
+  *modep = fegetenv_control ();
   return 0;
 }
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 3b91340..231d264 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -68,7 +68,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
    'mffs' on architectures older than "power9" because the additional
    bits set for 'mffsl' are "don't care" for 'mffs'.  'mffs' is a superset
    of 'mffsl'.  */
-#define fegetenv_status() \
+#define fegetenv_control() \
   ({register double __fr; \
     __asm__ __volatile__ ( \
       ".machine push; .machine \"power9\"; mffsl %0; .machine pop" \
@@ -92,7 +92,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
     __fr.fenv; \
   })
 
-/* Like fegetenv_status, but also sets the rounding mode.  */
+/* Like fegetenv_control, but also sets the rounding mode.  */
 #ifdef _ARCH_PWR9
 #define fegetenv_and_set_rn(rn) __fe_mffscrn (rn)
 #else
@@ -123,7 +123,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 
 /* Set the last 2 nibbles of the FPSCR, which contain the
    exception enables and the rounding mode.
-   'fegetenv_status' retrieves these bits by reading the FPSCR.  */
+   'fegetenv_control' retrieves these bits by reading the FPSCR.  */
 #define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
 
 /* This very handy macro:
diff --git a/sysdeps/powerpc/fpu/fesetenv.c b/sysdeps/powerpc/fpu/fesetenv.c
index 949d916..fc7b819 100644
--- a/sysdeps/powerpc/fpu/fesetenv.c
+++ b/sysdeps/powerpc/fpu/fesetenv.c
@@ -26,7 +26,7 @@ __fesetenv (const fenv_t *envp)
 
   /* get the currently set exceptions.  */
   new.fenv = *envp;
-  old.fenv = fegetenv_status ();
+  old.fenv = fegetenv_control ();
 
   __TEST_AND_EXIT_NON_STOP (old.l, new.l);
   __TEST_AND_ENTER_NON_STOP (old.l, new.l);
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 90d86a9..1e9a874 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -27,7 +27,7 @@ fesetmode (const femode_t *modep)
   /* Logic regarding enabled exceptions as in fesetenv.  */
 
   new.fenv = *modep;
-  old.fenv = fegetenv_status ();
+  old.fenv = fegetenv_control ();
   new.l = (new.l & ~FPSCR_STATUS_MASK) | (old.l & FPSCR_STATUS_MASK);
 
   if (old.l == new.l)
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

[PATCH v2 7/6] [powerpc] Rename fesetenv_mode to fesetenv_control

Paul Clarke
In reply to this post by Paul Clarke
From: "Paul A. Clarke" <[hidden email]>

fesetenv_mode is used variously to write the FPSCR exception enable
bits and rounding mode bits.  These are referred to as the control
bits in the POWER ISA.  Change the name to be reflective of its
current and expected use, and match up well with fegetenv_control.

2019-09-19  Paul A. Clarke  <[hidden email]>

        * sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): Rename to
        fesetenv_control.
        * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Accommodate
        rename of fesetenv_mode to fegetenv_control.
        * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Likewise.
        * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
        * sysdeps/powerpc/fpu/fenv_private.h (__libc_femergeenv_ppc): Likewise.
        (libc_feholdsetround_noex_ppc_ctx): Likewise.
---
This patch should've been tacked onto the series that I just posted
"[PATCH v2 0/6] Various FPSCR-related changes", thus the "7/6".  :-?
This is a new patch.

 sysdeps/powerpc/fpu/fedisblxcpt.c  | 2 +-
 sysdeps/powerpc/fpu/feenablxcpt.c  | 2 +-
 sysdeps/powerpc/fpu/fenv_libc.h    | 2 +-
 sysdeps/powerpc/fpu/fenv_private.h | 4 ++--
 sysdeps/powerpc/fpu/fesetmode.c    | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sysdeps/powerpc/fpu/fedisblxcpt.c b/sysdeps/powerpc/fpu/fedisblxcpt.c
index 870cfc8..9f86c5f 100644
--- a/sysdeps/powerpc/fpu/fedisblxcpt.c
+++ b/sysdeps/powerpc/fpu/fedisblxcpt.c
@@ -41,7 +41,7 @@ fedisableexcept (int excepts)
   fe.l &= ~new;
 
   if (fe.l != curr.l)
-    fesetenv_mode (fe.fenv);
+    fesetenv_control (fe.fenv);
 
   __TEST_AND_ENTER_NON_STOP (-1ULL, fe.l);
 
diff --git a/sysdeps/powerpc/fpu/feenablxcpt.c b/sysdeps/powerpc/fpu/feenablxcpt.c
index 43f6482..141cfb4 100644
--- a/sysdeps/powerpc/fpu/feenablxcpt.c
+++ b/sysdeps/powerpc/fpu/feenablxcpt.c
@@ -41,7 +41,7 @@ feenableexcept (int excepts)
   fe.l |= new;
 
   if (fe.l != curr.l)
-    fesetenv_mode (fe.fenv);
+    fesetenv_control (fe.fenv);
 
   __TEST_AND_EXIT_NON_STOP (0ULL, fe.l);
 
diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
index 231d264..ed6cfeb 100644
--- a/sysdeps/powerpc/fpu/fenv_libc.h
+++ b/sysdeps/powerpc/fpu/fenv_libc.h
@@ -124,7 +124,7 @@ extern const fenv_t *__fe_mask_env (void) attribute_hidden;
 /* Set the last 2 nibbles of the FPSCR, which contain the
    exception enables and the rounding mode.
    'fegetenv_control' retrieves these bits by reading the FPSCR.  */
-#define fesetenv_mode(env) __builtin_mtfsf (0b00000011, (env));
+#define fesetenv_control(env) __builtin_mtfsf (0b00000011, (env));
 
 /* This very handy macro:
    - Sets the rounding mode to 'round to nearest';
diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
index ade0bfa..5eedc3b 100644
--- a/sysdeps/powerpc/fpu/fenv_private.h
+++ b/sysdeps/powerpc/fpu/fenv_private.h
@@ -61,7 +61,7 @@ __libc_femergeenv_ppc (const fenv_t *envp, unsigned long long old_mask,
       == (FPSCR_CONTROL_MASK|FPSCR_EXCEPTIONS_MASK) &&
       (old.l & FPSCR_EXCEPTIONS_MASK) == (new.l & FPSCR_EXCEPTIONS_MASK))
   {
-    fesetenv_mode (new.fenv);
+    fesetenv_control (new.fenv);
   }
   else
     /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
@@ -142,7 +142,7 @@ libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
   if (__glibc_unlikely (new.l != old.l))
     {
       __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
-      fesetenv_mode (new.fenv);
+      fesetenv_control (new.fenv);
       ctx->updated_status = true;
     }
   else
diff --git a/sysdeps/powerpc/fpu/fesetmode.c b/sysdeps/powerpc/fpu/fesetmode.c
index 1e9a874..a7ead9f 100644
--- a/sysdeps/powerpc/fpu/fesetmode.c
+++ b/sysdeps/powerpc/fpu/fesetmode.c
@@ -36,6 +36,6 @@ fesetmode (const femode_t *modep)
   __TEST_AND_EXIT_NON_STOP (old.l, new.l);
   __TEST_AND_ENTER_NON_STOP (old.l, new.l);
 
-  fesetenv_mode (new.fenv);
+  fesetenv_control (new.fenv);
   return 0;
 }
--
1.8.3.1

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 1/6] [powerpc] fenv_private.h clean up

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <[hidden email]>

> -static __always_inline void
> -libc_feholdexcept_ppc (fenv_t *envp)
> -{
> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI, 0LL);
> -}
> -
> -static __always_inline void
> -libc_feholdexcept_setround_ppc (fenv_t *envp, int r)
> -{
> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI & _FPU_MASK_RN, r);
> -}
> -
> -static __always_inline void
> -libc_fesetround_ppc (int r)
> -{
> -  __fesetround_inline (r);
> -}
> -
> -static __always_inline int
> -libc_fetestexcept_ppc (int e)
> -{
> -  fenv_union_t u;
> -  u.fenv = fegetenv_register ();
> -  return u.l & e;
> -}
> -
> -static __always_inline void
> -libc_feholdsetround_ppc (fenv_t *e, int r)
> -{
> -  __libc_feholdbits_ppc (e, _FPU_MASK_TRAPS_RN, r);
> -}

Are these unused? I still see the macro redirections calling out to them
after applying this patch.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 1/6] [powerpc] fenv_private.h clean up

Paul Clarke
On 9/20/19 10:36 AM, Paul E Murphy wrote:

> On 9/19/19 1:46 PM, Paul A. Clarke wrote:
>> -static __always_inline void
>> -libc_feholdexcept_ppc (fenv_t *envp)
>> -{
>> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI, 0LL);
>> -}
>> -
>> -static __always_inline void
>> -libc_feholdexcept_setround_ppc (fenv_t *envp, int r)
>> -{
>> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI & _FPU_MASK_RN, r);
>> -}
>> -
>> -static __always_inline void
>> -libc_fesetround_ppc (int r)
>> -{
>> -  __fesetround_inline (r);
>> -}
>> -
>> -static __always_inline int
>> -libc_fetestexcept_ppc (int e)
>> -{
>> -  fenv_union_t u;
>> -  u.fenv = fegetenv_register ();
>> -  return u.l & e;
>> -}
>> -
>> -static __always_inline void
>> -libc_feholdsetround_ppc (fenv_t *e, int r)
>> -{
>> -  __libc_feholdbits_ppc (e, _FPU_MASK_TRAPS_RN, r);
>> -}
>
> Are these unused? I still see the macro redirections calling out to them after applying this patch.

Fair question.  I'll admit I'm not an expert, especially in the corner cases, but here's what I see:

>> -libc_feholdexcept_ppc

This shows up in the ieee754 version of s_nearbyint{f}.c, but that is not used in favor of sysdeps/powerpc/fpu/s_nearbyint{f}.c which does not use it.

>> -libc_feholdexcept_setround_ppc

This is still used, but my patch doesn't remove it, in spite of how it looks above.  As noted in the Changelog, I moved the code from __libc_feholdbits_ppc into libc_feholdexcept_setround_ppc, which makes the patch look like it was removed.

>> -libc_fesetround_ppc
>> -libc_fetestexcept_ppc

These show up in the ieee754 version of s_fma.c, but there is a powerpc version which uses different code (__builtin_fma) instead.

>> -libc_feholdsetround_ppc

This _appears_ to be referenced in some uses of SET_RESTORE_ROUND_GENERIC, but actually gets mapped to libc_feholdsetround_ppc_ctx.

PC
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 1/6] [powerpc] fenv_private.h clean up

Paul E Murphy


On 9/22/19 12:11 AM, Paul Clarke wrote:

> On 9/20/19 10:36 AM, Paul E Murphy wrote:
>> On 9/19/19 1:46 PM, Paul A. Clarke wrote:
>>> -static __always_inline void
>>> -libc_feholdexcept_ppc (fenv_t *envp)
>>> -{
>>> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI, 0LL);
>>> -}
>>> -
>>> -static __always_inline void
>>> -libc_feholdexcept_setround_ppc (fenv_t *envp, int r)
>>> -{
>>> -  __libc_feholdbits_ppc (envp, _FPU_MASK_NOT_RN_NI & _FPU_MASK_RN, r);
>>> -}
>>> -
>>> -static __always_inline void
>>> -libc_fesetround_ppc (int r)
>>> -{
>>> -  __fesetround_inline (r);
>>> -}
>>> -
>>> -static __always_inline int
>>> -libc_fetestexcept_ppc (int e)
>>> -{
>>> -  fenv_union_t u;
>>> -  u.fenv = fegetenv_register ();
>>> -  return u.l & e;
>>> -}
>>> -
>>> -static __always_inline void
>>> -libc_feholdsetround_ppc (fenv_t *e, int r)
>>> -{
>>> -  __libc_feholdbits_ppc (e, _FPU_MASK_TRAPS_RN, r);
>>> -}
>>
>> Are these unused? I still see the macro redirections calling out to them after applying this patch.
>
> Fair question.  I'll admit I'm not an expert, especially in the corner cases, but here's what I see:
>
>>> -libc_feholdexcept_ppc
>
> This shows up in the ieee754 version of s_nearbyint{f}.c, but that is not used in favor of sysdeps/powerpc/fpu/s_nearbyint{f}.c which does not use it.
>
>>> -libc_feholdexcept_setround_ppc
>
> This is still used, but my patch doesn't remove it, in spite of how it looks above.  As noted in the Changelog, I moved the code from __libc_feholdbits_ppc into libc_feholdexcept_setround_ppc, which makes the patch look like it was removed.
>
>>> -libc_fesetround_ppc
>>> -libc_fetestexcept_ppc
>
> These show up in the ieee754 version of s_fma.c, but there is a powerpc version which uses different code (__builtin_fma) instead.
>
>>> -libc_feholdsetround_ppc
>
> This _appears_ to be referenced in some uses of SET_RESTORE_ROUND_GENERIC, but actually gets mapped to libc_feholdsetround_ppc_ctx.
>
> PC
>

OK. I agree with your conclusions.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 2/6] [powerpc] No need to enter "Ignore Exceptions Mode"

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> Since at least POWER8, there is no performance advantage to entering
> "Ignore Exceptions Mode", and doing so conditionally requires the
> conditional logic as well as a system call.  Make it a no-op.
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h:
> (__ENTER_NON_STOP): New.
> (__EXIT_NON_STOP): New.
> (__TEST_AND_ENTER_NON_STOP): Use __ENTER_NON_STOP.
> (__TEST_AND_EXIT_NON_STOP): Use __EXIT_NON_STOP.
> ---
> v2: This is a new patch in the series.
>
>   sysdeps/powerpc/fpu/fenv_libc.h | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
>

OK.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 3/6] [powerpc] libc_feupdateenv_test: optimize FPSCR access

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
> restore the rounding mode and exception enables, preserve exception flags,
> and test whether given exception(s) were generated.
>
> If the exception flags haven't changed, then it is sufficient and a bit
> more efficient to just restore the rounding mode and enables, rather than
> writing the full Floating-Point Status and Control Register (FPSCR).
>
> Reviewed-by: Paul E. Murphy <[hidden email]>
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h (FPSCR_EXCEPTIONS_MASK):  New.
> * sysdeps/powerpc/fpu/fenv_private.h (__libc_femergeenv_ppc):  Optimize
> to write FPSCR control only, if exceptions have not changed.
> ---
> v2:
> - No changes, but respun after removing _FPU macros at the suggestion
>    of Paul Murphy, to make it a bit easier to review.
>
>   sysdeps/powerpc/fpu/fenv_libc.h    |  4 ++++
>   sysdeps/powerpc/fpu/fenv_private.h | 16 ++++++++++++++--
>   2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
> index 549defa..53de1c8 100644
> --- a/sysdeps/powerpc/fpu/fenv_libc.h
> +++ b/sysdeps/powerpc/fpu/fenv_libc.h
> @@ -264,6 +264,10 @@ enum {
>     (FPSCR_VE_MASK|FPSCR_OE_MASK|FPSCR_UE_MASK|FPSCR_ZE_MASK|FPSCR_XE_MASK)
>   #define FPSCR_BASIC_EXCEPTIONS_MASK \
>     (FPSCR_VX_MASK|FPSCR_OX_MASK|FPSCR_UX_MASK|FPSCR_ZX_MASK|FPSCR_XX_MASK)
> +#define FPSCR_EXCEPTIONS_MASK (FPSCR_BASIC_EXCEPTIONS_MASK| \
> +  FPSCR_VXSNAN_MASK|FPSCR_VXISI_MASK|FPSCR_VXIDI_MASK|FPSCR_VXZDZ_MASK| \
> +  FPSCR_VXIMZ_MASK|FPSCR_VXVC_MASK|FPSCR_VXSOFT_MASK|FPSCR_VXSQRT_MASK| \
> +  FPSCR_VXCVI_MASK)
>   #define FPSCR_FPRF_MASK \
>     (FPSCR_FPRF_C_MASK|FPSCR_FPRF_FL_MASK|FPSCR_FPRF_FG_MASK| \
>      FPSCR_FPRF_FE_MASK|FPSCR_FPRF_FU_MASK)
> diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
> index 30cbf30..9496026 100644
> --- a/sysdeps/powerpc/fpu/fenv_private.h
> +++ b/sysdeps/powerpc/fpu/fenv_private.h
> @@ -52,8 +52,20 @@ __libc_femergeenv_ppc (const fenv_t *envp, unsigned long long old_mask,
>     __TEST_AND_EXIT_NON_STOP (old.l, new.l);
>     __TEST_AND_ENTER_NON_STOP (old.l, new.l);
>  
> -  /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
> -  fesetenv_register (new.fenv);
> +  /* If requesting to keep status, replace control, and merge exceptions,
> +     and exceptions haven't changed, we can just set new control instead
> +     of the whole FPSCR.  */
> +  if ((old_mask & (FPSCR_CONTROL_MASK|FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK))
> +      == (FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK) &&
> +      (new_mask & (FPSCR_CONTROL_MASK|FPSCR_STATUS_MASK|FPSCR_EXCEPTIONS_MASK))
> +      == (FPSCR_CONTROL_MASK|FPSCR_EXCEPTIONS_MASK) &&
> +      (old.l & FPSCR_EXCEPTIONS_MASK) == (new.l & FPSCR_EXCEPTIONS_MASK))
> +  {
> +    fesetenv_mode (new.fenv);
> +  }
> +  else
> +    /* Atomically enable and raise (if appropriate) exceptions set in `new'.  */
> +    fesetenv_register (new.fenv);
>  
>     return old.l;
>   }
>

OK.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 4/6] [powerpc] libc_feholdsetround_noex_ppc_ctx: optimize FPSCR write

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <[hidden email]>
>
> libc_feholdsetround_noex_ppc_ctx currently does, basically:
The listing reads awkwardly for me. I suggest a little cleanup. There is
no need to post a new patch against my suggestion.

> 1. Read FPSCR, save to context.
> 2. Create new FPSCR value: clear enables and set new rounding mode.
> 3. Write new value to FPSCR.
>
> Since other bits just pass through, there is no need to write them.
>
> Instead, write just the changed values (enables and rounding mode),
> which can be a bit more efficient.
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_private.h
> (libc_feholdsetround_noex_ppc_ctx): Call fesetenv_mode instead
> of fesetenv_register.
> ---
> v2: No change.
>
>   sysdeps/powerpc/fpu/fenv_private.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
> index 9496026..ade0bfa 100644
> --- a/sysdeps/powerpc/fpu/fenv_private.h
> +++ b/sysdeps/powerpc/fpu/fenv_private.h
> @@ -142,7 +142,7 @@ libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
>     if (__glibc_unlikely (new.l != old.l))
>       {
>         __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
> -      fesetenv_register (new.fenv);
> +      fesetenv_mode (new.fenv);
>         ctx->updated_status = true;
>       }
>     else
>

OK, with suggested cleanup to commit message.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 5/6] [powerpc] __fesetround_inline optimizations

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> On POWER9, use more efficient means to update the 2-bit rounding mode
> via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
> or one 'mtfsfi' instruction that modifies 4 bits).
>
> Suggested-by: Paul E. Murphy  <[hidden email]>
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline): Use
> 'mffscrn' instruction on POWER9.
> (__fesetround_inline_nocheck): Likewise.
> ---
> v2: No change.
>
>   sysdeps/powerpc/fpu/fenv_libc.h | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/powerpc/fpu/fenv_libc.h b/sysdeps/powerpc/fpu/fenv_libc.h
> index 53de1c8..3b91340 100644
> --- a/sysdeps/powerpc/fpu/fenv_libc.h
> +++ b/sysdeps/powerpc/fpu/fenv_libc.h
> @@ -156,7 +156,12 @@ typedef union
>   static inline int
>   __fesetround_inline (int round)
>   {
> -  if ((unsigned int) round < 2)
> +#ifdef _ARCH_PWR9
> +  __fe_mffscrn (round);
> +#else
> +  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
> +    __fe_mffscrn (round);
> +  else if ((unsigned int) round < 2)
>       {
>          asm volatile ("mtfsb0 30");
>          if ((unsigned int) round == 0)
> @@ -172,7 +177,7 @@ __fesetround_inline (int round)
>          else
>            asm volatile ("mtfsb1 31");
>       }
> -
> +#endif
>     return 0;
>   }
>  
> @@ -181,7 +186,14 @@ __fesetround_inline (int round)
>   static inline void
>   __fesetround_inline_nocheck (const int round)
>   {
> -  asm volatile ("mtfsfi 7,%0" : : "i" (round));
> +#ifdef _ARCH_PWR9
> +  __fe_mffscrn (round);
> +#else
> +  if (__glibc_likely (GLRO(dl_hwcap2) & PPC_FEATURE2_ARCH_3_00))
> +    __fe_mffscrn (round);
> +  else
> +    asm volatile ("mtfsfi 7,%0" : : "i" (round));
> +#endif
>   }
>  
>   #define FPSCR_MASK(bit) (1 << (31 - (bit)))
>

OK.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 6/6] [powerpc] Rename fegetenv_status to fegetenv_control

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 1:46 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> fegetenv_status is used variously to retrieve the FPSCR exception enable
> bits, rounding mode bits, or both.  These are referred to as the control
> bits in the POWER ISA.  FPSCR status bits are also returned by the
> 'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
> uses of fegetenv_status.  Change the name to be reflective of its
> current and expected use.
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status): Rename to
> fegetenv_control.
> * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Accommodate
> rename of fegetenv_status to fegetenv_control.
> * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Likewise.
> * sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
> * sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Likewise.
> * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Likewise.
> * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
> ---
> v2: This is a new patch in the series.

OK.

Reviewed-By: Paul E Murphy <[hidden email]>

> diff --git a/sysdeps/powerpc/fpu/fegetmode.c b/sysdeps/powerpc/fpu/fegetmode.c
> index 0e0a01c..65c5ebe 100644
> --- a/sysdeps/powerpc/fpu/fegetmode.c
> +++ b/sysdeps/powerpc/fpu/fegetmode.c
> @@ -21,6 +21,6 @@
>   int
>   fegetmode (femode_t *modep)
>   {
> -  *modep = fegetenv_status ();
> +  *modep = fegetenv_control ();
>     return 0;
>   }

Slightly off-topic, is fegetmode documented?
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 7/6] [powerpc] Rename fesetenv_mode to fesetenv_control

Paul E Murphy
In reply to this post by Paul Clarke


On 9/19/19 2:14 PM, Paul A. Clarke wrote:

> From: "Paul A. Clarke" <[hidden email]>
>
> fesetenv_mode is used variously to write the FPSCR exception enable
> bits and rounding mode bits.  These are referred to as the control
> bits in the POWER ISA.  Change the name to be reflective of its
> current and expected use, and match up well with fegetenv_control.
>
> 2019-09-19  Paul A. Clarke  <[hidden email]>
>
> * sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_mode): Rename to
> fesetenv_control.
> * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Accommodate
> rename of fesetenv_mode to fegetenv_control.
> * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Likewise.
> * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
> * sysdeps/powerpc/fpu/fenv_private.h (__libc_femergeenv_ppc): Likewise.
> (libc_feholdsetround_noex_ppc_ctx): Likewise.
> ---
> This patch should've been tacked onto the series that I just posted
> "[PATCH v2 0/6] Various FPSCR-related changes", thus the "7/6".  :-?
> This is a new patch
OK.

Reviewed-By: Paul E Murphy <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 6/6] [powerpc] Rename fegetenv_status to fegetenv_control

Paul Clarke
In reply to this post by Paul E Murphy
On 9/23/19 11:21 AM, Paul E Murphy wrote:

> On 9/19/19 1:46 PM, Paul A. Clarke wrote:
>> fegetenv_status is used variously to retrieve the FPSCR exception enable
>> bits, rounding mode bits, or both.  These are referred to as the control
>> bits in the POWER ISA.  FPSCR status bits are also returned by the
>> 'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
>> uses of fegetenv_status.  Change the name to be reflective of its
>> current and expected use.
>>
>> 2019-09-19  Paul A. Clarke  <[hidden email]>
>>
>>     * sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status): Rename to
>>     fegetenv_control.
>>     * sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept): Accommodate
>>     rename of fegetenv_status to fegetenv_control.
>>     * sysdeps/powerpc/fpu/feenablxcpt.c (feenableexcept): Likewise.
>>     * sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
>>     * sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Likewise.
>>     * sysdeps/powerpc/fpu/fesetenv.c (__fesetenv): Likewise.
>>     * sysdeps/powerpc/fpu/fesetmode.c (fesetmode): Likewise.
>> ---
>> v2: This is a new patch in the series.
>
> OK.

Thanks!

> Reviewed-By: Paul E Murphy <[hidden email]>
>
>> diff --git a/sysdeps/powerpc/fpu/fegetmode.c b/sysdeps/powerpc/fpu/fegetmode.c
>> index 0e0a01c..65c5ebe 100644
>> --- a/sysdeps/powerpc/fpu/fegetmode.c
>> +++ b/sysdeps/powerpc/fpu/fegetmode.c
>> @@ -21,6 +21,6 @@
>>   int
>>   fegetmode (femode_t *modep)
>>   {
>> -  *modep = fegetenv_status ();
>> +  *modep = fegetenv_control ();
>>     return 0;
>>   }
>
> Slightly off-topic, is fegetmode documented?

Again, not an expert here, but "git blame math/fegetmode.c" shows this:
--
commit ec94343f592df68ba1ba49bb2c558f7d2629387c
Author: Joseph Myers <[hidden email]>
Date:   Wed Sep 7 16:40:09 2016 +0000

    Add femode_t functions.
   
    TS 18661-1 defines a type femode_t to represent the set of dynamic
    floating-point control modes (such as the rounding mode and trap
    enablement modes), and functions fegetmode and fesetmode to manipulate
    those modes (without affecting other state such as the raised
    exception flags) and a corresponding macro FE_DFL_MODE.
[...]
--

PC
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 4/6] [powerpc] libc_feholdsetround_noex_ppc_ctx: optimize FPSCR write

Paul Clarke
In reply to this post by Paul E Murphy
Thanks for the review, Paul!  Question...

On 9/23/19 10:54 AM, Paul E Murphy wrote:
> On 9/19/19 1:46 PM, Paul A. Clarke wrote:
>> libc_feholdsetround_noex_ppc_ctx currently does, basically:
> The listing reads awkwardly for me. I suggest a little cleanup. There is no need to post a new patch against my suggestion.

Here you say "listing", and there is a list (1-2-3) immediately below this text, and below you say "commit message", .  Where would you like to see improvement?  Happy to change, but it's hard for me to see what you're seeing.  (And, of course, it's all perfectly clear to me since I wrote it!  ;-)

>> 1. Read FPSCR, save to context.
>> 2. Create new FPSCR value: clear enables and set new rounding mode.
>> 3. Write new value to FPSCR.
>>
>> Since other bits just pass through, there is no need to write them.
>>
>> Instead, write just the changed values (enables and rounding mode),
>> which can be a bit more efficient.
>>
>> 2019-09-19  Paul A. Clarke  <[hidden email]>
>>
>>     * sysdeps/powerpc/fpu/fenv_private.h
>>     (libc_feholdsetround_noex_ppc_ctx): Call fesetenv_mode instead
>>     of fesetenv_register.
>> ---
>> v2: No change.
>>
>>   sysdeps/powerpc/fpu/fenv_private.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sysdeps/powerpc/fpu/fenv_private.h b/sysdeps/powerpc/fpu/fenv_private.h
>> index 9496026..ade0bfa 100644
>> --- a/sysdeps/powerpc/fpu/fenv_private.h
>> +++ b/sysdeps/powerpc/fpu/fenv_private.h
>> @@ -142,7 +142,7 @@ libc_feholdsetround_noex_ppc_ctx (struct rm_ctx *ctx, int r)
>>     if (__glibc_unlikely (new.l != old.l))
>>       {
>>         __TEST_AND_ENTER_NON_STOP (old.l, 0ULL);
>> -      fesetenv_register (new.fenv);
>> +      fesetenv_mode (new.fenv);
>>         ctx->updated_status = true;
>>       }
>>     else
>>
>
> OK, with suggested cleanup to commit message.
>
> Reviewed-By: Paul E Murphy <[hidden email]>

PC
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2 4/6] [powerpc] libc_feholdsetround_noex_ppc_ctx: optimize FPSCR write

Paul E Murphy


On 9/23/19 12:54 PM, Paul Clarke wrote:

> Thanks for the review, Paul!  Question...
>
> On 9/23/19 10:54 AM, Paul E Murphy wrote:
>> On 9/19/19 1:46 PM, Paul A. Clarke wrote:
>>> libc_feholdsetround_noex_ppc_ctx currently does, basically:
>> The listing reads awkwardly for me. I suggest a little cleanup. There is no need to post a new patch against my suggestion.
>
> Here you say "listing", and there is a list (1-2-3) immediately below this text, and below you say "commit message", .  Where would you like to see improvement?  Happy to change, but it's hard for me to see what you're seeing.  (And, of course, it's all perfectly clear to me since I wrote it!  ;-)
>
>>> 1. Read FPSCR, save to context.
>>> 2. Create new FPSCR value: clear enables and set new rounding mode.
>>> 3. Write new value to FPSCR.
>>>
I think the change of tense and verb leading into the list threw me off.
Feel free to commit with or without changes as the message does convey
the intent of the patch.