[PATCH 0/5] sin/cos/sincos cleanups

[PATCH 0/5] sin/cos/sincos cleanups

Siddhesh Poyarekar-9
Hi,

Here is another set of patches to clean up the sin/cos code further.  The focus
of these patches was to consolidate and simplify code in an attempt to reduce
duplicates and introduce some consistency in computation.  For example, there
are places that use fabs(x) and others that use if(x > 0) {... x ...} else {...
 -x ...}.  As a final patch, I inlined all of the support functions.
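
The two styles mentioned above can be compared side by side.  What follows is
only a minimal sketch and not code from s_sin.c: kernel, eval_branchy and
eval_fabs are hypothetical names, and the polynomial is a plain truncated
Taylor series standing in for the real computation.

```c
#include <math.h>

/* Hypothetical odd polynomial standing in for the real sin kernel.
   Expects a non-negative argument.  */
static double
kernel (double ax)
{
  double xx = ax * ax;
  return ax + ax * xx * (-1.0 / 6.0 + xx * (1.0 / 120.0));
}

/* Style 1: branch on the sign of the input and duplicate the call.  */
static double
eval_branchy (double x)
{
  if (x > 0)
    return kernel (x);
  else
    return -kernel (-x);
}

/* Style 2: take fabs once and restore the sign at the end.  */
static double
eval_fabs (double x)
{
  double r = kernel (fabs (x));
  return (x > 0) ? r : -r;
}
```

Both variants return the same value for the same input; the second has a single
copy of the computation and leaves only the final sign selection conditional.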

The cumulative effect of these patches is a 16% improvement in sincos in the
min case and 3% in the mean case in the microbenchmark.  sin regresses by 4% in
the min case and is largely unaffected in the mean case.  cos is faster by 3%
in the min case and unchanged in the mean case.  In addition to the
microbenchmark, I also tested SPEC2006, which gives about a 2% improvement on
aarch64 and similar (about 1.5%) on x86_64.

Tested on x86_64 and aarch64 to verify that there are no regressions.  There is
further scope for consolidation in these functions and I intend to continue
working on them on top of these changes.  While the primary effect will be
readability of the code, I also expect the changes to have a positive impact on
performance, especially for sincos.

Siddhesh

Siddhesh Poyarekar (5):
  Consolidate reduce_and_compute code
  Use fabs(x) instead of branching on signedness of input to sin and cos
  Consolidate input partitioning into do_cos and do_sin
  Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469
  Inline all support functions for sin and cos

 sysdeps/ieee754/dbl-64/s_sin.c | 420 ++++++++++++++++-------------------------
 1 file changed, 158 insertions(+), 262 deletions(-)

--
2.7.4


[PATCH 1/5] Consolidate reduce_and_compute code

Siddhesh Poyarekar-9
This patch reshuffles the reduce_and_compute code so that the
structure matches other code structures of the same type elsewhere in
s_sin.c and s_sincos.c.  This is the beginning of an attempt to
consolidate and reduce code duplication in functions in s_sin.c to
make it easier to read and possibly also easier for the compiler to
optimize.
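
As an illustration of the reshuffle, here is a minimal sketch of the same
consolidation outside of s_sin.c.  The helpers small_path, large_path and
cos_path are hypothetical stand-ins for bsloww, bsloww1 and bsloww2 and do not
compute anything meaningful; only the control flow mirrors the patch.

```c
/* Hypothetical stand-ins for the slow-path helpers.  */
static double small_path (double a, double da) { return a + da; }
static double large_path (double a, double da) { return 2.0 * a + da; }
static double cos_path (double a, double da) { return 1.0 - (a + da); }

/* Shape before the patch: cases 0 and 2 each carry their own copy.  */
static double
dispatch_before (double a, double da, unsigned int k)
{
  double retval = 0;
  switch (k % 4)
    {
    case 0:
      if (a * a < 0.01588)
        retval = small_path (a, da);
      else
        retval = large_path (a, da);
      break;
    case 2:
      if (a * a < 0.01588)
        retval = small_path (-a, -da);
      else
        retval = large_path (-a, -da);
      break;
    case 1:
    case 3:
      retval = cos_path (a, da);
      break;
    }
  return retval;
}

/* Shape after the patch: case 2 negates its inputs and shares case 0.  */
static double
dispatch_after (double a, double da, unsigned int k)
{
  double retval = 0;
  switch (k % 4)
    {
    case 2:
      a = -a;
      da = -da;
      /* Fall through.  */
    case 0:
      if (a * a < 0.01588)
        retval = small_path (a, da);
      else
        retval = large_path (a, da);
      break;
    case 1:
    case 3:
      retval = cos_path (a, da);
      break;
    }
  return retval;
}
```

The two functions agree for every k because (-a) * (-a) equals a * a, so the
threshold test picks the same helper in either form.  The explicit fall-through
comment is an addition in this sketch; it keeps compilers that warn about
implicit fall-through quiet.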

        * sysdeps/ieee754/dbl-64/s_sin.c (reduce_and_compute):
        Consolidate switch cases 0 and 2.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 7c9a079..e1ee7a9 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -249,23 +249,20 @@ reduce_and_compute (double x, unsigned int k)
   k = (n + k) % 4;
   switch (k)
     {
-      case 0:
- if (a * a < 0.01588)
-  retval = bsloww (a, da, x, n);
- else
-  retval = bsloww1 (a, da, x, n);
- break;
-      case 2:
- if (a * a < 0.01588)
-  retval = bsloww (-a, -da, x, n);
- else
-  retval = bsloww1 (-a, -da, x, n);
- break;
+    case 2:
+      a = -a;
+      da = -da;
+    case 0:
+      if (a * a < 0.01588)
+ retval = bsloww (a, da, x, n);
+      else
+ retval = bsloww1 (a, da, x, n);
+      break;
 
-      case 1:
-      case 3:
- retval = bsloww2 (a, da, x, n);
- break;
+    case 1:
+    case 3:
+      retval = bsloww2 (a, da, x, n);
+      break;
     }
   return retval;
 }
--
2.7.4


[PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos

Siddhesh Poyarekar-9
In reply to this post by Siddhesh Poyarekar-9
The sin and cos code is inconsistent in how it obtains the absolute
value of X: some places branch on the sign of X while others use fabs.
fabs seems to be a better candidate in most cases because it avoids a
branch.  Similarly, this patch tries to make it easier for the compiler
to emit conditional assignment instructions (like fcsel on aarch64)
where it can, by isolating conditional assignment constructs from the
rest of the expression.
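
To make the second point concrete, here is a minimal sketch of the correction
adjustment in both forms.  The function names adjust_branchy and adjust_select
are hypothetical; the constant 1.0005 is taken from the code being changed.
Whether a compiler actually emits a select instruction for the second form
depends on the target and optimization level, so treat that as an expectation
rather than a guarantee.

```c
/* Form used before the patch: the whole assignment is duplicated in
   each arm of the branch.  */
static double
adjust_branchy (double cor, double eps)
{
  if (cor > 0)
    cor = 1.0005 * cor + eps;
  else
    cor = 1.0005 * cor - eps;
  return cor;
}

/* Form used after the patch: only the choice between EPS and -EPS is
   conditional, and the multiply-add is written once.  */
static double
adjust_select (double cor, double eps)
{
  return 1.0005 * cor + ((cor > 0) ? eps : -eps);
}
```

Subtracting eps and adding -eps are the same IEEE 754 operation, so the two
forms are expected to produce the same result.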

A further benefit of this change is to identify common constructs
across functions and consolidate them in future patches.

        * sysdeps/ieee754/dbl-64/s_sin.c (do_cos_slow): Use ternary
        instead of if/else.
        (do_sin_slow): Likewise.
        (do_sincos_1): Use fabs instead of if/else.
        (do_sincos_2): Likewise.
        (__sin): Likewise.
        (__cos): Likewise.
        (slow2): Likewise.
        (sloww): Likewise.
        (sloww1): Likewise.  Drop argument M.
        (sloww2): Use fabs instead of if/else.
        (bsloww): Likewise.
        (bsloww1): Likewise.
        (bsloww2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 233 +++++++++++++++--------------------------
 1 file changed, 85 insertions(+), 148 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index e1ee7a9..7f6cd09 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -133,7 +133,7 @@ static double slow (double x);
 static double slow1 (double x);
 static double slow2 (double x);
 static double sloww (double x, double dx, double orig, int n);
-static double sloww1 (double x, double dx, double orig, int m, int n);
+static double sloww1 (double x, double dx, double orig, int n);
 static double sloww2 (double x, double dx, double orig, int n);
 static double bsloww (double x, double dx, double orig, int n);
 static double bsloww1 (double x, double dx, double orig, int n);
@@ -181,10 +181,7 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
   cor = cor + ((cs - y) - e1 * x1);
   res = y + cor;
   cor = (y - res) + cor;
-  if (cor > 0)
-    cor = 1.0005 * cor + eps;
-  else
-    cor = 1.0005 * cor - eps;
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
   *corp = cor;
   return res;
 }
@@ -229,10 +226,7 @@ do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
   cor = cor + ((sn - y) + c1 * x1);
   res = y + cor;
   cor = (y - res) + cor;
-  if (cor > 0)
-    cor = 1.0005 * cor + eps;
-  else
-    cor = 1.0005 * cor - eps;
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
   *corp = cor;
   return res;
 }
@@ -296,7 +290,6 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 {
   double xx, retval, res, cor, y;
   mynumber u;
-  int m;
   double eps = fabs (x) * 1.2e-30;
 
   int k1 = (n + k) & 3;
@@ -316,37 +309,28 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
  }
       else
  {
-  if (a > 0)
-    m = 1;
-  else
-    {
-      m = 0;
-      a = -a;
-      da = -da;
-    }
-  u.x = big + a;
-  y = a - (u.x - big);
-  res = do_sin (u, y, da, &cor);
+  double db = (a > 0 ? da : -da);
+  u.x = big + fabs (a);
+  y = fabs (a) - (u.x - big);
+  res = do_sin (u, y, db, &cor);
   cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
-  retval = ((res == res + cor) ? ((m) ? res : -res)
-    : sloww1 (a, da, x, m, k));
+  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
+    : sloww1 (a, da, x, k));
  }
       break;
 
     case 1:
     case 3:
-      if (a < 0)
  {
-  a = -a;
-  da = -da;
+  double db = (a > 0 ? da : -da);
+  u.x = big + fabs (a);
+  y = fabs (a) - (u.x - big) + db;
+  res = do_cos (u, y, &cor);
+  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
+    : sloww2 (a, da, x, n));
+  break;
  }
-      u.x = big + a;
-      y = a - (u.x - big) + da;
-      res = do_cos (u, y, &cor);
-      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
- : sloww2 (a, da, x, n));
-      break;
     }
 
   return retval;
@@ -408,43 +392,28 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
  }
       else
  {
-  double t, db, y;
-  int m;
-  if (a > 0)
-    {
-      m = 1;
-      t = a;
-      db = da;
-    }
-  else
-    {
-      m = 0;
-      t = -a;
-      db = -da;
-    }
-  u.x = big + t;
-  y = t - (u.x - big);
+  double db = (a > 0 ? da : -da);
+  u.x = big + fabs (a);
+  double y = fabs (a) - (u.x - big);
   res = do_sin (u, y, db, &cor);
   cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
-  retval = ((res == res + cor) ? ((m) ? res : -res)
+  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
     : bsloww1 (a, da, x, n));
  }
       break;
 
     case 1:
     case 3:
-      if (a < 0)
  {
-  a = -a;
-  da = -da;
+  double db = (a > 0 ? da : -da);
+  u.x = big + fabs (a);
+  double y = fabs (a) - (u.x - big) + db;
+  res = do_cos (u, y, &cor);
+  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
+    : bsloww2 (a, da, x, n));
+  break;
  }
-      u.x = big + a;
-      double y = a - (u.x - big) + da;
-      res = do_cos (u, y, &cor);
-      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
- : bsloww2 (a, da, x, n));
-      break;
     }
 
   return retval;
@@ -492,8 +461,10 @@ __sin (double x)
 /*---------------------------- 0.25<|x|< 0.855469---------------------- */
   else if (k < 0x3feb6000)
     {
-      u.x = (m > 0) ? big + x : big - x;
-      y = (m > 0) ? x - (u.x - big) : x + (u.x - big);
+      u.x = big + fabs (x);
+      y = fabs (x) - (u.x - big);
+      y = (x > 0 ? y : -y);
+
       xx = y * y;
       s = y + y * xx * (sn3 + xx * sn5);
       c = xx * (cs2 + xx * (cs4 + xx * cs6));
@@ -513,17 +484,11 @@ __sin (double x)
   else if (k < 0x400368fd)
     {
 
-      y = (m > 0) ? hp0 - x : hp0 + x;
-      if (y >= 0)
- {
-  u.x = big + y;
-  y = (y - (u.x - big)) + hp1;
- }
-      else
- {
-  u.x = big - y;
-  y = (-hp1) - (y + (u.x - big));
- }
+      t = hp0 - fabs (x);
+      u.x = big + fabs (t);
+      y = fabs (t) - (u.x - big);
+      y = ((t >= 0) ? hp1 : -hp1) + y;
+
       res = do_cos (u, y, &cor);
       retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
     } /*   else  if (k < 0x400368fd)    */
@@ -617,22 +582,13 @@ __cos (double x)
  }
       else
  {
-  if (a > 0)
-    {
-      m = 1;
-    }
-  else
-    {
-      m = 0;
-      a = -a;
-      da = -da;
-    }
-  u.x = big + a;
-  y = a - (u.x - big);
-  res = do_sin (u, y, da, &cor);
+  double db = (a > 0 ? da : -da);
+  u.x = big + fabs (a);
+  y = fabs (a) - (u.x - big);
+  res = do_sin (u, y, db, &cor);
   cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
-  retval = ((res == res + cor) ? ((m) ? res : -res)
-    : sloww1 (a, da, x, m, 1));
+  retval = ((res == res + cor) ? ((a > 0) ? res : -res)
+    : sloww1 (a, da, x, 1));
  }
 
     } /*   else  if (k < 0x400368fd)    */
@@ -726,20 +682,11 @@ slow2 (double x)
   mynumber u;
   double w[2], y, y1, y2, cor, res, del;
 
-  y = fabs (x);
-  y = hp0 - y;
-  if (y >= 0)
-    {
-      u.x = big + y;
-      y = y - (u.x - big);
-      del = hp1;
-    }
-  else
-    {
-      u.x = big - y;
-      y = -(y + (u.x - big));
-      del = -hp1;
-    }
+  double t = hp0 - fabs (x);
+  u.x = big + fabs (t);
+  y = fabs (t) - (u.x - big);
+  del = (t >= 0) ? hp1 : -hp1;
+
   res = do_cos_slow (u, y, del, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
@@ -771,19 +718,18 @@ sloww (double x, double dx, double orig, int k)
   int4 n;
   res = TAYLOR_SLOW (x, dx, cor);
 
-  if (cor > 0)
-    cor = 1.0005 * cor + fabs (orig) * 3.1e-30;
-  else
-    cor = 1.0005 * cor - fabs (orig) * 3.1e-30;
+  double eps = fabs (orig) * 3.1e-30;
+
+  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
 
   if (res == res + cor)
     return res;
 
-  (x > 0) ? __dubsin (x, dx, w) : __dubsin (-x, -dx, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + fabs (orig) * 1.1e-30;
-  else
-    cor = 1.000000001 * w[1] - fabs (orig) * 1.1e-30;
+  a = fabs (x);
+  da = (x > 0) ? dx : -dx;
+  __dubsin (a, da, w);
+  eps = fabs (orig) * 1.1e-30;
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -805,11 +751,11 @@ sloww (double x, double dx, double orig, int k)
       a = -a;
       da = -da;
     }
-  (a > 0) ? __dubsin (a, da, w) : __dubsin (-a, -da, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + fabs (orig) * 1.1e-40;
-  else
-    cor = 1.000000001 * w[1] - fabs (orig) * 1.1e-40;
+  x = fabs (a);
+  dx = (a > 0) ? da : -da;
+  __dubsin (x, dx, w);
+  eps = fabs (orig) * 1.1e-40;
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (a > 0) ? w[0] : -w[0];
@@ -826,27 +772,26 @@ sloww (double x, double dx, double orig, int k)
 
 static double
 SECTION
-sloww1 (double x, double dx, double orig, int m, int k)
+sloww1 (double x, double dx, double orig, int k)
 {
   mynumber u;
   double w[2], y, cor, res;
 
-  u.x = big + x;
-  y = x - (u.x - big);
+  u.x = big + fabs (x);
+  y = fabs (x) - (u.x - big);
+  dx = (x > 0 ? dx : -dx);
   res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
-    return (m > 0) ? res : -res;
+    return (x > 0) ? res : -res;
 
-  __dubsin (x, dx, w);
+  __dubsin (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-30 * fabs (orig);
-  else
-    cor = 1.000000005 * w[1] - 1.1e-30 * fabs (orig);
+  double eps = 1.1e-30 * fabs (orig);
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
-    return (m > 0) ? w[0] : -w[0];
+    return (x > 0) ? w[0] : -w[0];
 
   return (k == 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true);
 }
@@ -865,19 +810,18 @@ sloww2 (double x, double dx, double orig, int n)
   mynumber u;
   double w[2], y, cor, res;
 
-  u.x = big + x;
-  y = x - (u.x - big);
+  u.x = big + fabs (x);
+  y = fabs (x) - (u.x - big);
+  dx = (x > 0 ? dx : -dx);
   res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
-  __docos (x, dx, w);
+  __docos (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-30 * fabs (orig);
-  else
-    cor = 1.000000005 * w[1] - 1.1e-30 * fabs (orig);
+  double eps = 1.1e-30 * fabs (orig);
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? eps : -eps);
 
   if (w[0] == w[0] + cor)
     return (n & 2) ? -w[0] : w[0];
@@ -897,18 +841,17 @@ static double
 SECTION
 bsloww (double x, double dx, double orig, int n)
 {
-  double res, cor, w[2];
+  double res, cor, w[2], a, da;
 
   res = TAYLOR_SLOW (x, dx, cor);
-  cor = (cor > 0) ? 1.0005 * cor + 1.1e-24 : 1.0005 * cor - 1.1e-24;
+  cor = 1.0005 * cor + ((cor > 0) ? 1.1e-24 : -1.1e-24);
   if (res == res + cor)
     return res;
 
-  (x > 0) ? __dubsin (x, dx, w) : __dubsin (-x, -dx, w);
-  if (w[1] > 0)
-    cor = 1.000000001 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000001 * w[1] - 1.1e-24;
+  a = fabs (x);
+  da = (x > 0) ? dx : -dx;
+  __dubsin (a, da, w);
+  cor = 1.000000001 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -940,10 +883,7 @@ bsloww1 (double x, double dx, double orig, int n)
 
   __dubsin (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000005 * w[1] - 1.1e-24;
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (x > 0) ? w[0] : -w[0];
@@ -975,10 +915,7 @@ bsloww2 (double x, double dx, double orig, int n)
 
   __docos (fabs (x), dx, w);
 
-  if (w[1] > 0)
-    cor = 1.000000005 * w[1] + 1.1e-24;
-  else
-    cor = 1.000000005 * w[1] - 1.1e-24;
+  cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
 
   if (w[0] == w[0] + cor)
     return (n & 2) ? -w[0] : w[0];
--
2.7.4


[PATCH 3/5] Consolidate input partitioning into do_cos and do_sin

Siddhesh Poyarekar-9
In reply to this post by Siddhesh Poyarekar-9
All calls to do_cos and do_sin are preceded by code that partitions x
into a larger double that gives an offset into the sincos table and a
smaller double that is used in a polynomial computation.  Consolidate
that partitioning into do_cos and do_sin to reduce code duplication.
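
The partitioning step can be sketched in isolation as follows.  This is an
illustration under assumptions, not the code from s_sin.c: struct split and
partition are hypothetical names, and big is assumed to be 1.5 * 2^45, which
makes the spacing of doubles around it 2^-7.  The real code additionally reads
the table index out of the low bits of u.x, which this sketch leaves out.

```c
#include <math.h>

/* Assumed value of the constant: 1.5 * 2^45.  Doubles of this
   magnitude are spaced 2^-7 apart.  */
static const double big = 0x1.8p45;

struct split
{
  double hi;  /* |x| rounded to a multiple of 2^-7 (the table point).  */
  double lo;  /* Remainder fed to the polynomial; |lo| <= 2^-8.  */
};

static struct split
partition (double x)
{
  /* Adding BIG rounds away everything below 2^-7.  The volatile only
     keeps the compiler from simplifying the add/subtract pair.  */
  volatile double u = big + fabs (x);
  struct split s;
  s.hi = u - big;
  s.lo = fabs (x) - s.hi;
  return s;
}
```

For example, partition (0.3) gives hi = 0.296875 (that is, 38/128) and a small
positive lo, and hi + lo reproduces |x| exactly.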

        * sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
        arguments.  Consolidate input partitioning from callers here.
        (do_cos_slow): Likewise.
        (do_sin): Likewise.
        (do_sin_slow): Likewise.
        (do_sincos_1): Remove the no longer necessary input partitioning.
        (do_sincos_2): Likewise.
        (__sin): Likewise.
        (__cos): Likewise.
        (slow1): Likewise.
        (slow2): Likewise.
        (sloww1): Likewise.
        (sloww2): Likewise.
        (bsloww1): Likewise.
        (bsloww2): Likewise.
        (cslow2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 191 ++++++++++++++++++-----------------------
 1 file changed, 82 insertions(+), 109 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 7f6cd09..e03c75a 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -141,14 +141,21 @@ static double bsloww2 (double x, double dx, double orig, int n);
 int __branred (double x, double *a, double *aa);
 static double cslow2 (double x);
 
-/* Given a number partitioned into U and X such that U is an index into the
-   sin/cos table, this macro computes the cosine of the number by combining
-   the sin and cos of X (as computed by a variation of the Taylor series) with
-   the values looked up from the sin/cos table to get the result in RES and a
-   correction value in COR.  */
+/* Given a number partitioned into X and DX, this function computes the cosine
+   of the number by combining the sin and cos of X (as computed by a variation
+   of the Taylor series) with the values looked up from the sin/cos table to
+   get the result in RES and a correction value in COR.  */
 static double
-do_cos (mynumber u, double x, double *corp)
+do_cos (double x, double dx, double *corp)
 {
+  mynumber u;
+
+  if (x < 0)
+    dx = -dx;
+
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big) + dx;
+
   double xx, s, sn, ssn, c, cs, ccs, res, cor;
   xx = x * x;
   s = x + x * xx * (sn3 + xx * sn5);
@@ -161,11 +168,19 @@ do_cos (mynumber u, double x, double *corp)
   return res;
 }
 
-/* A more precise variant of DO_COS where the number is partitioned into U, X
-   and DX.  EPS is the adjustment to the correction COR.  */
+/* A more precise variant of DO_COS.  EPS is the adjustment to the correction
+   COR.  */
 static double
-do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
+do_cos_slow (double x, double dx, double eps, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, y, x1, x2, e1, e2, res, cor;
   double s, sn, ssn, c, cs, ccs;
   xx = x * x;
@@ -186,14 +201,20 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
   return res;
 }
 
-/* Given a number partitioned into U and X and DX such that U is an index into
-   the sin/cos table, this macro computes the sine of the number by combining
-   the sin and cos of X (as computed by a variation of the Taylor series) with
-   the values looked up from the sin/cos table to get the result in RES and a
-   correction value in COR.  */
+/* Given a number partitioned into X and DX, this function computes the sine of
+   the number by combining the sin and cos of X (as computed by a variation of
+   the Taylor series) with the values looked up from the sin/cos table to get
+   the result in RES and a correction value in COR.  */
 static double
-do_sin (mynumber u, double x, double dx, double *corp)
+do_sin (double x, double dx, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, s, sn, ssn, c, cs, ccs, cor, res;
   xx = x * x;
   s = x + (dx + x * xx * (sn3 + xx * sn5));
@@ -206,11 +227,18 @@ do_sin (mynumber u, double x, double dx, double *corp)
   return res;
 }
 
-/* A more precise variant of res = do_sin where the number is partitioned into U, X
-   and DX.  EPS is the adjustment to the correction COR.  */
+/* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
+   COR.  */
 static double
-do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
+do_sin_slow (double x, double dx, double eps, double *corp)
 {
+  mynumber u;
+
+  if (x <= 0)
+    dx = -dx;
+  u.x = big + fabs (x);
+  x = fabs (x) - (u.x - big);
+
   double xx, y, x1, x2, c1, c2, res, cor;
   double s, sn, ssn, c, cs, ccs;
   xx = x * x;
@@ -288,8 +316,7 @@ static double
 __always_inline
 do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 {
-  double xx, retval, res, cor, y;
-  mynumber u;
+  double xx, retval, res, cor;
   double eps = fabs (x) * 1.2e-30;
 
   int k1 = (n + k) & 3;
@@ -309,10 +336,7 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
  }
       else
  {
-  double db = (a > 0 ? da : -da);
-  u.x = big + fabs (a);
-  y = fabs (a) - (u.x - big);
-  res = do_sin (u, y, db, &cor);
+  res = do_sin (a, da, &cor);
   cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
   retval = ((res == res + cor) ? ((a > 0) ? res : -res)
     : sloww1 (a, da, x, k));
@@ -321,16 +345,11 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
 
     case 1:
     case 3:
- {
-  double db = (a > 0 ? da : -da);
-  u.x = big + fabs (a);
-  y = fabs (a) - (u.x - big) + db;
-  res = do_cos (u, y, &cor);
-  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
-    : sloww2 (a, da, x, n));
-  break;
- }
+      res = do_cos (a, da, &cor);
+      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
+ : sloww2 (a, da, x, n));
+      break;
     }
 
   return retval;
@@ -369,7 +388,6 @@ __always_inline
 do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 {
   double res, retval, cor, xx;
-  mynumber u;
 
   double eps = 1.0e-24;
 
@@ -392,10 +410,7 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
  }
       else
  {
-  double db = (a > 0 ? da : -da);
-  u.x = big + fabs (a);
-  double y = fabs (a) - (u.x - big);
-  res = do_sin (u, y, db, &cor);
+  res = do_sin (a, da, &cor);
   cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
   retval = ((res == res + cor) ? ((a > 0) ? res : -res)
     : bsloww1 (a, da, x, n));
@@ -404,16 +419,11 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
 
     case 1:
     case 3:
- {
-  double db = (a > 0 ? da : -da);
-  u.x = big + fabs (a);
-  double y = fabs (a) - (u.x - big) + db;
-  res = do_cos (u, y, &cor);
-  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
-  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
-    : bsloww2 (a, da, x, n));
-  break;
- }
+      res = do_cos (a, da, &cor);
+      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
+      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
+ : bsloww2 (a, da, x, n));
+      break;
     }
 
   return retval;
@@ -485,11 +495,7 @@ __sin (double x)
     {
 
       t = hp0 - fabs (x);
-      u.x = big + fabs (t);
-      y = fabs (t) - (u.x - big);
-      y = ((t >= 0) ? hp1 : -hp1) + y;
-
-      res = do_cos (u, y, &cor);
+      res = do_cos (t, hp1, &cor);
       retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
     } /*   else  if (k < 0x400368fd)    */
 
@@ -561,10 +567,7 @@ __cos (double x)
 
   else if (k < 0x3feb6000)
     { /* 2^-27 < |x| < 0.855469 */
-      y = fabs (x);
-      u.x = big + y;
-      y = y - (u.x - big);
-      res = do_cos (u, y, &cor);
+      res = do_cos (x, 0, &cor);
       retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
     } /*   else  if (k < 0x3feb6000)    */
 
@@ -582,10 +585,7 @@ __cos (double x)
  }
       else
  {
-  double db = (a > 0 ? da : -da);
-  u.x = big + fabs (a);
-  y = fabs (a) - (u.x - big);
-  res = do_sin (u, y, db, &cor);
+  res = do_sin (a, da, &cor);
   cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
   retval = ((res == res + cor) ? ((a > 0) ? res : -res)
     : sloww1 (a, da, x, 1));
@@ -655,12 +655,9 @@ static double
 SECTION
 slow1 (double x)
 {
-  mynumber u;
-  double w[2], y, cor, res;
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  res = do_sin_slow (u, y, 0, 0, &cor);
+  double w[2], cor, res;
+
+  res = do_sin_slow (x, 0, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
@@ -679,15 +676,10 @@ static double
 SECTION
 slow2 (double x)
 {
-  mynumber u;
-  double w[2], y, y1, y2, cor, res, del;
+  double w[2], y, y1, y2, cor, res;
 
   double t = hp0 - fabs (x);
-  u.x = big + fabs (t);
-  y = fabs (t) - (u.x - big);
-  del = (t >= 0) ? hp1 : -hp1;
-
-  res = do_cos_slow (u, y, del, 0, &cor);
+  res = do_cos_slow (t, hp1, 0, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
@@ -774,17 +766,14 @@ static double
 SECTION
 sloww1 (double x, double dx, double orig, int k)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  u.x = big + fabs (x);
-  y = fabs (x) - (u.x - big);
-  dx = (x > 0 ? dx : -dx);
-  res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
+  res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
+  dx = (x > 0 ? dx : -dx);
   __dubsin (fabs (x), dx, w);
 
   double eps = 1.1e-30 * fabs (orig);
@@ -807,17 +796,14 @@ static double
 SECTION
 sloww2 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  u.x = big + fabs (x);
-  y = fabs (x) - (u.x - big);
-  dx = (x > 0 ? dx : -dx);
-  res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
+  res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
 
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
+  dx = x > 0 ? dx : -dx;
   __docos (fabs (x), dx, w);
 
   double eps = 1.1e-30 * fabs (orig);
@@ -870,17 +856,13 @@ static double
 SECTION
 bsloww1 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  dx = (x > 0) ? dx : -dx;
-  res = do_sin_slow (u, y, dx, 1.1e-24, &cor);
+  res = do_sin_slow (x, dx, 1.1e-24, &cor);
   if (res == res + cor)
     return (x > 0) ? res : -res;
 
+  dx = (x > 0) ? dx : -dx;
   __dubsin (fabs (x), dx, w);
 
   cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
@@ -902,17 +884,13 @@ static double
 SECTION
 bsloww2 (double x, double dx, double orig, int n)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  dx = (x > 0) ? dx : -dx;
-  res = do_cos_slow (u, y, dx, 1.1e-24, &cor);
+  res = do_cos_slow (x, dx, 1.1e-24, &cor);
   if (res == res + cor)
     return (n & 2) ? -res : res;
 
+  dx = (x > 0) ? dx : -dx;
   __docos (fabs (x), dx, w);
 
   cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
@@ -932,18 +910,13 @@ static double
 SECTION
 cslow2 (double x)
 {
-  mynumber u;
-  double w[2], y, cor, res;
+  double w[2], cor, res;
 
-  y = fabs (x);
-  u.x = big + y;
-  y = y - (u.x - big);
-  res = do_cos_slow (u, y, 0, 0, &cor);
+  res = do_cos_slow (x, 0, 0, &cor);
   if (res == res + cor)
     return res;
 
-  y = fabs (x);
-  __docos (y, 0, w);
+  __docos (fabs (x), 0, w);
   if (w[0] == w[0] + 1.000000005 * w[1])
     return w[0];
 
--
2.7.4


[PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469

Siddhesh Poyarekar-9
In reply to this post by Siddhesh Poyarekar-9
The existing code looks slightly different from DO_SIN but, on closer
examination, should give exactly the same result.  Drop it in favour
of the DO_SIN function call.
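
One way to see why the results should match: the open-coded block carries the
sign of x through y, sn and ssn, while the DO_SIN route works on |x| and
negates the result afterwards.  The sketch below is an illustration under
assumptions: sin_core is a hypothetical name, the coefficients are plain Taylor
terms rather than the constants in s_sin.c, and the table values are passed in
as arguments instead of being looked up.

```c
/* Illustrative Taylor coefficients, not the tuned constants from the
   real code.  */
static const double sn3 = -1.0 / 6.0, sn5 = 1.0 / 120.0;
static const double cs2 = 0.5, cs4 = -1.0 / 24.0, cs6 = 1.0 / 720.0;

/* Combine table values SN + SSN ~ sin(x0) and CS + CCS ~ cos(x0) with
   the polynomial terms in Y to approximate sin(x0 + y).  */
static double
sin_core (double y, double sn, double ssn, double cs, double ccs)
{
  double xx = y * y;
  double s = y + y * xx * (sn3 + xx * sn5);       /* ~ sin(y) */
  double c = xx * (cs2 + xx * (cs4 + xx * cs6));  /* ~ 1 - cos(y) */
  double cor = (ssn + s * ccs - sn * c) + cs * s;
  return sn + cor;
}
```

Every term of the result is odd in the triple (y, sn, ssn), and
round-to-nearest treats a value and its negation alike, so
sin_core (-y, -sn, -ssn, cs, ccs) equals -sin_core (y, sn, ssn, cs, ccs).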

        * sysdeps/ieee754/dbl-64/s_sin.c (__sin): Use DO_SIN.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index e03c75a..82f9345 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -441,7 +441,7 @@ SECTION
 #endif
 __sin (double x)
 {
-  double xx, res, t, cor, y, s, c, sn, ssn, cs, ccs;
+  double xx, res, t, cor;
   mynumber u;
   int4 k, m;
   double retval = 0;
@@ -471,23 +471,8 @@ __sin (double x)
 /*---------------------------- 0.25<|x|< 0.855469---------------------- */
   else if (k < 0x3feb6000)
     {
-      u.x = big + fabs (x);
-      y = fabs (x) - (u.x - big);
-      y = (x > 0 ? y : -y);
-
-      xx = y * y;
-      s = y + y * xx * (sn3 + xx * sn5);
-      c = xx * (cs2 + xx * (cs4 + xx * cs6));
-      SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
-      if (m <= 0)
-        {
-          sn = -sn;
-  ssn = -ssn;
- }
-      cor = (ssn + s * ccs - sn * c) + cs * s;
-      res = sn + cor;
-      cor = (sn - res) + cor;
-      retval = (res == res + 1.096 * cor) ? res : slow1 (x);
+      res = do_sin (x, 0, &cor);
+      retval = (res == res + 1.096 * cor) ? (m > 0 ? res : -res) : slow1 (x);
     } /*   else  if (k < 0x3feb6000)    */
 
 /*----------------------- 0.855469  <|x|<2.426265  ----------------------*/
--
2.7.4


[PATCH 5/5] Inline all support functions for sin and cos

Siddhesh Poyarekar-9
In reply to this post by Siddhesh Poyarekar-9
The support functions for sin and cos have a lot of identical
functionality, so inlining them gives a pretty decent jump in
performance: ~19% in the sincos function.  On SPEC2006 this
translates to about 2.1% in the tonto test.
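
For readers unfamiliar with the annotation, here is a minimal sketch of a
forced-inline helper shared by two callers.  ALWAYS_INLINE is a hypothetical
macro spelled out for this example (glibc provides its own __always_inline),
and with_margin, margin_sin and margin_cos are made-up names.  The attribute
is a GCC/Clang extension.

```c
/* Hypothetical equivalent of glibc's __always_inline.  */
#define ALWAYS_INLINE inline __attribute__ ((always_inline))

/* Shared helper: scale the correction and widen it by EPS.  */
static ALWAYS_INLINE double
with_margin (double cor, double scale, double eps)
{
  return scale * cor + ((cor > 0) ? eps : -eps);
}

/* Each caller passes constants, so once the helper is inlined the
   compiler can specialize the body for that call site.  */
static double
margin_sin (double cor)
{
  return with_margin (cor, 1.035, 1.0e-24);
}

static double
margin_cos (double cor)
{
  return with_margin (cor, 1.025, 1.0e-24);
}
```

Inlining trades code size for the chance to fold constants and share work
across what used to be call boundaries; the gain reported above is a
measurement on this particular code and need not carry over elsewhere.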

        * sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Mark as inline.
        (do_cos_slow): Likewise.
        (do_sin): Likewise.
        (do_sin_slow): Likewise.
        (slow): Likewise.
        (slow1): Likewise.
        (slow2): Likewise.
        (sloww): Likewise.
        (sloww1): Likewise.
        (sloww2): Likewise.
        (bsloww): Likewise.
        (bsloww1): Likewise.
        (bsloww2): Likewise.
        (cslow2): Likewise.
---
 sysdeps/ieee754/dbl-64/s_sin.c | 52 +++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
index 82f9345..c20ef4d 100644
--- a/sysdeps/ieee754/dbl-64/s_sin.c
+++ b/sysdeps/ieee754/dbl-64/s_sin.c
@@ -145,7 +145,8 @@ static double cslow2 (double x);
    of the number by combining the sin and cos of X (as computed by a variation
    of the Taylor series) with the values looked up from the sin/cos table to
    get the result in RES and a correction value in COR.  */
-static double
+static inline double
+__always_inline
 do_cos (double x, double dx, double *corp)
 {
   mynumber u;
@@ -170,7 +171,8 @@ do_cos (double x, double dx, double *corp)
 
 /* A more precise variant of DO_COS.  EPS is the adjustment to the correction
    COR.  */
-static double
+static inline double
+__always_inline
 do_cos_slow (double x, double dx, double eps, double *corp)
 {
   mynumber u;
@@ -205,7 +207,8 @@ do_cos_slow (double x, double dx, double eps, double *corp)
    the number by combining the sin and cos of X (as computed by a variation of
    the Taylor series) with the values looked up from the sin/cos table to get
    the result in RES and a correction value in COR.  */
-static double
+static inline double
+__always_inline
 do_sin (double x, double dx, double *corp)
 {
   mynumber u;
@@ -229,7 +232,8 @@ do_sin (double x, double dx, double *corp)
 
 /* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
    COR.  */
-static double
+static inline double
+__always_inline
 do_sin_slow (double x, double dx, double eps, double *corp)
 {
   mynumber u;
@@ -615,8 +619,8 @@ __cos (double x)
 /* precision  and if still doesn't accurate enough by mpsin   or dubsin */
 /************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 slow (double x)
 {
   double res, cor, w[2];
@@ -636,8 +640,8 @@ slow (double x)
 /* and if result still doesn't accurate enough by mpsin   or dubsin            */
 /*******************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 slow1 (double x)
 {
   double w[2], cor, res;
@@ -657,8 +661,8 @@ slow1 (double x)
 /*  Routine compute sin(x) for   0.855469  <|x|<2.426265  by  __sincostab.tbl  */
 /* and if result still doesn't accurate enough by mpsin   or dubsin       */
 /**************************************************************************/
-static double
-SECTION
+static inline double
+__always_inline
 slow2 (double x)
 {
   double w[2], y, y1, y2, cor, res;
@@ -686,8 +690,8 @@ slow2 (double x)
 /* result.And if result not accurate enough routine calls mpsin1 or dubsin */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww (double x, double dx, double orig, int k)
 {
   double y, t, res, cor, w[2], a, da, xn;
@@ -747,8 +751,8 @@ sloww (double x, double dx, double orig, int k)
 /* accurate enough routine calls  mpsin1   or dubsin                       */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww1 (double x, double dx, double orig, int k)
 {
   double w[2], cor, res;
@@ -777,8 +781,8 @@ sloww1 (double x, double dx, double orig, int k)
 /* accurate enough routine calls  mpsin1   or dubsin                       */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 sloww2 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -808,8 +812,8 @@ sloww2 (double x, double dx, double orig, int n)
 /* result.And if result not accurate enough routine calls other routines    */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww (double x, double dx, double orig, int n)
 {
   double res, cor, w[2], a, da;
@@ -837,8 +841,8 @@ bsloww (double x, double dx, double orig, int n)
 /* And if result not  accurate enough routine calls  other routines         */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww1 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -865,8 +869,8 @@ bsloww1 (double x, double dx, double orig, int n)
 /* And if result not accurate enough routine calls  other routines          */
 /***************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 bsloww2 (double x, double dx, double orig, int n)
 {
   double w[2], cor, res;
@@ -891,8 +895,8 @@ bsloww2 (double x, double dx, double orig, int n)
 /* precision  and if still doesn't accurate enough by mpcos   or docos  */
 /************************************************************************/
 
-static double
-SECTION
+static inline double
+__always_inline
 cslow2 (double x)
 {
   double w[2], cor, res;
--
2.7.4


Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos

Maurizio Manfredini
In reply to this post by Siddhesh Poyarekar-9


On 08/23/2016 08:22 PM, Siddhesh Poyarekar wrote:

> @@ -181,10 +181,7 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
>    cor = cor + ((cs - y) - e1 * x1);
>    res = y + cor;
>    cor = (y - res) + cor;
> -  if (cor > 0)
> -    cor = 1.0005 * cor + eps;
> -  else
> -    cor = 1.0005 * cor - eps;
> +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
>    *corp = cor;
>    return res;

If eps is known to be >=0 then
 > +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
might be written as
cor = 1.0005 * cor + copysign(eps, cor);

Similarly to fabs(), copysign() avoids a branch - or a potential one
from the ternary.
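
For illustration only (this is not part of the patch), the two forms can be put side by side as below.  The helper names adjust_ternary and adjust_copysign are made up for this sketch.  One caveat: assuming eps >= 0, the two forms agree for any nonzero cor, but for cor == +0.0 the ternary subtracts eps while copysign adds it, so the substitution is not strictly bit-for-bit identical.

```c
#include <math.h>

/* The form used in the patch: add EPS when COR is positive, subtract it
   otherwise (including when COR is +0.0).  */
static double
adjust_ternary (double cor, double eps)
{
  return 1.0005 * cor + ((cor > 0) ? eps : -eps);
}

/* The suggested form: give EPS the sign of COR.  Assumes EPS >= 0.
   Note that copysign (eps, +0.0) is +eps, unlike the ternary above.  */
static double
adjust_copysign (double cor, double eps)
{
  return 1.0005 * cor + copysign (eps, cor);
}
```

Whether the +0.0 case matters depends on how the caller uses the widened correction, which I have not checked.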

Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos

Joseph Myers
On Tue, 23 Aug 2016, Manfred wrote:

> If eps is known to be >=0 then
> > +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
> might be written as
> cor = 1.0005 * cor + copysign(eps, cor);
>
> Similarly to fabs(), copysign() avoids a branch - or a potential one from the
> ternary.

That should be __copysign for namespace reasons (though they should
generally do the same thing because of inlines in math_private.h).

--
Joseph S. Myers
[hidden email]

Re: [PATCH 1/5] Consolidate reduce_and_compute code

Adhemerval Zanella-2
In reply to this post by Siddhesh Poyarekar-9
LGTM, this is mostly indentation.

> On 23 Aug 2016, at 15:22, Siddhesh Poyarekar <[hidden email]> wrote:
>
> This patch reshuffles the reduce_and_compute code so that the
> structure matches other code structures of the same type elsewhere in
> s_sin.c and s_sincos.c.  This is the beginning of an attempt to
> consolidate and reduce code duplication in functions in s_sin.c to
> make it easier to read and possibly also easier for the compiler to
> optimize.
>
>    * sysdeps/ieee754/dbl-64/s_sin.c (reduce_and_compute):
>    Consolidate switch cases 0 and 2.
> ---
> sysdeps/ieee754/dbl-64/s_sin.c | 29 +++++++++++++----------------
> 1 file changed, 13 insertions(+), 16 deletions(-)
>
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 7c9a079..e1ee7a9 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -249,23 +249,20 @@ reduce_and_compute (double x, unsigned int k)
>   k = (n + k) % 4;
>   switch (k)
>     {
> -      case 0:
> -        if (a * a < 0.01588)
> -          retval = bsloww (a, da, x, n);
> -        else
> -          retval = bsloww1 (a, da, x, n);
> -        break;
> -      case 2:
> -        if (a * a < 0.01588)
> -          retval = bsloww (-a, -da, x, n);
> -        else
> -          retval = bsloww1 (-a, -da, x, n);
> -        break;
> +    case 2:
> +      a = -a;
> +      da = -da;
> +    case 0:
> +      if (a * a < 0.01588)
> +        retval = bsloww (a, da, x, n);
> +      else
> +        retval = bsloww1 (a, da, x, n);
> +      break;
>
> -      case 1:
> -      case 3:
> -        retval = bsloww2 (a, da, x, n);
> -        break;
> +    case 1:
> +    case 3:
> +      retval = bsloww2 (a, da, x, n);
> +      break;
>     }
>   return retval;
> }
> --
> 2.7.4
>

Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos

Siddhesh Poyarekar-8
In reply to this post by Maurizio Manfredini
On Wednesday 24 August 2016 02:23 AM, Manfred wrote:
> If eps is known to be >=0 then
>> +  cor = 1.0005 * cor + ((cor > 0) ? eps : -eps);
> might be written as
> cor = 1.0005 * cor + copysign(eps, cor);
>
> Similarly to fabs(), copysign() avoids a branch - or a potential one
> from the ternary.

Thanks, there are a lot of places in the code that can benefit from
this, so I'll post a separate patch to clean it all up.

Siddhesh

Re: [PATCH 1/5] Consolidate reduce_and_compute code

Joseph Myers
In reply to this post by Siddhesh Poyarekar-9
On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> +    case 2:
> +      a = -a;
> +      da = -da;
> +    case 0:

OK with a comment on this fallthrough (we might want to use the
-Wimplicit-fallthrough being proposed for GCC 7...).
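
As a minimal sketch of what such a comment could look like (the function select_argument and its placeholder return values are hypothetical, not code from s_sin.c): GCC 7 and later recognize a comment such as "Fall through." placed immediately before the next case label and suppress the -Wimplicit-fallthrough warning for it.

```c
/* Hypothetical stand-in for the quadrant dispatch: quadrants 0 and 2
   share one path, with the argument negated first for quadrant 2.  */
static double
select_argument (double a, unsigned int k)
{
  switch (k % 4)
    {
    case 2:
      a = -a;
      /* Fall through.  */
    case 0:
      /* Shared path: case 2 arrives here with A already negated.  */
      return a;

    default:
      /* Placeholder for the path taken by quadrants 1 and 3.  */
      return 1.0;
    }
}
```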

--
Joseph S. Myers
[hidden email]

Re: [PATCH 2/5] Use fabs(x) instead of branching on signedness of input to sin and cos

Joseph Myers
In reply to this post by Siddhesh Poyarekar-9
On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> The sin and cos code is inconsistent about its use of fabs to get the
> absolute value of X where in some places it conditionalizes the code
> while in others it uses fabs.  fabs seems to be a better candidate in
> most cases because it avoids a branch.  Similarly there is an attempt
> to make it easier for the compiler to emit conditional assignment
> instructions (like fcsel on aarch64) where it can, by isolating
> conditional assignment constructs from the rest of the expression.
>
> A further benefit of this change is to identify common constructs
> across functions and consolidate them in future patches.
>
> * sysdeps/ieee754/dbl-64/s_sin.c (do_cos_slow): Use ternary
> instead of if/else.
> (do_sin_slow): Likewise.
> (do_sincos_1): Use fabs instead of if/else.
> (do_sincos_2): Likewise.
> (__sin): Likewise.
> (__cos): Likewise.
> (slow2): Likewise.
> (sloww): Likewise.
> (sloww1): Likewise.  Drop argument M.
> (sloww2): Use fabs instead of if/else.
> (bsloww): Likewise.
> (bsloww1): Likewise.
> (bsloww2): Likewise.

OK.
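
To make the pattern concrete, here is a small sketch of the two styles the description contrasts.  The function kernel is a made-up odd polynomial standing in for the real computation, and odd_branching / odd_fabs are hypothetical names; none of this is taken from s_sin.c.

```c
#include <math.h>

/* Hypothetical kernel, valid for a non-negative argument.  */
static double
kernel (double ax)
{
  return ax - ax * ax * ax / 6.0;
}

/* Branching style: the computation appears once per sign of X.  */
static double
odd_branching (double x)
{
  if (x > 0)
    return kernel (x);
  else
    return -kernel (-x);
}

/* fabs style: one copy of the computation on |X|, with the sign applied
   at the end by a lone conditional that a compiler may be able to turn
   into a conditional select rather than a branch.  */
static double
odd_fabs (double x)
{
  double res = kernel (fabs (x));
  return (x > 0) ? res : -res;
}
```

Both functions return the same value for nonzero inputs; the second simply keeps the sign handling separate from the arithmetic.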

--
Joseph S. Myers
[hidden email]

[PING][PATCH 3/5] Consolidate input partitioning into do_cos and do_sin

Siddhesh Poyarekar-8
In reply to this post by Siddhesh Poyarekar-9
Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:

> All calls to do_cos are preceded by code that partitions x into a
> larger double that gives an offset into the sincos table and a smaller
> double that is used in a polynomial computation.  Consolidate all of
> them into do_cos and do_sin to reduce code duplication.
>
> * sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
> arguments.  Consolidate input partitioning from callers here.
> (do_cos_slow): Likewise.
> (do_sin): Likewise.
> (do_sin_slow): Likewise.
> (do_sincos_1): Remove the no longer necessary input partitioning.
> (do_sincos_2): Likewise.
> (__sin): Likewise.
> (__cos): Likewise.
> (slow1): Likewise.
> (slow2): Likewise.
> (sloww1): Likewise.
> (sloww2): Likewise.
> (bsloww1): Likewise.
> (bsloww2): Likewise.
> (cslow2): Likewise.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 191 ++++++++++++++++++-----------------------
>  1 file changed, 82 insertions(+), 109 deletions(-)
>
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 7f6cd09..e03c75a 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -141,14 +141,21 @@ static double bsloww2 (double x, double dx, double orig, int n);
>  int __branred (double x, double *a, double *aa);
>  static double cslow2 (double x);
>  
> -/* Given a number partitioned into U and X such that U is an index into the
> -   sin/cos table, this macro computes the cosine of the number by combining
> -   the sin and cos of X (as computed by a variation of the Taylor series) with
> -   the values looked up from the sin/cos table to get the result in RES and a
> -   correction value in COR.  */
> +/* Given a number partitioned into X and DX, this function computes the cosine
> +   of the number by combining the sin and cos of X (as computed by a variation
> +   of the Taylor series) with the values looked up from the sin/cos table to
> +   get the result in RES and a correction value in COR.  */
>  static double
> -do_cos (mynumber u, double x, double *corp)
> +do_cos (double x, double dx, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x < 0)
> +    dx = -dx;
> +
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big) + dx;
> +
>    double xx, s, sn, ssn, c, cs, ccs, res, cor;
>    xx = x * x;
>    s = x + x * xx * (sn3 + xx * sn5);
> @@ -161,11 +168,19 @@ do_cos (mynumber u, double x, double *corp)
>    return res;
>  }
>  
> -/* A more precise variant of DO_COS where the number is partitioned into U, X
> -   and DX.  EPS is the adjustment to the correction COR.  */
> +/* A more precise variant of DO_COS.  EPS is the adjustment to the correction
> +   COR.  */
>  static double
> -do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
> +do_cos_slow (double x, double dx, double eps, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, y, x1, x2, e1, e2, res, cor;
>    double s, sn, ssn, c, cs, ccs;
>    xx = x * x;
> @@ -186,14 +201,20 @@ do_cos_slow (mynumber u, double x, double dx, double eps, double *corp)
>    return res;
>  }
>  
> -/* Given a number partitioned into U and X and DX such that U is an index into
> -   the sin/cos table, this macro computes the sine of the number by combining
> -   the sin and cos of X (as computed by a variation of the Taylor series) with
> -   the values looked up from the sin/cos table to get the result in RES and a
> -   correction value in COR.  */
> +/* Given a number partitioned into X and DX, this function computes the sine of
> +   the number by combining the sin and cos of X (as computed by a variation of
> +   the Taylor series) with the values looked up from the sin/cos table to get
> +   the result in RES and a correction value in COR.  */
>  static double
> -do_sin (mynumber u, double x, double dx, double *corp)
> +do_sin (double x, double dx, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, s, sn, ssn, c, cs, ccs, cor, res;
>    xx = x * x;
>    s = x + (dx + x * xx * (sn3 + xx * sn5));
> @@ -206,11 +227,18 @@ do_sin (mynumber u, double x, double dx, double *corp)
>    return res;
>  }
>  
> -/* A more precise variant of res = do_sin where the number is partitioned into U, X
> -   and DX.  EPS is the adjustment to the correction COR.  */
> +/* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
> +   COR.  */
>  static double
> -do_sin_slow (mynumber u, double x, double dx, double eps, double *corp)
> +do_sin_slow (double x, double dx, double eps, double *corp)
>  {
> +  mynumber u;
> +
> +  if (x <= 0)
> +    dx = -dx;
> +  u.x = big + fabs (x);
> +  x = fabs (x) - (u.x - big);
> +
>    double xx, y, x1, x2, c1, c2, res, cor;
>    double s, sn, ssn, c, cs, ccs;
>    xx = x * x;
> @@ -288,8 +316,7 @@ static double
>  __always_inline
>  do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>  {
> -  double xx, retval, res, cor, y;
> -  mynumber u;
> +  double xx, retval, res, cor;
>    double eps = fabs (x) * 1.2e-30;
>  
>    int k1 = (n + k) & 3;
> @@ -309,10 +336,7 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>   }
>        else
>   {
> -  double db = (a > 0 ? da : -da);
> -  u.x = big + fabs (a);
> -  y = fabs (a) - (u.x - big);
> -  res = do_sin (u, y, db, &cor);
> +  res = do_sin (a, da, &cor);
>    cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
>    retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>      : sloww1 (a, da, x, k));
> @@ -321,16 +345,11 @@ do_sincos_1 (double a, double da, double x, int4 n, int4 k)
>  
>      case 1:
>      case 3:
> - {
> -  double db = (a > 0 ? da : -da);
> -  u.x = big + fabs (a);
> -  y = fabs (a) - (u.x - big) + db;
> -  res = do_cos (u, y, &cor);
> -  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> -  retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
> -    : sloww2 (a, da, x, n));
> -  break;
> - }
> +      res = do_cos (a, da, &cor);
> +      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> +      retval = ((res == res + cor) ? ((k1 & 2) ? -res : res)
> + : sloww2 (a, da, x, n));
> +      break;
>      }
>  
>    return retval;
> @@ -369,7 +388,6 @@ __always_inline
>  do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>  {
>    double res, retval, cor, xx;
> -  mynumber u;
>  
>    double eps = 1.0e-24;
>  
> @@ -392,10 +410,7 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>   }
>        else
>   {
> -  double db = (a > 0 ? da : -da);
> -  u.x = big + fabs (a);
> -  double y = fabs (a) - (u.x - big);
> -  res = do_sin (u, y, db, &cor);
> +  res = do_sin (a, da, &cor);
>    cor = (cor > 0) ? 1.035 * cor + eps : 1.035 * cor - eps;
>    retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>      : bsloww1 (a, da, x, n));
> @@ -404,16 +419,11 @@ do_sincos_2 (double a, double da, double x, int4 n, int4 k)
>  
>      case 1:
>      case 3:
> - {
> -  double db = (a > 0 ? da : -da);
> -  u.x = big + fabs (a);
> -  double y = fabs (a) - (u.x - big) + db;
> -  res = do_cos (u, y, &cor);
> -  cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> -  retval = ((res == res + cor) ? ((n & 2) ? -res : res)
> -    : bsloww2 (a, da, x, n));
> -  break;
> - }
> +      res = do_cos (a, da, &cor);
> +      cor = (cor > 0) ? 1.025 * cor + eps : 1.025 * cor - eps;
> +      retval = ((res == res + cor) ? ((n & 2) ? -res : res)
> + : bsloww2 (a, da, x, n));
> +      break;
>      }
>  
>    return retval;
> @@ -485,11 +495,7 @@ __sin (double x)
>      {
>  
>        t = hp0 - fabs (x);
> -      u.x = big + fabs (t);
> -      y = fabs (t) - (u.x - big);
> -      y = ((t >= 0) ? hp1 : -hp1) + y;
> -
> -      res = do_cos (u, y, &cor);
> +      res = do_cos (t, hp1, &cor);
>        retval = (res == res + 1.020 * cor) ? ((m > 0) ? res : -res) : slow2 (x);
>      } /*   else  if (k < 0x400368fd)    */
>  
> @@ -561,10 +567,7 @@ __cos (double x)
>  
>    else if (k < 0x3feb6000)
>      { /* 2^-27 < |x| < 0.855469 */
> -      y = fabs (x);
> -      u.x = big + y;
> -      y = y - (u.x - big);
> -      res = do_cos (u, y, &cor);
> +      res = do_cos (x, 0, &cor);
>        retval = (res == res + 1.020 * cor) ? res : cslow2 (x);
>      } /*   else  if (k < 0x3feb6000)    */
>  
> @@ -582,10 +585,7 @@ __cos (double x)
>   }
>        else
>   {
> -  double db = (a > 0 ? da : -da);
> -  u.x = big + fabs (a);
> -  y = fabs (a) - (u.x - big);
> -  res = do_sin (u, y, db, &cor);
> +  res = do_sin (a, da, &cor);
>    cor = (cor > 0) ? 1.035 * cor + 1.0e-31 : 1.035 * cor - 1.0e-31;
>    retval = ((res == res + cor) ? ((a > 0) ? res : -res)
>      : sloww1 (a, da, x, 1));
> @@ -655,12 +655,9 @@ static double
>  SECTION
>  slow1 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  res = do_sin_slow (u, y, 0, 0, &cor);
> +  double w[2], cor, res;
> +
> +  res = do_sin_slow (x, 0, 0, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> @@ -679,15 +676,10 @@ static double
>  SECTION
>  slow2 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, y1, y2, cor, res, del;
> +  double w[2], y, y1, y2, cor, res;
>  
>    double t = hp0 - fabs (x);
> -  u.x = big + fabs (t);
> -  y = fabs (t) - (u.x - big);
> -  del = (t >= 0) ? hp1 : -hp1;
> -
> -  res = do_cos_slow (u, y, del, 0, &cor);
> +  res = do_cos_slow (t, hp1, 0, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> @@ -774,17 +766,14 @@ static double
>  SECTION
>  sloww1 (double x, double dx, double orig, int k)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  u.x = big + fabs (x);
> -  y = fabs (x) - (u.x - big);
> -  dx = (x > 0 ? dx : -dx);
> -  res = do_sin_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
> +  res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
>  
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> +  dx = (x > 0 ? dx : -dx);
>    __dubsin (fabs (x), dx, w);
>  
>    double eps = 1.1e-30 * fabs (orig);
> @@ -807,17 +796,14 @@ static double
>  SECTION
>  sloww2 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  u.x = big + fabs (x);
> -  y = fabs (x) - (u.x - big);
> -  dx = (x > 0 ? dx : -dx);
> -  res = do_cos_slow (u, y, dx, 3.1e-30 * fabs (orig), &cor);
> +  res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor);
>  
>    if (res == res + cor)
>      return (n & 2) ? -res : res;
>  
> +  dx = x > 0 ? dx : -dx;
>    __docos (fabs (x), dx, w);
>  
>    double eps = 1.1e-30 * fabs (orig);
> @@ -870,17 +856,13 @@ static double
>  SECTION
>  bsloww1 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  dx = (x > 0) ? dx : -dx;
> -  res = do_sin_slow (u, y, dx, 1.1e-24, &cor);
> +  res = do_sin_slow (x, dx, 1.1e-24, &cor);
>    if (res == res + cor)
>      return (x > 0) ? res : -res;
>  
> +  dx = (x > 0) ? dx : -dx;
>    __dubsin (fabs (x), dx, w);
>  
>    cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
> @@ -902,17 +884,13 @@ static double
>  SECTION
>  bsloww2 (double x, double dx, double orig, int n)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  dx = (x > 0) ? dx : -dx;
> -  res = do_cos_slow (u, y, dx, 1.1e-24, &cor);
> +  res = do_cos_slow (x, dx, 1.1e-24, &cor);
>    if (res == res + cor)
>      return (n & 2) ? -res : res;
>  
> +  dx = (x > 0) ? dx : -dx;
>    __docos (fabs (x), dx, w);
>  
>    cor = 1.000000005 * w[1] + ((w[1] > 0) ? 1.1e-24 : -1.1e-24);
> @@ -932,18 +910,13 @@ static double
>  SECTION
>  cslow2 (double x)
>  {
> -  mynumber u;
> -  double w[2], y, cor, res;
> +  double w[2], cor, res;
>  
> -  y = fabs (x);
> -  u.x = big + y;
> -  y = y - (u.x - big);
> -  res = do_cos_slow (u, y, 0, 0, &cor);
> +  res = do_cos_slow (x, 0, 0, &cor);
>    if (res == res + cor)
>      return res;
>  
> -  y = fabs (x);
> -  __docos (y, 0, w);
> +  __docos (fabs (x), 0, w);
>    if (w[0] == w[0] + 1.000000005 * w[1])
>      return w[0];
>  
>
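
The input partitioning that this patch moves into do_cos and do_sin can be sketched in isolation as follows.  This is only an illustration under an assumption: big_assumed is taken to be 1.5 * 2^45, which I believe matches the `big' constant used by s_sin.c but have not re-checked here.  With that value, adding it to |x| rounds |x| to a multiple of 2^-7, so subtracting it back yields the coarse part used for the table lookup and leaves a small remainder for the polynomial.  The names split_input and struct split are hypothetical.

```c
#include <math.h>

/* Assumed value of the `big' constant: 1.5 * 2^45.  Doubles of this
   magnitude are spaced 2^-7 apart.  */
static const double big_assumed = 0x1.8p45;

struct split
{
  double hi;   /* |x| rounded to the nearest multiple of 2^-7.  */
  double lo;   /* Remainder, at most 2^-8 in magnitude.  */
};

/* Partition |X| the way the consolidated prologue does:
     u.x = big + fabs (x);  x = fabs (x) - (u.x - big);  */
static struct split
split_input (double x)
{
  struct split r;
  /* volatile keeps the compiler from simplifying (big + |x|) - big.  */
  volatile double u = big_assumed + fabs (x);
  r.hi = u - big_assumed;
  r.lo = fabs (x) - r.hi;
  return r;
}
```

For example, split_input (0.3) gives hi = 0.296875 (38/128) and a lo of roughly 0.003125.  In the real code the low-order bits of u.x also serve as the index for SINCOS_TABLE_LOOKUP, which this sketch leaves out.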

[PING][PATCH 4/5] Use DO_SIN for sin(x) where 0.25 < |x| < 0.855469

Siddhesh Poyarekar-8
In reply to this post by Siddhesh Poyarekar-9
Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:

> The existing code looks slightly different from DO_SIN but, on closer
> examination, should give exactly the same result.  Drop it in favour
> of the DO_SIN function call.
>
> * sysdeps/ieee754/dbl-64/s_sin.c (__sin): Use DO_SIN.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 21 +++------------------
>  1 file changed, 3 insertions(+), 18 deletions(-)
>
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index e03c75a..82f9345 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -441,7 +441,7 @@ SECTION
>  #endif
>  __sin (double x)
>  {
> -  double xx, res, t, cor, y, s, c, sn, ssn, cs, ccs;
> +  double xx, res, t, cor;
>    mynumber u;
>    int4 k, m;
>    double retval = 0;
> @@ -471,23 +471,8 @@ __sin (double x)
>  /*---------------------------- 0.25<|x|< 0.855469---------------------- */
>    else if (k < 0x3feb6000)
>      {
> -      u.x = big + fabs (x);
> -      y = fabs (x) - (u.x - big);
> -      y = (x > 0 ? y : -y);
> -
> -      xx = y * y;
> -      s = y + y * xx * (sn3 + xx * sn5);
> -      c = xx * (cs2 + xx * (cs4 + xx * cs6));
> -      SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs);
> -      if (m <= 0)
> -        {
> -          sn = -sn;
> -  ssn = -ssn;
> - }
> -      cor = (ssn + s * ccs - sn * c) + cs * s;
> -      res = sn + cor;
> -      cor = (sn - res) + cor;
> -      retval = (res == res + 1.096 * cor) ? res : slow1 (x);
> +      res = do_sin (x, 0, &cor);
> +      retval = (res == res + 1.096 * cor) ? (m > 0 ? res : -res) : slow1 (x);
>      } /*   else  if (k < 0x3feb6000)    */
>  
>  /*----------------------- 0.855469  <|x|<2.426265  ----------------------*/
>
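
One way to see why the removed block and the do_sin call should agree: the old code negated y, sn and ssn for negative x, whereas the new code works on |x| and negates the result.  The sketch below is a simplified model, not the real kernel: sin_from_table is a hypothetical helper, it uses short Taylor polynomials instead of the sn3/sn5/cs2/cs4/cs6 coefficients, and it drops the ssn/ccs low-order table terms.  Under those assumptions, flipping the signs of the inputs flips the sign of every intermediate value, so the result is exactly negated.

```c
/* Approximate sin (u + y) from sn = sin (u), cs = cos (u) and a small y,
   using sin (u + y) = sn + (cs * sin (y) - sn * (1 - cos (y))).  */
static double
sin_from_table (double sn, double cs, double y)
{
  double yy = y * y;
  double s = y - y * yy / 6.0;          /* Approximates sin (y).  */
  double c = yy * (0.5 - yy / 24.0);    /* Approximates 1 - cos (y).  */
  double cor = cs * s - sn * c;
  return sn + cor;
}
```

Calling sin_from_table (-sn, cs, -y) therefore returns the negation of sin_from_table (sn, cs, y), which mirrors the (m > 0 ? res : -res) selection in the new code.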

[PING][PATCH 5/5] Inline all support functions for sin and cos

Siddhesh Poyarekar-8
In reply to this post by Siddhesh Poyarekar-9
Ping!

On Tuesday 23 August 2016 11:52 PM, Siddhesh Poyarekar wrote:

> The support functions for sin and cos have a lot of identical
> functionality, so inlining them gives a pretty decent jump in
> functionality: ~19% in the sincos function.  On SPEC2006 this
> translates to about 2.1% in the tonto test.
>
> * sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Mark as inline.
> (do_cos_slow): Likewise.
> (do_sin): Likewise.
> (do_sin_slow): Likewise.
> (slow): Likewise.
> (slow1): Likewise.
> (slow2): Likewise.
> (sloww): Likewise.
> (sloww1): Likewise.
> (sloww2): Likewise.
> (bsloww): Likewise.
> (bsloww1): Likewise.
> (bsloww2): Likewise.
> (cslow2): Likewise.
> ---
>  sysdeps/ieee754/dbl-64/s_sin.c | 52 +++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 24 deletions(-)
>
> diff --git a/sysdeps/ieee754/dbl-64/s_sin.c b/sysdeps/ieee754/dbl-64/s_sin.c
> index 82f9345..c20ef4d 100644
> --- a/sysdeps/ieee754/dbl-64/s_sin.c
> +++ b/sysdeps/ieee754/dbl-64/s_sin.c
> @@ -145,7 +145,8 @@ static double cslow2 (double x);
>     of the number by combining the sin and cos of X (as computed by a variation
>     of the Taylor series) with the values looked up from the sin/cos table to
>     get the result in RES and a correction value in COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_cos (double x, double dx, double *corp)
>  {
>    mynumber u;
> @@ -170,7 +171,8 @@ do_cos (double x, double dx, double *corp)
>  
>  /* A more precise variant of DO_COS.  EPS is the adjustment to the correction
>     COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_cos_slow (double x, double dx, double eps, double *corp)
>  {
>    mynumber u;
> @@ -205,7 +207,8 @@ do_cos_slow (double x, double dx, double eps, double *corp)
>     the number by combining the sin and cos of X (as computed by a variation of
>     the Taylor series) with the values looked up from the sin/cos table to get
>     the result in RES and a correction value in COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_sin (double x, double dx, double *corp)
>  {
>    mynumber u;
> @@ -229,7 +232,8 @@ do_sin (double x, double dx, double *corp)
>  
>  /* A more precise variant of DO_SIN.  EPS is the adjustment to the correction
>     COR.  */
> -static double
> +static inline double
> +__always_inline
>  do_sin_slow (double x, double dx, double eps, double *corp)
>  {
>    mynumber u;
> @@ -615,8 +619,8 @@ __cos (double x)
>  /* precision  and if still doesn't accurate enough by mpsin   or dubsin */
>  /************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow (double x)
>  {
>    double res, cor, w[2];
> @@ -636,8 +640,8 @@ slow (double x)
>  /* and if result still doesn't accurate enough by mpsin   or dubsin            */
>  /*******************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow1 (double x)
>  {
>    double w[2], cor, res;
> @@ -657,8 +661,8 @@ slow1 (double x)
>  /*  Routine compute sin(x) for   0.855469  <|x|<2.426265  by  __sincostab.tbl  */
>  /* and if result still doesn't accurate enough by mpsin   or dubsin       */
>  /**************************************************************************/
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  slow2 (double x)
>  {
>    double w[2], y, y1, y2, cor, res;
> @@ -686,8 +690,8 @@ slow2 (double x)
>  /* result.And if result not accurate enough routine calls mpsin1 or dubsin */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww (double x, double dx, double orig, int k)
>  {
>    double y, t, res, cor, w[2], a, da, xn;
> @@ -747,8 +751,8 @@ sloww (double x, double dx, double orig, int k)
>  /* accurate enough routine calls  mpsin1   or dubsin                       */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww1 (double x, double dx, double orig, int k)
>  {
>    double w[2], cor, res;
> @@ -777,8 +781,8 @@ sloww1 (double x, double dx, double orig, int k)
>  /* accurate enough routine calls  mpsin1   or dubsin                       */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  sloww2 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -808,8 +812,8 @@ sloww2 (double x, double dx, double orig, int n)
>  /* result.And if result not accurate enough routine calls other routines    */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww (double x, double dx, double orig, int n)
>  {
>    double res, cor, w[2], a, da;
> @@ -837,8 +841,8 @@ bsloww (double x, double dx, double orig, int n)
>  /* And if result not  accurate enough routine calls  other routines         */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww1 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -865,8 +869,8 @@ bsloww1 (double x, double dx, double orig, int n)
>  /* And if result not accurate enough routine calls  other routines          */
>  /***************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  bsloww2 (double x, double dx, double orig, int n)
>  {
>    double w[2], cor, res;
> @@ -891,8 +895,8 @@ bsloww2 (double x, double dx, double orig, int n)
>  /* precision  and if still doesn't accurate enough by mpcos   or docos  */
>  /************************************************************************/
>  
> -static double
> -SECTION
> +static inline double
> +__always_inline
>  cslow2 (double x)
>  {
>    double w[2], cor, res;
>
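
For readers unfamiliar with the annotation, the effect of the change can be sketched outside glibc as below.  The sketch spells out the GCC attribute directly; I am assuming that glibc's __always_inline macro expands to something equivalent.  The names poly_helper, caller_a and caller_b are hypothetical.

```c
/* A small helper shared by two callers, analogous in spirit to do_sin
   and do_cos.  The attribute asks GCC to inline it into every caller,
   so each caller can be optimized with its own constant arguments.  */
static inline __attribute__ ((always_inline)) double
poly_helper (double x, double scale)
{
  return scale * (x + x * x * x);
}

double
caller_a (double x)
{
  return poly_helper (x, 1.0);
}

double
caller_b (double x)
{
  return poly_helper (x, -1.0);
}
```

The trade-off is code size: every call site gets its own copy of the helper, which is presumably acceptable here given the measured gains.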

Re: [PATCH 5/5] Inline all support functions for sin and cos

Andreas Schwab
In reply to this post by Siddhesh Poyarekar-9
On Aug 23 2016, Siddhesh Poyarekar <[hidden email]> wrote:

> The support functions for sin and cos have a lot of identical
> functionality, so inlining them gives a pretty decent jump in
> functionality: ~19% in the sincos function.  On SPEC2006 this

What is the metric of functionality?

> translates to about 2.1% in the tonto test.

What does "tonto test" mean?

Andreas.

--
Andreas Schwab, SUSE Labs, [hidden email]
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH 5/5] Inline all support functions for sin and cos

Ramana Radhakrishnan-7
On Tue, Aug 30, 2016 at 8:52 AM, Andreas Schwab <[hidden email]> wrote:

> On Aug 23 2016, Siddhesh Poyarekar <[hidden email]> wrote:
>
>> The support functions for sin and cos have a lot of identical
>> functionality, so inlining them gives a pretty decent jump in
>> functionality: ~19% in the sincos function.  On SPEC2006 this
>
> What is the metric of functionality?
>
>> translates to about 2.1% in the tonto test.
>
> What does "tonto test" mean?

https://www.spec.org/cpu2006/Docs/465.tonto.html



>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, [hidden email]
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

Re: [PATCH 5/5] Inline all support functions for sin and cos

Siddhesh Poyarekar-9
In reply to this post by Andreas Schwab


On Tuesday 30 August 2016 01:22 PM, Andreas Schwab wrote:
>> The support functions for sin and cos have a lot of identical
>> functionality, so inlining them gives a pretty decent jump in
>> functionality: ~19% in the sincos function.  On SPEC2006 this
> What is the metric of functionality?

Sorry, that was a typo; it should read "a pretty decent jump in
performance", as measured by the sincos function microbenchmark in
benchtests.

>> translates to about 2.1% in the tonto test.
> What does "tonto test" mean?

The tonto test is part of the SPEC CPU2006 benchmark suite, and it spends
a little under half of its execution time in sincos and its child
functions.

Siddhesh

Re: [PATCH 1/5] Consolidate reduce_and_compute code

Siddhesh Poyarekar-9
In reply to this post by Joseph Myers


On Monday 29 August 2016 09:33 PM, Joseph Myers wrote:
> OK with a comment on this fallthrough (we might want to use the
> -Wimplicit-fallthrough being proposed for GCC 7...).
>

I pushed a separate commit with the fallthrough comment for both switch
blocks in s_sin.c.

Siddhesh
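
For readers who have not seen the pattern under discussion, here is a minimal sketch of a switch with a commented fallthrough. It is not the code from s_sin.c: the function name quadrant_sin and its parameters are hypothetical, and it assumes the caller already has s = sin(r) and c = cos(r) for a reduced argument r and a quadrant count n. A comment of this form is what warnings such as -Wimplicit-fallthrough are generally expected to recognize as marking an intentional fallthrough.

```c
#include <assert.h>

/* Hypothetical helper: returns sin (r + n * pi/2) given s = sin (r) and
   c = cos (r).  Illustration only, not the glibc implementation.  */
static double
quadrant_sin (int n, double s, double c)
{
  switch (n & 3)
    {
    case 2:
      /* sin (r + pi) = -sin (r): flip the sign, then share case 0.  */
      s = -s;
      /* Fall through.  */
    case 0:
      return s;

    case 3:
      /* sin (r + 3*pi/2) = -cos (r): flip the sign, then share case 1.  */
      c = -c;
      /* Fall through.  */
    default: /* case 1: sin (r + pi/2) = cos (r).  */
      return c;
    }
}
```

Without the comments, cases 2 and 3 would look like a missing break; with them, the shared tail is visibly deliberate.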

Re: [PATCH 3/5] Consolidate input partitioning into do_cos and do_sin

Joseph Myers
In reply to this post by Siddhesh Poyarekar-9
On Tue, 23 Aug 2016, Siddhesh Poyarekar wrote:

> All calls to do_cos are preceded by code that partitions x into a
> larger double that gives an offset into the sincos table and a smaller
> double that is used in a polynomial computation.  Consolidate all of
> them into do_cos and do_sin to reduce code duplication.
>
> * sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Accept X and DX as input
> arguments.  Consolidate input partitioning from callers here.
> (do_cos_slow): Likewise.
> (do_sin): Likewise.
> (do_sin_slow): Likewise.
> (do_sincos_1): Remove the no longer necessary input partitioning.
> (do_sincos_2): Likewise.
> (__sin): Likewise.
> (__cos): Likewise.
> (slow1): Likewise.
> (slow2): Likewise.
> (sloww1): Likewise.
> (sloww2): Likewise.
> (bsloww1): Likewise.
> (bsloww2): Likewise.
> (cslow2): Likewise.

OK.

--
Joseph S. Myers
[hidden email]
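
The input partitioning described in the quoted commit message can be illustrated with a small standalone sketch. It is not the code from s_sin.c: the names BIG, struct split and partition are hypothetical, the shifter constant 1.5 * 2^45 and the table step of 1/128 are assumptions made for the example, and round-to-nearest mode is assumed. The idea is that adding a large constant forces x to be rounded to a coarse grid; the grid point selects a table entry and the small remainder goes to the polynomial.

```c
#include <assert.h>
#include <math.h>

/* Assumed shifter constant: 1.5 * 2^45.  Doubles of this magnitude are
   spaced 2^-7 apart, so adding it rounds a small value to the nearest
   multiple of 1/128 (in the default round-to-nearest mode).  */
static const double BIG = 0x1.8p45;

/* Hypothetical result type, for illustration only.  */
struct split
{
  int index;    /* Offset into a table with one entry per 1/128 step.  */
  double hi;    /* The table point itself: index / 128.  */
  double lo;    /* Remainder for the polynomial, |lo| <= 1/256.  */
};

static struct split
partition (double x)
{
  /* The sketch is only meant for small, already-reduced arguments.  */
  assert (fabs (x) < 8.0);

  struct split s;
  /* volatile forces the sum to be rounded to double precision and keeps
     the compiler from simplifying (BIG + x) - BIG back to x.  */
  volatile double u = BIG + fabs (x);
  s.hi = u - BIG;               /* Larger part: a multiple of 1/128.  */
  s.lo = fabs (x) - s.hi;       /* Smaller part: what rounding removed.  */
  s.index = (int) (s.hi * 128.0);
  return s;
}
```

For example, partition (0.3) yields index 38, hi = 0.296875 and a remainder of roughly 0.003125. Doing this split in one place, rather than before every call, is the kind of consolidation the patch describes.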