crlibm elementary functions for the glibc ?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

crlibm elementary functions for the glibc ?

Florent de Dinechin
Hello to all,

We are the maintainers of the crlibm project, which aims at developping
a modern, correctly rounded version of the libm licenced under the LGPL.
Project homepage is https://lipforge.ens-lyon.fr/projects/crlibm/

I feel that this project is now achieving sufficient maturity to be a
candidate for gradual inclusion in the glibc. I would appreciate any
feedback on this question, and probably some practical advice.


Correct rounding means that the value returned by a libm function is
fully and uniquely specified, to the best mathematical possible. Its
main advantage is therefore to improve numerical portability between
heterogeneous systems. The drawbacks are a small degradation in
performance (theoretically a few percents in average) and a larger
degradation in code size. The pdf files distributed along with CRLibm
details this issue and gives more references.

As far as I observe, the libm currently used in glibc is a derivative
from IBM libultim, at least for  architectures with double-precision by
default, like powerPC and x86_64 using the sse2 FP unit.
This is a very good choice, as it was the first efficient implementation
of a double-precision correctly rounded library.

However, libultim has several drawbacks:
- impredictible worst-case performance with some calls in the million
cycle range (see below)
- obscure code,
- some large tables of precomputed values (up to 43 KB for arctan, 13KB
for exp, etc)

CRLibm has a sounder theoretical basis, which allows in particular to
provide much more acceptable worst-case execution time (several hundred
times lower, see below). We have an emphasis on the proof of the correct
rounding property and in general on documentation. We also limit the
table size to 4KB altogether for each function (whether this is a good
idea can be discussed - several years ago, it seemed so).  All this has
some performance cost, which we are slowly catching up. Appended are
some performance comparisons.

Besides, CRLibm is currently actively maintained, and in any case should
be more future-proof than libultim. For instance, unlike libultim, the
CRLibm distribution includes all the files that generate the tables and
other constants, so current tradeoffs can be changed as technology
evolves. It also includes a battery of selftests. Another example is
that our code is designed to use FMA efficiently (on power/powerPC and
ia64) as soon as there is clean and efficient compiler support for it.
The code also attempts to expose parallelism (which current gcc fails to
exploit efficiently, but this is bound to improve).

We also have more correctly rounded functions that libultim (currently
log2 and log10, expm1 and log1p, and the trigpi functions mentionned in
the current draft of the revised IEEE-754 standard), and we keep adding
new ones.

There are other more prospective features, like machine-checkable proofs
of some of the code, partial support of directed rounding modes and
interval arithmetic functions, etc, all of which is probably currently
irrelevant to the glibc.

If there is interest, I would like some advice on the procedure to
follow. I appreciate that we will have to submit patches. I have many
questions on this process: should we add a separate directory and where,
which functions should we submit first, should first improve average
performance at the expense of larger tables, what should we do with the
autotest and documentation, etc. I also appreciate that it will take
some work to clean up crlibm and bring it to GNU standards.

We would also appreciate pointers on relevant discussions on the list
archive.

Thank you for your interest, and many thanks in advance,

      Florent de Dinechin and Christoph Lauter



Here is a sample of performance results. The actual input values used
for the worst-case timing tests are available in the crlibm distribution.
The random generators used for average case are by definition
controversial, but they were chosen in good faith, not to distort the
comparison in favor of crlibm. For example our asin and acos
implementations are faster than libultim's on a random distribution of
the floating-point numbers between -1 and 1, but slower on a normal
distribution of the reals between -1 and 1 (used below).

Tests on an Opteron
( Linux 2.6.17-2-amd64
    gcc (GCC) 4.1.2 20060901 (prerelease) (Debian 4.1.1-13) )

      exp      avg time  max time
default libm    141    1105468
   CRLibm        184       1210

       log     avg time  max time
default libm    210     150919
   CRLibm        146        903      (using double-extended arithmetic)
   CRLibm        266       1459      (using standard SSE2 doubles)

       sin     avg time  max time
default libm    147    1018622
   CRLibm        171       3895

       cos     avg time  max time
default libm    152     028302
   CRLibm        171       3752

       tan     avg time  max time
default libm    244    1101230
   CRLibm        280       9949

      asin     avg time  max time
default libm    107    1018823
   CRLibm        297       2383

      acos     avg time  max time
default libm    83    1018786
   CRLibm        322      2360

      atan     avg time  max time
default libm    144     100066
   CRLibm        243       4724

     log10     avg time  max time
default libm    249       1061    NOT correctly rounded
   CRLibm        321       1706

      log2     avg time  max time
default libm    174        274    NOT correctly rounded
   CRLibm        320       1663



The same, on an IBM power5 server (time units are arbitrary)


       log     avg time  max time
default libm     61      23471
   CRLibm         57        307

      exp      avg time  max time
default libm     41      25019
   CRLibm         42        242

       sin     avg time  max time
default libm     37     132435
   CRLibm         43       1910

       cos     avg time  max time
default libm     38     134045
   CRLibm         44       1946

       tan     avg time  max time
default libm     65     141885
   CRLibm         72       4671

      asin     avg time  max time
default libm     33     132912
   CRLibm         50        465

      acos     avg time  max time
default libm     34     132798
   CRLibm         58        433

      atan     avg time  max time
default libm     35      18785
   CRLibm         53       2315

     log10     avg time  max time
default libm     65        153 NOT correctly rounded
   CRLibm         65        355

      log2     avg time  max time
default libm     47         59  NOT correctly rounded
   CRLibm         65        351


PS2 : while browsing the code I just noticed a probably harmless bug
line 27 of sysdeps/ieee754/dbl-64/sincos.tbl : 40 should be 440)
Reply | Threaded
Open this post in threaded view
|

Re: crlibm elementary functions for the glibc ?

Joseph Myers
On Tue, 6 Feb 2007, Florent de Dinechin wrote:

> We are the maintainers of the crlibm project, which aims at developping a
> modern, correctly rounded version of the libm licenced under the LGPL.
> Project homepage is https://lipforge.ens-lyon.fr/projects/crlibm/
>
> I feel that this project is now achieving sufficient maturity to be a
> candidate for gradual inclusion in the glibc. I would appreciate any feedback
> on this question, and probably some practical advice.

The following are just my views, I don't speak for the glibc maintainers.

First, I expect it will be necessary to follow the FSF copyright
assignment process.  While various parts of glibc at present, including
mathematical functions, have non-FSF copyrights, they are less likely to
be accepted for new code; see
<http://sourceware.org/ml/libc-alpha/2007-01/msg00026.html>.

Second, I'd like to suggest that you consider contributing the code to the
libgcc-math <http://gcc.gnu.org/wiki/libgcc-math> project *as well as* to
glibc.  This aims to collect implementations of mathematical functions for
use from GCC-compiled code beyond the scope of glibc; for example,
vectorized functions, normal functions compiled with more efficient ABIs
(SSE argument passing on x86) and C99 functions for use by the C++ and
Fortran runtime libraries on platforms whose non-glibc libm is missing
those functions.  Such functions are specifically beyond the scope of what
is considered appropriate for inclusion in glibc
<http://sourceware.org/ml/libc-alpha/2005-07/msg00008.html>, so if you are
interested in this then you should discuss it on [hidden email] and not
on libc-alpha.  By assigning to the FSF and using the LGPL+exception
license used for soft-fp from the start, the code can be used in both
places; you should aim for making it possible to have identical source
files in both places, as that's best in terms of FSF policies.

> should first improve average
> performance at the expense of larger tables,

If doing so makes the average case performance better than the existing
implementations, that would make a clearer case for including this code as
a replacement for the current code.

There are trade-offs involved between speed, size and accuracy of
mathematical functions.  glibc has to strike a middle ground for general
users.  Other projects such as libgcc-math may collect different versions
tuned for different trade-offs and allow the user to select them at
compile time or link time.

The glibc bug database <http://sources.redhat.com/bugzilla/> has 57 open
bugs in the "math" component, mostly suspended until someone comes forward
with patches for them.  Perhaps your new implementations fix some of them?  
(Many of those bugs relate to float and long double versions of the
functions; as I understand it, your methods could generate proven
correctly rounded functions for float and probably but not proven
correctly rounded versions for at least IEEE extended and quad long
double.)

--
Joseph S. Myers
[hidden email]