RFC Power PC G3 optimized sqrtf function.

RFC Power PC G3 optimized sqrtf function.

Conn Clark
Hi everybody,

This is my first post and my first attempt at contributing to glibc.

  I have written a sqrtf function that is much faster on a PowerPC G3 than
the one currently used. It uses the frsqrte instruction and Newton
Goldschmidt iterations to get a result. I have had testing done on G3 and G4
processors and the results conform to IEEE. Testing on G5s has shown it
gets within 2 bits of the correct answer. This shouldn't matter because the
G5 has a hardware square-root instruction. It may work on 603, 603e, 604,
and 604e processors as well, but I have not tested on them. It will not work
on a 601 processor.

  The limiting factor for IEEE conformance is that the frsqrte instruction
must produce a result that is within 1/59th of the correct value. A timing
test over all valid values takes about 26 minutes with the current glibc
function on a 400 MHz iMac G3. With my implementation it takes about 21
minutes.

Please read the header for more details and give me some feedback.

P.S. Do I need to file copyright assignment papers for this?

Thanks,

Conn


 ---------------------------------------
Conn Clark

Electronic Systems Technology
415 N. Quay Street Building B1     (509)-735-9092 ext 117
Kennewick, WA. 99336

Gentoo Linux RU13$!!!

Attachment: e_sqrtf.S (5K)
Re: RFC Power PC G3 optimized sqrtf function.

Steve Munroe

[hidden email] wrote on 12/14/2006 01:05:57 PM:

> Hi everybody,
>
> This is my 1st post and attempt at contributing to glibc
>
Thanks Conn. To get started, submissions to libc are normally in the form
of a patch with a changelog header. Please review
http://www.gnu.org/prep/standards/standards.html section 6.8.

>   I have written a sqrtf function that is much faster on a PowerPC G3 than
> the one currently used. It uses the frsqrte instruction and Newton
> Goldschmidt iterations to get a result. I have had testing done on G3 and
> G4 processors and the results conform to IEEE. Testing on G5s has shown
> it gets within 2 bits of the correct answer. This shouldn't matter
> because the G5 has a hardware square-root instruction. It may work on
> 603, 603e, 604, and 604e processors as well, but I have not tested on
> them. It will not work on a 601 processor.
>
Next, I assume you intend to add this to the powerpc-cpu add-on using
--with-cpu=g3 configuration?

In this case we need to place your e_sqrtf.S file in an appropriate directory
so that it does not impact PowerPCs that do have fsqrt. For example:

./powerpc-cpu/sysdeps/powerpc/powerpc32/g3/fpu/e_sqrtf.S

You will also need an Implies file in the sysdeps/unix/sysv/linux tree to
make sure your new directory is early enough in the search order to
override the e_sqrtf in libc trunk.

For example:

./powerpc-cpu/sysdeps/unix/sysv/linux/powerpc/powerpc32/g3/fpu/Implies

would contain:

powerpc/powerpc32/g3/fpu

If you want g4 to default to the g3 implementation, create
powerpc/powerpc32/g4/fpu directories with Implies files referencing the
powerpc/powerpc32/g3/fpu directories. Similarly for 603, 604, ...

See the powerpc-cpu README for more details.

Your patch should reflect this directory detail.

>   The limiting factor for IEEE conformance is that the frsqrte
> instruction must produce a result that is within 1/59th of the correct
> value. A timing test over all valid values takes about 26 minutes with
> the current glibc function on a 400 MHz iMac G3. With my implementation
> it takes about 21 minutes.
>

Not sure what you are getting at here. The PowerPC Arch 2.0x (V1.x also)
states that frsqrte is "correct to one part in 32". Does your algorithm
require better precision than the Arch provides? The Arch does say that
results may vary between implementations. So does G3/G4 frsqrte provide
better than 1/32 precision?

> Please read the header for more details and give me some feedback.
>
> P.S. Do I need to file copyright assignment papers for this?
>
Yes you do.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center

Re: RFC Power PC G3 optimized sqrtf function.

Conn Clark
Steve Munroe writes:

>
> [hidden email] wrote on 12/14/2006 01:05:57 PM:
>
>> Hi everybody,
>>
>> This is my 1st post and attempt at contributing to glibc
>>
> Thanks Conn. To get started, submissions to libc are normally in the form
> of a patch with a changelog header. Please review
> http://www.gnu.org/prep/standards/standards.html section 6.8.
>
>>   I have written a sqrtf function that is much faster on a PowerPC G3 than
>> the one currently used. It uses the frsqrte instruction and Newton

<SNIP>

>> will not work on a  601 processor.
>>
> Next, I assume you intend to add this to the powerpc-cpu add-on using
> --with-cpu=g3 configuration?
>

Correct

> In this case we need to place your e_sqrtf.S file in an appropriate
> directory so that it does not impact PowerPCs that do have fsqrt. For
> example:
>
> ./powerpc-cpu/sysdeps/powerpc/powerpc32/g3/fpu/e_sqrtf.S
>

Okay, I'll do that.

> You will also need an Implies file in the sysdeps/unix/sysv/linux tree to
> make sure your new directory is early enough in the search order to
> override the e_sqrtf in libc trunk.
>
> For example:
>
> ./powerpc-cpu/sysdeps/unix/sysv/linux/powerpc/powerpc32/g3/fpu/Implies
>
> would contain:
>
> powerpc/powerpc32/g3/fpu
>
> If you want g4 to default to the g3 implementation, create
> powerpc/powerpc32/g4/fpu directories with Implies files referencing the
> powerpc/powerpc32/g3/fpu directories. Similarly for 603, 604, ...
>
> See the powerpc-cpu README for more details.
>
> Your patch should reflect this directory detail.
>
>>   The limiting factor for IEEE conformance is that the frsqrte
>> instruction must produce a result that is within 1/59th of the correct
>> value. A timing test over all valid values takes about 26 minutes with
>> the current glibc function on a 400 MHz iMac G3. With my implementation
>> it takes about 21 minutes.
>>
>
> Not sure what you are getting at here. The PowerPC Arch 2.0x (V1.x also)
> states that frsqrte is "correct to one part in 32". Does your algorithm
> require better precision than the Arch provides?

Yes

> The Arch does say that
> results may vary between implementations. So does G3/G4 frsqrte provide
> better than 1/32 precision?

Yes they do. All implementations


>> Please read the header for more details and give me some feedback.
>>
>> P.S. Do I need to file copyright assignment papers for this?
>>

> Yes you do.

Is there a link on how to do this?

> Steven J. Munroe
> Linux on Power Toolchain Architect
> IBM Corporation, Linux Technology Center
>

Thank you,

Conn

Re: RFC Power PC G3 optimized sqrtf function.

Steve Munroe



"Conn Clark" <[hidden email]> wrote on 12/14/2006 04:05:26 PM:

> Steve Munroe writes:
>
> >
> > [hidden email] wrote on 12/14/2006 01:05:57 PM:
> >>
> >> P.S. Do I need to file copyright assignment papers for this?
> >>
>
> > Yes you do.
>
> Is there a link on how to do this?
>
Send a note to [hidden email] and ask for info on the current process.

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center