[PATCH v2] ARM: Improve armv7 memcpy performance.

Will Newton

Only enter the aligned copy loop with buffers that can be 8-byte
aligned. This improves performance slightly on Cortex-A9 and
Cortex-A15 cores for large copies with buffers that are 4-byte
aligned but not 8-byte aligned.

ports/ChangeLog.arm:

2013-08-30  Will Newton  <[hidden email]>

        * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
        on entry to aligned copy loop to improve performance.
---
 ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Changes in v2:
 - Improved description

diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
index 3decad6..6e84173 100644
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -369,8 +369,8 @@ ENTRY(memcpy)
  cfi_adjust_cfa_offset (FRAME_SIZE)
  cfi_rel_offset (tmp2, 0)
  cfi_remember_state
- and tmp2, src, #3
- and tmp1, dst, #3
+ and tmp2, src, #7
+ and tmp1, dst, #7
  cmp tmp1, tmp2
  bne .Lcpy_notaligned

--
1.8.1.4
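
For readers following the hunk above, the effect of changing the mask from #3 to #7 can be sketched in C. This is only an illustration of the and/and/cmp sequence, not code from glibc: the names same_low_bits and demo_offsets are hypothetical, and the addresses are made-up values chosen so that both buffers are 4-byte aligned while only one of them is 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc): models
       and tmp2, src, #mask
       and tmp1, dst, #mask
       cmp tmp1, tmp2
   and returns nonzero when the low bits selected by mask match,
   i.e. when the branch to .Lcpy_notaligned would NOT be taken. */
static int same_low_bits(uintptr_t dst, uintptr_t src, uintptr_t mask)
{
    return (dst & mask) == (src & mask);
}

/* Illustrative addresses only: src is 4 mod 8, dst is 0 mod 8. */
static void demo_offsets(void)
{
    uintptr_t src = 0x1004u;
    uintptr_t dst = 0x2000u;

    /* Old check (#3): both offsets are 0 mod 4, so the aligned
       loop is entered even though src and dst can never both be
       8-byte aligned at the same time. */
    assert(same_low_bits(dst, src, 3u));

    /* New check (#7): offsets 0 and 4 differ, so this pair is
       sent to the not-aligned path instead. */
    assert(!same_low_bits(dst, src, 7u));
}
```

Calling demo_offsets() exercises both masks on the same pair of addresses. Buffers whose low three bits match, for example 0x1004 and 0x200c, pass either check.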

Re: [PATCH v2] ARM: Improve armv7 memcpy performance.

Carlos O'Donell
On 08/30/2013 11:09 AM, Will Newton wrote:

>
> Only enter the aligned copy loop with buffers that can be 8-byte
> aligned. This improves performance slightly on Cortex-A9 and
> Cortex-A15 cores for large copies with buffers that are 4-byte
> aligned but not 8-byte aligned.
>
> ports/ChangeLog.arm:
>
> 2013-08-30  Will Newton  <[hidden email]>
>
> * sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
> on entry to aligned copy loop to improve performance.

How did you test this?

Did you use the glibc performance microbenchmark?

Does the microbenchmark show gains with this change? What are the numbers?

Cheers,
Carlos.

> ---
>  ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Changes in v2:
>  - Improved description
>
> diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> index 3decad6..6e84173 100644
> --- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> +++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> @@ -369,8 +369,8 @@ ENTRY(memcpy)
>   cfi_adjust_cfa_offset (FRAME_SIZE)
>   cfi_rel_offset (tmp2, 0)
>   cfi_remember_state
> - and tmp2, src, #3
> - and tmp1, dst, #3
> + and tmp2, src, #7
> + and tmp1, dst, #7
>   cmp tmp1, tmp2
>   bne .Lcpy_notaligned
>