[PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Christos Gentsos
The same pop instruction that is used to restore registers can be used
to return from the function (as it is already done in other function
implementations).
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..a0aad852 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,7 @@ __aeabi_memmove:
  subs r3, r3, #1
  bcs 1b
 2:
- pop {r4}
- pop {r1}
- bx r1
+ pop {r4, pc}
 3:
  movs r3, #0
  cmp r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..5bb80b20 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,7 @@ __aeabi_memset:
  cmp r4, r3
  bne 8b
 9:
- pop {r4, r5, r6}
- pop {r1}
- bx r1
+ pop {r4, r5, r6, pc}
 10:
  movs r3, r0
  movs r4, r1
--
2.23.0

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Richard Earnshaw (lists)
On 30/09/2019 15:55, Christos Gentsos wrote:

> The same pop instruction that is used to restore registers can be used
> to return from the function (as it is already done in other function
> implementations).
> ---
>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>   2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> index 61a72581..a0aad852 100644
> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> @@ -49,9 +49,7 @@ __aeabi_memmove:
>   subs r3, r3, #1
>   bcs 1b
>   2:
> - pop {r4}
> - pop {r1}
> - bx r1
> + pop {r4, pc}
>   3:
>   movs r3, #0
>   cmp r2, #0
> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> index aa8f2719..5bb80b20 100644
> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> @@ -110,9 +110,7 @@ __aeabi_memset:
>   cmp r4, r3
>   bne 8b
>   9:
> - pop {r4, r5, r6}
> - pop {r1}
> - bx r1
> + pop {r4, r5, r6, pc}
>   10:
>   movs r3, r0
>   movs r4, r1
>

No.  That isn't interworking clean on armv4t, which we still need to
support.

Sorry.

R.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Richard Earnshaw (lists)
On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:

> On 30/09/2019 15:55, Christos Gentsos wrote:
>> The same pop instruction that is used to restore registers can be used
>> to return from the function (as it is already done in other function
>> implementations).
>> ---
>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> index 61a72581..a0aad852 100644
>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>       subs    r3, r3, #1
>>       bcs    1b
>>   2:
>> -    pop    {r4}
>> -    pop    {r1}
>> -    bx    r1
>> +    pop    {r4, pc}
>>   3:
>>       movs    r3, #0
>>       cmp    r2, #0
>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> index aa8f2719..5bb80b20 100644
>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>       cmp    r4, r3
>>       bne    8b
>>   9:
>> -    pop    {r4, r5, r6}
>> -    pop    {r1}
>> -    bx    r1
>> +    pop    {r4, r5, r6, pc}
>>   10:
>>       movs    r3, r0
>>       movs    r4, r1
>>
>
> No.  That isn't interworking clean on armv4t, which we still need to
> support.
>
> Sorry.
>
> R.

However, a patch that tests __ARM_ARCH >=5 and uses your improved
sequence only in that case (preserving the old code otherwise) would
probably be OK :-)

R.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Christos Gentsos
On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:

> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
>> On 30/09/2019 15:55, Christos Gentsos wrote:
>>> The same pop instruction that is used to restore registers can be used
>>> to return from the function (as it is already done in other function
>>> implementations).
>>> ---
>>>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>>   2 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> index 61a72581..a0aad852 100644
>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>>       subs    r3, r3, #1
>>>       bcs    1b
>>>   2:
>>> -    pop    {r4}
>>> -    pop    {r1}
>>> -    bx    r1
>>> +    pop    {r4, pc}
>>>   3:
>>>       movs    r3, #0
>>>       cmp    r2, #0
>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> index aa8f2719..5bb80b20 100644
>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>>       cmp    r4, r3
>>>       bne    8b
>>>   9:
>>> -    pop    {r4, r5, r6}
>>> -    pop    {r1}
>>> -    bx    r1
>>> +    pop    {r4, r5, r6, pc}
>>>   10:
>>>       movs    r3, r0
>>>       movs    r4, r1
>>>
>>
>> No.  That isn't interworking clean on armv4t, which we still need to
>> support.
>>
>> Sorry.
>>
>> R.
>
> However, a patch that tests __ARM_ARCH >=5 and uses your improved
> sequence only in that case (preserving the old code otherwise) would
> probably be OK :-)
>
> R.

Oh sorry then, I wasn't aware of that, thanks for the correction. I
re-made the patch such that it now checks for __ARM_ARCH, as per your
suggestion. Does it look better?

Thanks,
Christos
---
 newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++
 newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
index 61a72581..465a5a19 100644
--- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
@@ -49,9 +49,13 @@ __aeabi_memmove:
  subs r3, r3, #1
  bcs 1b
 2:
+#if __ARM_ARCH >= 5
+ pop {r4, pc}
+#else
  pop {r4}
  pop {r1}
  bx r1
+#endif
 3:
  movs r3, #0
  cmp r2, #0
diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
index aa8f2719..52094a7b 100644
--- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
+++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
@@ -110,9 +110,13 @@ __aeabi_memset:
  cmp r4, r3
  bne 8b
 9:
+#if __ARM_ARCH >= 5
+ pop {r4, r5, r6, pc}
+#else
  pop {r4, r5, r6}
  pop {r1}
  bx r1
+#endif
 10:
  movs r3, r0
  movs r4, r1
--
2.23.0
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize epilogue in thumb __aeabi_{memmove,memset} implementations

Richard Earnshaw (lists)
Thanks.  I tweaked this slightly to include the compatibility header
acle-compat.h and put this in.

R.

On 30/09/2019 18:16, Christos Gentsos wrote:

> On Mon, Sep 30 2019 at 16:37:26 +0100, Richard Earnshaw (lists) wrote:
>> On 30/09/2019 16:31, Richard Earnshaw (lists) wrote:
>>> On 30/09/2019 15:55, Christos Gentsos wrote:
>>>> The same pop instruction that is used to restore registers can be used
>>>> to return from the function (as it is already done in other function
>>>> implementations).
>>>> ---
>>>>    newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 +---
>>>>    newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 +---
>>>>    2 files changed, 2 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> index 61a72581..a0aad852 100644
>>>> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
>>>> @@ -49,9 +49,7 @@ __aeabi_memmove:
>>>>        subs    r3, r3, #1
>>>>        bcs    1b
>>>>    2:
>>>> -    pop    {r4}
>>>> -    pop    {r1}
>>>> -    bx    r1
>>>> +    pop    {r4, pc}
>>>>    3:
>>>>        movs    r3, #0
>>>>        cmp    r2, #0
>>>> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> index aa8f2719..5bb80b20 100644
>>>> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
>>>> @@ -110,9 +110,7 @@ __aeabi_memset:
>>>>        cmp    r4, r3
>>>>        bne    8b
>>>>    9:
>>>> -    pop    {r4, r5, r6}
>>>> -    pop    {r1}
>>>> -    bx    r1
>>>> +    pop    {r4, r5, r6, pc}
>>>>    10:
>>>>        movs    r3, r0
>>>>        movs    r4, r1
>>>>
>>>
>>> No.  That isn't interworking clean on armv4t, which we still need to
>>> support.
>>>
>>> Sorry.
>>>
>>> R.
>>
>> However, a patch that tests __ARM_ARCH >=5 and uses your improved
>> sequence only in that case (preserving the old code otherwise) would
>> probably be OK :-)
>>
>> R.
>
> Oh sorry then, I wasn't aware of that, thanks for the correction. I
> re-made the patch such that it now checks for __ARM_ARCH, as per your
> suggestion. Does it look better?
>
> Thanks,
> Christos
> ---
>   newlib/libc/machine/arm/aeabi_memmove-thumb.S | 4 ++++
>   newlib/libc/machine/arm/aeabi_memset-thumb.S  | 4 ++++
>   2 files changed, 8 insertions(+)
>
> diff --git a/newlib/libc/machine/arm/aeabi_memmove-thumb.S b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> index 61a72581..465a5a19 100644
> --- a/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memmove-thumb.S
> @@ -49,9 +49,13 @@ __aeabi_memmove:
>   subs r3, r3, #1
>   bcs 1b
>   2:
> +#if __ARM_ARCH >= 5
> + pop {r4, pc}
> +#else
>   pop {r4}
>   pop {r1}
>   bx r1
> +#endif
>   3:
>   movs r3, #0
>   cmp r2, #0
> diff --git a/newlib/libc/machine/arm/aeabi_memset-thumb.S b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> index aa8f2719..52094a7b 100644
> --- a/newlib/libc/machine/arm/aeabi_memset-thumb.S
> +++ b/newlib/libc/machine/arm/aeabi_memset-thumb.S
> @@ -110,9 +110,13 @@ __aeabi_memset:
>   cmp r4, r3
>   bne 8b
>   9:
> +#if __ARM_ARCH >= 5
> + pop {r4, r5, r6, pc}
> +#else
>   pop {r4, r5, r6}
>   pop {r1}
>   bx r1
> +#endif
>   10:
>   movs r3, r0
>   movs r4, r1
>