[Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Dinar Temirbulatov
Hello,
Following patch redefines atomic_exchange_acq/atomic_exchange_rel to
__atomic_exchange_n for ARM, that allows for example to reduce number
of instruction sequence for lll_unlock
from:
ldex, cmp, bne, stex, cmp, bne
to
ldex, stex, cmp, bne
, more on the issue here:
http://sourceware.org/bugzilla/show_bug.cgi?id=15640

This patch was tested on ARM a9 with glibc testsuite with no new
regressions. OK to commit?

                       Thanks, Dinar.

arm_atomic.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Abhishek Deb
Dinar,

Apologies for delayed response, as I was on vacation. Your changes look in line with the MIPS changes. Since GCC 4.7. onwards _atomic builtins are provided, it's the right thing to do. I have not tested your changes, but I don't see anything obviously wrong.

Following is how the compiled code looks like, when compiled with gcc 4.8

Atomic_exchange_rel :

  c8:   f57ff05f        dmb     sy
  cc:   e1952f9f        ldrex   r2, [r5]
  d0:   e1851f93        strex   r1, r3, [r5]
  d4:   e3510000        cmp     r1, #0
  d8:   1afffffb        bne     cc

Atomic_exchange_acq :

  94:   e1953f9f        ldrex   r3, [r5]
  98:   e1852f97        strex   r2, r7, [r5]
  9c:   e3520000        cmp     r2, #0
  a0:   1afffffb        bne     94
  a4:   e3530000        cmp     r3, #0
  a8:   130f6c18        movwne  r6, #64536      ; 0xfc18
  ac:   f57ff05f        dmb     sy

So these changes look correct.

However, taking a deeper look into atomic_compare_and_exchange_acq/rel, following is how the compiled code looks like

  34:   f57ff05f        dmb     sy
  38:   e1903f9f        ldrex   r3, [r0]
  3c:   e3530000        cmp     r3, #0
  40:   1a000002        bne     50
  44:   e1801f92        strex   r1, r2, [r0]
  48:   e3510000        cmp     r1, #0
  4c:   1afffff9        bne     38
  50:   e3530000        cmp     r3, #0
  54:   f57ff05f        dmb     sy
  58:   1afffff5        bne     34

As you can see it contains two memory barrier one in the beginning and one in the end. Is it because gcc's _sync_* builtin was used? The _atomic_* builtins provides users to specify memodel for each of the builtin functions. Hence, shouldn't _atomic_ builtin with appropriate memmodel be used to do an atomic_compare_and_exchange_acq/rel?

In other words for a acquire semantic the first dmb is redundant and for a release semantic the second one is redundant. I don't know what do you think.

Abhishek


-----Original Message-----
From: Dinar Temirbulatov [mailto:[hidden email]]
Sent: Tuesday, August 13, 2013 10:11 AM
To: [hidden email]
Cc: [hidden email]; Abhishek Deb
Subject: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Hello,
Following patch redefines atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n for ARM, that allows for example to reduce number of instruction sequence for lll_unlock
from:
ldex, cmp, bne, stex, cmp, bne
to
ldex, stex, cmp, bne
, more on the issue here:
http://sourceware.org/bugzilla/show_bug.cgi?id=15640

This patch was tested on ARM a9 with glibc testsuite with no new regressions. OK to commit?

                       Thanks, Dinar.
Reply | Threaded
Open this post in threaded view
|

Re: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Dinar Temirbulatov
Hello Abhishek,

>However, taking a deeper look into atomic_compare_and_exchange_acq/rel, following is how the compiled code looks like
>
>  34:   f57ff05f        dmb     sy
>  38:   e1903f9f        ldrex   r3, [r0]
>  3c:   e3530000        cmp     r3, #0
>  40:   1a000002        bne     50
>  44:   e1801f92        strex   r1, r2, [r0]
>  48:   e3510000        cmp     r1, #0
>  4c:   1afffff9        bne     38
>  50:   e3530000        cmp     r3, #0
>  54:   f57ff05f        dmb     sy
>  58:   1afffff5        bne     34
oh, I see that lll_unlock looks correct on my side:
.LBB12:
        .loc 1 42 0
        dmb     sy
.L21:
        ldrex   r2, [r4]
        strex   r1, r3, [r4]
        cmp     r1, #0
        bne     .L21
.LVL10:

and for example __lll_cond_lock/__lll_timedlock also look right.

But, I see that for  lll_lock defined as :
#define __lll_lock(futex, private)                                            \
  ((void) ({                                                                  \
    int *__futex = (futex);                                                   \
    if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex,       \
                                                                1, 0), 0))    \
      {                                                                       \
        if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)       \
          __lll_lock_wait_private (__futex);                                  \
        else                                                                  \
          __lll_lock_wait (__futex, private);                                 \
      }                                                                       \
  }))

and we could see that on instruction level it looks like this:
        dmb     sy
.L91:
        ldrex   r0, [r4]
        cmp     r0, r3
        bne     .L92
        strex   r1, r7, [r4]
        cmp     r1, #0
        bne     .L91
.L92:
.LBE10:
        .loc 1 204 0
        cmp     r3, r0
.LBB11:
        .loc 1 202 0
        dmb     sy

, so nothing wrong on my side.
              thanks, Dinar.
Reply | Threaded
Open this post in threaded view
|

RE: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Abhishek Deb
Dinar,

As I said currently __arch_compare_and_exchange_val_32_acq is defined to __sync_val_compare_and_swap ((mem), (oldval), (newval)), which probably uses two dmb.
Can't __atomic_compare_exchange_n be used and appropriate memodel be specified for acquire and release variants?

-----Original Message-----
From: Dinar Temirbulatov [mailto:[hidden email]]
Sent: Friday, August 16, 2013 3:58 PM
To: Abhishek Deb
Cc: [hidden email]; [hidden email]
Subject: Re: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Hello Abhishek,

>However, taking a deeper look into atomic_compare_and_exchange_acq/rel, following is how the compiled code looks like
>
>  34:   f57ff05f        dmb     sy
>  38:   e1903f9f        ldrex   r3, [r0]
>  3c:   e3530000        cmp     r3, #0
>  40:   1a000002        bne     50
>  44:   e1801f92        strex   r1, r2, [r0]
>  48:   e3510000        cmp     r1, #0
>  4c:   1afffff9        bne     38
>  50:   e3530000        cmp     r3, #0
>  54:   f57ff05f        dmb     sy
>  58:   1afffff5        bne     34
oh, I see that lll_unlock looks correct on my side:
.LBB12:
        .loc 1 42 0
        dmb     sy
.L21:
        ldrex   r2, [r4]
        strex   r1, r3, [r4]
        cmp     r1, #0
        bne     .L21
.LVL10:

and for example __lll_cond_lock/__lll_timedlock also look right.

But, I see that for  lll_lock defined as :
#define __lll_lock(futex, private)                                            \
  ((void) ({                                                                  \
    int *__futex = (futex);                                                   \
    if (__builtin_expect (atomic_compare_and_exchange_val_acq (__futex,       \
                                                                1, 0), 0))    \
      {                                                                       \
        if (__builtin_constant_p (private) && (private) == LLL_PRIVATE)       \
          __lll_lock_wait_private (__futex);                                  \
        else                                                                  \
          __lll_lock_wait (__futex, private);                                 \
      }                                                                       \
  }))

and we could see that on instruction level it looks like this:
        dmb     sy
.L91:
        ldrex   r0, [r4]
        cmp     r0, r3
        bne     .L92
        strex   r1, r7, [r4]
        cmp     r1, #0
        bne     .L91
.L92:
.LBE10:
        .loc 1 204 0
        cmp     r3, r0
.LBB11:
        .loc 1 202 0
        dmb     sy

, so nothing wrong on my side.
              thanks, Dinar.
Reply | Threaded
Open this post in threaded view
|

Re: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Joseph Myers
In reply to this post by Dinar Temirbulatov
On Tue, 13 Aug 2013, Dinar Temirbulatov wrote:

> Hello,
> Following patch redefines atomic_exchange_acq/atomic_exchange_rel to
> __atomic_exchange_n for ARM, that allows for example to reduce number
> of instruction sequence for lll_unlock
> from:
> ldex, cmp, bne, stex, cmp, bne
> to
> ldex, stex, cmp, bne
> , more on the issue here:
> http://sourceware.org/bugzilla/show_bug.cgi?id=15640
>
> This patch was tested on ARM a9 with glibc testsuite with no new
> regressions. OK to commit?

What does this do when building for an ARM architecture version where the
kernel helpers need to be used for atomicity?

I think what's desired in such a case is for the operations to continue to
be expanded with inline calls to the kernel helpers.  Generating
out-of-line calls to libgcc helpers would be less desirable (and
out-of-line calls to anything not in libgcc.a would break the build).

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [Patch] ARM define atomic_exchange_acq/atomic_exchange_rel to __atomic_exchange_n

Dinar Temirbulatov
In reply to this post by Abhishek Deb
Abhishek,

>As I said currently __arch_compare_and_exchange_val_32_acq is defined to __sync_val_compare_and_swap ((mem), (oldval), (newval)), which probably uses two dmb.
>Can't __atomic_compare_exchange_n be used and appropriate memodel be specified for acquire and release variants?
Yes, Looks like that makes sense.
                   thanks, Dinar.