[Bug string/26091] New: strcpy cost more time in glibc-2.31

[Bug string/26091] New: strcpy cost more time in glibc-2.31

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26091

            Bug ID: 26091
           Summary: strcpy cost more time in glibc-2.31
           Product: glibc
           Version: 2.31
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: guojinhui at huawei dot com
  Target Milestone: ---

Created attachment 12602
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12602&action=edit
I reduced the bug to a stand-alone test case, now attached.

When I use strcpy to copy ten bytes of data, it takes 70 ns in glibc-2.31 but
only 53 ns in glibc-2.29. I found that this is related to the address of
strcpy: when strcpy is placed at a 32-byte-aligned address, it takes less time
than at an address that is only 16-byte aligned.

------------------------------------------------------------------------
testcase                 address           alignment          time(ns)
------------------------------------------------------------------------
strcpy_10_libmicro       0x95AF0           16                 70.48611
strcpy_10_libmicro       0x95C90           16                 69.54695
strcpy_10_libmicro       0x95C10           16                 69.0097
strcpy_10_libmicro       0x95AE0           32                 53.42931
strcpy_10_libmicro       0x95B00           32                 53.28875
strcpy_10_libmicro       0x95B20           32                 53.29308
strcpy_10_libmicro       0x95B40           32                 53.31686
strcpy_10_libmicro       0x95B60           32                 53.28691
------------------------------------------------------------------------

Thus, should strcpy be 32-byte aligned?

diff --git a/sysdeps/powerpc/powerpc32/strcpy.S b/sysdeps/powerpc/powerpc32/strcpy.S
index 0067e76..7a8badd 100644
--- a/sysdeps/powerpc/powerpc32/strcpy.S
+++ b/sysdeps/powerpc/powerpc32/strcpy.S
@@ -22,7 +22,7 @@
 
 /* char * [r3] strcpy (char *dest [r3], const char *src [r4])  */
 
-EALIGN (strcpy, 4, 0)
+EALIGN (strcpy, 5, 0)
 
 #define rTMP   r0
 #define rRTN   r3      /* incoming DEST arg preserved as result */
--
2.12.3

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug string/26091] strcpy cost more time in glibc-2.31

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26091

--- Comment #1 from JinhuiGuo <guojinhui at huawei dot com> ---
test case

I reduced the bug to a stand-alone test case, now attached.


[Bug string/26091] strcpy cost more time in glibc-2.31

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26091

--- Comment #2 from JinhuiGuo <guojinhui at huawei dot com> ---
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <string.h>

int s = 10;
int unaligned = 0;

void init_str(char *str)
{
        static char *demo =
                "The quick brown fox jumps over the lazy dog.";
        int l = strlen(demo);
        int i;
        for (i = 0; i < s; i++) {
                str[i] = demo[i % l];
        }

        str[s] = 0;
}

int main(void)
{
        int i;
        struct timespec tv;
        struct timespec tv1;

        char *src2 = (char *)malloc(s + 1);
        char *src = (char *)malloc(s + 1 + unaligned);
        init_str(src2);
        src2 += unaligned;

        clock_gettime(CLOCK_MONOTONIC, &tv);

        for (i = 0; i < 1100000; i += 10) {
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
                (void) strcpy(src, src2);
        }

        clock_gettime(CLOCK_MONOTONIC, &tv1);
        long long tmp = ((long long)tv1.tv_sec * 1000000000LL)
                        - ((long long)tv.tv_sec * 1000000000LL)
                        + ((long long)tv1.tv_nsec)
                        - ((long long)tv.tv_nsec);
        printf("cost: %f ns\n", ((double)tmp) / i);

        src2 -= unaligned;
        free(src);
        free(src2);

        return 0;
}


[Bug string/26091] strcpy cost more time in glibc-2.31

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26091

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot org

--- Comment #3 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to JinhuiGuo from comment #0)

> Created attachment 12602 [details]
> I reduced the bug to a stand-alone test case, now attached.
>
> When I use strcpy to copy ten bytes of data, it takes 70 ns in glibc-2.31
> but only 53 ns in glibc-2.29. I found that this is related to the address of
> strcpy: when strcpy is placed at a 32-byte-aligned address, it takes less
> time than at an address that is only 16-byte aligned.
>
> ------------------------------------------------------------------------
> testcase                 address           alignment          time(ns)
> ------------------------------------------------------------------------
> strcpy_10_libmicro       0x95AF0           16                 70.48611
> strcpy_10_libmicro       0x95C90           16                 69.54695
> strcpy_10_libmicro       0x95C10           16                 69.0097
> strcpy_10_libmicro       0x95AE0           32                 53.42931
> strcpy_10_libmicro       0x95B00           32                 53.28875
> strcpy_10_libmicro       0x95B20           32                 53.29308
> strcpy_10_libmicro       0x95B40           32                 53.31686
> strcpy_10_libmicro       0x95B60           32                 53.28691
> ------------------------------------------------------------------------

I am seeing the opposite on gcc203 (POWER8), where changing the alignment to 32
(EALIGN (..., 5, 0)) increases the cost from ~11.64 ns to ~12.41 ns per call.
This is using the provided benchmark.

In fact, this is really micro-architecture dependent: icache alignment might
or might not cause performance issues.  GCC also seems to use a different
alignment depending on the target processor (-mcpu=xxx), and the default for
powerX is '.p2align 4,,15'.

So before actually changing the default alignment, I would like to check that
this is not a pessimization on generic powerpc32, as it seems to be on POWER.

>
> Thus, should strcpy be 32-byte aligned?
>
> diff --git a/sysdeps/powerpc/powerpc32/strcpy.S b/sysdeps/powerpc/powerpc32/strcpy.S
> index 0067e76..7a8badd 100644
> --- a/sysdeps/powerpc/powerpc32/strcpy.S
> +++ b/sysdeps/powerpc/powerpc32/strcpy.S
> @@ -22,7 +22,7 @@
>
>  /* char * [r3] strcpy (char *dest [r3], const char *src [r4])  */
>
> -EALIGN (strcpy, 4, 0)
> +EALIGN (strcpy, 5, 0)
>
>  #define rTMP   r0
>  #define rRTN   r3      /* incoming DEST arg preserved as result */
> --
> 2.12.3
