[Bug nss/26038] New: [core_dump] call getpwnam_r() appears core_dump

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] New: [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

            Bug ID: 26038
           Summary: [core_dump] call getpwnam_r() appears core_dump
           Product: glibc
           Version: 2.31
            Status: UNCONFIRMED
          Severity: critical
          Priority: P2
         Component: nss
          Assignee: unassigned at sourceware dot org
          Reporter: liaichun at huawei dot com
  Target Milestone: ---

When I repeatedly call getpwnam_r (), there will be a small probability of
coredump. I analyze the cause of the problem,please see the description below.

nss/getXXbyYY_r.c

libc_hidden_proto (DB_LOOKUP_FCT)
int
INTERNAL (REENTRANT_NAME) (ADD_PARAMS, LOOKUP_TYPE *resbuf, char *buffer,
                           size_t buflen, LOOKUP_TYPE **result H_ERRNO_PARM
                           EXTRA_PARAMS)
{
   ......
   static bool startp_initialized;
   static lookup_function start_fct;
   union
  {
    lookup_function l;
    void *ptr;
  } fct;
  ......
  if (! startp_initialized)
    {
      no_more = DB_LOOKUP_FCT (&nip, REENTRANT_NAME_STRING,
                               REENTRANT2_NAME_STRING, &fct.ptr);
      if (no_more)     /* first init, if no_more, start_fct would always 0x0 */
        {
         /* do something */  
        }
      else
        {
          void *tmp_ptr = fct.l;
          start_fct = tmp_ptr;
          /* do something */
        }
      atomic_write_barrier ();
      startp_initialized = true;
   }
  else
    {
      fct.l = start_fct;   /* if first no_more, start_fct is 0x0, fct.l would
be always 0x0. */
      nip = startp;

      no_more = nip == (service_user *) -1l;
    }
  ......
  while (no_more == 0)
    {
#ifdef NEED_H_ERRNO
      any_service = true;
#endif
      status = DL_CALL_FCT (fct.l, (ADD_VARIABLES, resbuf, buffer, buflen,
                                    &errno H_ERRNO_VAR EXTRA_VARIABLES));
    /* fct.l is 0x0, and here call function fct.1 would core dump */
  ......
}

1、when first call DB_LOOKUP_FCT() return -1, startp_initialized would be true,
but start_fct stil is 0x0, and would never change.
2、Once start_fct is 0x0, fct.l would be always 0x0 n matter how many times
getpwnam_r() is called.
3、if fct.l is 0x0, the next call DL_CALL_FCT(fct.l,xxx) would cause core_dump.

==============================================================================
I think if first startp_initialized err, next should call DB_LOOKUP_FCT()
again, util successfully get, and only then is the initialization successful.

libc_hidden_proto (DB_LOOKUP_FCT)
int
INTERNAL (REENTRANT_NAME) (ADD_PARAMS, LOOKUP_TYPE *resbuf, char *buffer,
                           size_t buflen, LOOKUP_TYPE **result H_ERRNO_PARM
                           EXTRA_PARAMS)
{
   ......
   static bool startp_initialized;
   static lookup_function start_fct;
   union
  {
    lookup_function l;
    void *ptr;
  } fct;
  ......
  if (! startp_initialized)
    {
      no_more = DB_LOOKUP_FCT (&nip, REENTRANT_NAME_STRING,
                               REENTRANT2_NAME_STRING, &fct.ptr);
      if (no_more)    
        {
-         void *tmp_ptr = (service_user *) -1l;
-#ifdef PTR_MANGLE
-         PTR_MANGLE (tmp_ptr);
-#endif
-         startp = tmp_ptr;
+#ifdef NEED_H_ERRNO
+       *h_errnop = NETDB_INTERNAL;
+#endif
+       *result = NULL;
+       return errno;
        }
      else
        {
          void *tmp_ptr = fct.l;
          start_fct = tmp_ptr;
          /* do something */
        }
      atomic_write_barrier ();
      startp_initialized = true;
   }

  ......
}

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #1 from liaichun <liaichun at huawei dot com> ---
here is my debug code, i increased the judgment of fct, and triggered it
  else
    {
      fct.l = start_fct;  
      nip = startp;
if (fct.l == 0x0)            
   *(int *)0 = 0   
    }

#0  __getpwnam_r (name=0xffffbd62f940 <g_logOwner> "snasuser",
resbuf=0xffffbd0257a8, buffer=0xffffbd0257d8 "", buflen=96, result=0x0) at
../nss/getXXbyYY_r.c:306
306             *(int *)0 = 0;
[Current thread is 1 (Thread 0xffffbd0261d0 (LWP 83654))]
Missing separate debuginfos, use: dnf debuginfo-install
snas-cm-3.6.1.126270-1.aarch64
(gdb) bt
#0  __getpwnam_r (name=0xffffbd62f940 <g_logOwner> "snasuser",
resbuf=0xffffbd0257a8, buffer=0xffffbd0257d8 "", buflen=96, result=0x0) at
../nss/getXXbyYY_r.c:306
#1  0x0000ffffbd46c7d4 in __LOG_LogSetOwner (fd=7) at base_log.c:1778
#2  0x0000ffffbd46b988 in Base_Log_FileNeedExport (iFd=3, pcPath=0xffffbd025938
"/var/log/snas_CM.log", moduleId=MODULE_CM) at base_log.c:942
#3  0x0000ffffbd46ca1c in __LOG_LogSelfTask (pArgv=0xffffbd638198 <g_acSubSys>)
at base_log.c:1874
#4  0x0000ffffbe4538bc in start_thread (arg=0xffffc2be497f) at
pthread_create.c:486
#5  0x0000ffffbd1fda1c in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb)

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #2 from Andreas Schwab <[hidden email]> ---
If no_more is nonzero then the loop is never entered.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #3 from liaichun <liaichun at huawei dot com> ---
(In reply to Andreas Schwab from comment #2)
> If no_more is nonzero then the loop is never entered.

---int no_more not static int no_more,so when i call getpwnam_r () again
,no_more would be zero, and into the loop

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #4 from Andreas Schwab <[hidden email]> ---
no_more is set in both branches.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #5 from liaichun <liaichun at huawei dot com> ---
(In reply to Andreas Schwab from comment #4)
> no_more is set in both branches.

----do you mean here:
      no_more = nip == (service_user *) -1l;
      while (no_more == 0)
  when call again, nip is 0x0,  and no_more is also zero;

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #6 from Andreas Schwab <[hidden email]> ---
nip cannot be 0.  It is either a valid function address or -1.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #7 from liaichun <liaichun at huawei dot com> ---
(In reply to Andreas Schwab from comment #6)
> nip cannot be 0.  It is either a valid function address or -1.

Sorry,I haven't watched all before,
but it did happen fct.l is zero and nip is a valid function address .

(gdb) info local
startp_initialized = true
startp = 0x58b26143a952eb88
start_fct = 0x58b23416ed3e4548
nip = 0xffffac000b20
do_merge = 0
mergegrp = <optimized out>
mergebuf = 0x0
endptr = 0x0
fct = {l = 0x0, ptr = 0x0}
no_more = <optimized out>
err = <optimized out>
status = NSS_STATUS_UNAVAIL
nscd_status = <optimized out>
res = <optimized out>
(gdb) q

I guess this is caused by multi-threaded calls ?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #8 from Andreas Schwab <[hidden email]> ---
What do 0x58b26143a952eb88 and 0x58b23416ed3e4548 demangle to?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #9 from Andreas Schwab <[hidden email]> ---
nip should of course be a valid heap address or -1.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #10 from liaichun <liaichun at huawei dot com> ---
(In reply to Andreas Schwab from comment #9)
> nip should of course be a valid heap address or -1.

Sorry, i removed all my edits,and tried for a long time,it finally reappeared.

(gdb) p __pointer_chk_guard
$1 = 8256925845475092582
(gdb) p * __pointer_chk_guard
Cannot access memory at address 0x72967e4f3448e866
(gdb) p/x __pointer_chk_guard
$2 = 0x72967e4f3448e866
(gdb) bt
#0  0x72967e4f3448e866 in ?? ()
#1  0x0000ffff980683a4 in __getpwnam_r (name=0xffff984ca940 <g_logOwner>
"snasuser", resbuf=0xffff975be868, buffer=0xffff975be898 "", buflen=96,
result=0x0) at ../nss/getXXbyYY_r.c:315
#2  0x0000ffff983077d4 in __LOG_LogSetOwner (fd=7) at base_log.c:1778
#3  0x0000ffff98308cc0 in __LOG_Common_LogSelfTask (pArgv=0xffff991a1df8
<g_stLogUserAuditMgt>) at base_log.c:2588
#4  0x0000ffff992ee8bc in start_thread (arg=0xfffffcac7aff) at
pthread_create.c:486
#5  0x0000ffff980989dc in thread_start () at
../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) info local
startp_initialized = true
startp = 0x7296d4e5ca99a906
start_fct = 0x729681b0a29137c6
nip = 0x72967e4f3448e866
do_merge = 0
mergegrp = <optimized out>
mergebuf = 0xffff9841a658 "V100R001C01"
endptr = 0x0
fct = {l = 0x72967e4f3448e866, ptr = 0x72967e4f3448e866}
no_more = 0
err = <optimized out>
status = NSS_STATUS_UNAVAIL
nscd_status = <optimized out>
res = <optimized out>
(gdb) exit

nip is 0x72967e4f3448e866 after PTR_DEMANGLE, so i guess nip is zero before
PTR_DEMANGLE?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #11 from Andreas Schwab <[hidden email]> ---
Most likely you have some memory corruption.  Try valgrind.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #12 from liaichun <liaichun at huawei dot com> ---
(In reply to Andreas Schwab from comment #11)
> Most likely you have some memory corruption.  Try valgrind.

Thanks, I will check it.
And there is also a message ,i tried add atomic_read_barrier() before nip
assigned, it hasn't appeared in 2 weeks.If nothing changes, it can appear 4 or
5 times in 2 weeks.

--------------------------------------
libc_hidden_proto (DB_LOOKUP_FCT)
  else
    {
+     atomic_read_barrier();
      fct.l = start_fct;
      nip = startp;
#ifdef PTR_DEMANGLE
      PTR_DEMANGLE (fct.l);
      PTR_DEMANGLE (nip);
#endif
      no_more = nip == (service_user *) -1l;
    }

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

--- Comment #13 from liaichun <liaichun at huawei dot com> ---
(In reply to liaichun from comment #12)

> (In reply to Andreas Schwab from comment #11)
> > Most likely you have some memory corruption.  Try valgrind.
>
> Thanks, I will check it.
> And there is also a message ,i tried add atomic_read_barrier() before nip
> assigned, it hasn't appeared in 2 weeks.If nothing changes, it can appear 4
> or 5 times in 2 weeks.
>
> --------------------------------------
> libc_hidden_proto (DB_LOOKUP_FCT)
>   else
>     {
> +     atomic_read_barrier();
>       fct.l = start_fct;
>       nip = startp;
> #ifdef PTR_DEMANGLE
>       PTR_DEMANGLE (fct.l);
>       PTR_DEMANGLE (nip);
> #endif
>       no_more = nip == (service_user *) -1l;
>     }

================================================================
I Try valgrind, but nothing found.
And the same coredump occurs several times ...
If i add atomic_read_barrier() before nip assigned, and that never happened
again.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug nss/26038] [core_dump] call getpwnam_r() appears core_dump

Sourceware - glibc-bugs mailing list
In reply to this post by Sourceware - glibc-bugs mailing list
https://sourceware.org/bugzilla/show_bug.cgi?id=26038

liaichun <liaichun at huawei dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liaichun at huawei dot com

--- Comment #14 from liaichun <liaichun at huawei dot com> ---
Created attachment 12757
  --> https://sourceware.org/bugzilla/attachment.cgi?id=12757&action=edit
nss-make-sure-startp_initialized-do-first

--
You are receiving this mail because:
You are on the CC list for the bug.