[Bug ports/3775] New: kernel's zlib code upgrade triggers glibc>=2.4 misbehaviour

tim@mr-dog.net
During the 2.6.18 release cycle, these changes (1) were made to the mainline kernel:

31925c8857ba17c11129b766a980ff7c87780301 [PATCH] Fix ppc32 zImage inflate
b762450e84e20a179ee5993b065caaad99a65fbf [PATCH] zlib inflate: fix function definitions
0ecbf4b5fc38479ba29149455d56c11a23b131c0 move acknowledgment for Mark Adler to CREDITS
4f3865fb57a04db7cca068fed1c15badc064a302 [PATCH] zlib_inflate: Upgrade library code to a recent version

These changes trigger glibc>=2.4 misbehaviour: applications that use libutil (and
probably some other libraries) segfault when libutil is placed on a compressed
filesystem. Confirmed with both EABI and the old ABI, on several different ARM boards.

To reproduce, follow these steps:
1. Grab test.c from http://article.gmane.org/gmane.linux.ports.arm.kernel/28068
   (I'll also attach it)
2. arm-linux-gcc test.c -lutil -o test
On a target with glibc>=2.4 and kernel>=2.6.18:
3. mkcramfs /lib testfs
4. mount -t cramfs testfs /mnt/testfs -o loop
5. LD_LIBRARY_PATH=/mnt/testfs ./test
(a segfault or "illegal instruction" is expected; no core is dumped)

On the second run there is no segfault, which shows that once the library file is
cached, glibc operates on it correctly. As further proof, you can run "cat
/mnt/testfs/libutil.so.1 > /dev/null" before running the test, and it will then
not segfault.
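The "cat ... > /dev/null" workaround just reads the whole file once so that the kernel inflates it and keeps the pages cached. As a rough sketch (not part of the original report), the same effect can be had from C; the function name prefault_file is hypothetical, and the claim that this fully mirrors what cat does to the page cache is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read the whole file and discard the data, like
 * "cat FILE > /dev/null". On cramfs/JFFS2 this makes the kernel inflate
 * the file once, so its pages are typically left in the page cache.
 * Returns the number of bytes read, or -1 on error. */
static long long prefault_file(const char *path)
{
    char buf[65536];
    long long total = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return total;
}
```

For example, a small wrapper could call prefault_file("/mnt/testfs/libutil.so.1") before running ./test; by the report's account, the test should then not segfault.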

The bug is also reproducible on a JFFS2 filesystem with zlib compression. It can be
reproduced repeatedly without a reboot by using an umount/mount sequence, which
flushes the file cache.

The bug is not reproducible with glibc-2.3.x, nor when LD_BIND_NOW is set.
Reverting changes (1) in the kernel also eliminates the glibc misbehaviour.

It is also known that the first and second runs of md5sum on the libutil file
produce the same result, which puts the kernel's new zlib code almost above
suspicion: cramfs/jffs2 use only one inflation/deflation path inside the kernel.
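The md5sum check can be sketched in C as "hash the file twice and compare". This is only an illustration under stated assumptions: it swaps md5 for the much simpler 64-bit FNV-1a hash (enough to spot corrupted data, not cryptographic), the helper names are hypothetical, and the first pass only exercises the decompressor if the file is not already cached (e.g. right after a fresh mount). It also reads the file with stdio, whereas the dynamic loader maps libraries with mmap(), so it mirrors the md5sum check rather than the loader's access pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: 64-bit FNV-1a over the whole file.
 * Stores the hash in *out and returns 0, or returns -1 on I/O error. */
static int fnv1a_file(const char *path, uint64_t *out)
{
    unsigned char buf[65536];
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    size_t n;
    FILE *f;

    assert(path != NULL && out != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 1099511628211ULL;          /* FNV-1a 64-bit prime */
        }
    }
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *out = h;
    return 0;
}

/* Hash the file twice, like running md5sum twice.
 * Returns 1 if both passes agree, 0 if they differ, -1 on I/O error. */
static int two_pass_consistent(const char *path)
{
    uint64_t first, second;

    if (fnv1a_file(path, &first) != 0 || fnv1a_file(path, &second) != 0)
        return -1;
    return first == second;
}
```

A result of 0 from two_pass_consistent() on a freshly mounted filesystem would point at the inflate path; the report says the md5sum runs agreed.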

The new kernel zlib code is a bit faster, which could trigger a race or timing
issue. These commands support that theory:
(/tmp is tmpfs, fast)
LD_DEBUG=all LD_LIBRARY_PATH=/mnt/testfs ./test 2> /tmp/2   <- segfaults
LD_DEBUG=all LD_LIBRARY_PATH=/mnt/testfs ./test             <- does not segfault

I.e. (IMHO) the debug printf()s writing to a slow terminal "delay" the dynamic
loader's execution, and thus we don't see the segfault.

Of course, it could still be a kernel bug rather than a glibc one, but we are
stuck at the moment and are asking for help.

Thanks!

References:
http://article.gmane.org/gmane.linux.ports.arm.kernel/28068
http://lkml.org/lkml/2006/11/3/39
http://bugzilla.handhelds.org/show_bug.cgi?id=1773
http://bugs.openembedded.org/show_bug.cgi?id=1684

--
           Summary: kernel's zlib code upgrade triggers glibc>=2.4
                    misbehaviour
           Product: glibc
           Version: 2.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: ports
        AssignedTo: roland at gnu dot org
        ReportedBy: ya-cbou at yandex dot ru
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: arm-unknown-linux
GCC target triplet: arm-unknown-linux


http://sourceware.org/bugzilla/show_bug.cgi?id=3775

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
[Bug ports/3775] kernel's zlib code upgrade triggers glibc>=2.4 misbehaviour

------- Additional Comments From ya-cbou at yandex dot ru  2006-12-20 21:22 -------
Created an attachment (id=1467)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=1467&action=view)
testcase


------- Additional Comments From ya-cbou at yandex dot ru  2006-12-20 21:36 -------
Created an attachment (id=1468)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=1468&action=view)
LD_DEBUG=all output

This is a diff of the LD_DEBUG=all output; files "1" and "2" were produced by:
LD_DEBUG=all LD_LIBRARY_PATH=/mnt/testfs/ /mnt/testfs/ld-2.5.so ./test 2> /tmp/1
(first run, which segfaulted)
LD_DEBUG=all LD_LIBRARY_PATH=/mnt/testfs/ /mnt/testfs/ld-2.5.so ./test 2> /tmp/2
(second run, which did not segfault)

------- Additional Comments From ya-cbou at yandex dot ru  2006-12-23 01:12 -------
Trying to reassign the bug to Daniel Jacobowitz.

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|roland at gnu dot org       |drow at sources dot redhat
                   |                            |dot com
             Status|NEW                         |ASSIGNED


------- Additional Comments From drow at sources dot redhat dot com  2006-12-24 04:47 -------
I think it is entirely implausible that glibc could be at fault.  In general
there is very little randomness in application startup; if something works some
of the time and crashes other times, the kernel will be to blame.  If you have
more specific information about what's wrong, feel free to reopen when you have
a better idea of what is failing.  Dynamic linking on ARM does not even require
cache flushing in userspace.

A good idea might be to find the faulting instruction.
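One way to follow this suggestion is a SIGSEGV/SIGILL handler that reports where the fault happened. The sketch below is an assumption-laden illustration, not something from the thread: the helper names are hypothetical, it prints si_addr (for SIGILL this is the address of the faulting instruction, for SIGSEGV the faulting memory address), and on 32-bit ARM it also prints the saved program counter via uc_mcontext.arm_pc, a field name taken from glibc's ARM ucontext header that should be checked against your toolchain. Only async-signal-safe calls (write(), _exit()) are used inside the handler.

```c
#define _GNU_SOURCE             /* exposes the unprefixed ARM mcontext field names */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Write a string literal to stderr without stdio. */
#define PUT_LIT(s) put((s), sizeof(s) - 1)

static void put(const char *s, size_t len)
{
    ssize_t r = write(STDERR_FILENO, s, len);
    (void)r;                    /* nothing useful to do on failure here */
}

/* Format v as "0x..." hex into buf (at least 19 bytes); returns length. */
static size_t fmt_hex(char *buf, uintptr_t v)
{
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];  /* digits were collected low-first */
    return len;
}

static void fault_handler(int sig, siginfo_t *si, void *ctx)
{
    char buf[32];

    if (sig == SIGILL)
        PUT_LIT("SIGILL");
    else
        PUT_LIT("SIGSEGV");
    PUT_LIT(" at addr ");
    put(buf, fmt_hex(buf, (uintptr_t)si->si_addr));
#if defined(__arm__)
    {
        /* Assumption: on 32-bit ARM glibc, mcontext_t holds arm_pc. */
        const ucontext_t *uc = ctx;
        PUT_LIT(" pc ");
        put(buf, fmt_hex(buf, (uintptr_t)uc->uc_mcontext.arm_pc));
    }
#else
    (void)ctx;
#endif
    PUT_LIT("\n");
    _exit(128 + sig);
}

/* Install fault_handler for SIGSEGV and SIGILL; returns 0 on success. */
static int install_fault_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGILL, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling install_fault_handlers() at the top of main() in test.c would print the address, which can then be matched against /proc/<pid>/maps and resolved with addr2line. It only catches faults that occur after main() starts, so a crash during the loader's startup relocation would be missed.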

--
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID

