[Bug regex/23036] New: glibc-2.27: regex equivalence class regression

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=23036

            Bug ID: 23036
           Summary: glibc-2.27: regex equivalence class regression
           Product: glibc
           Version: 2.27
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: jim at meyering dot net
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Equivalence class regexps broke in glibc-2.27.
E.g., [[=a=]] fails to match á.

With Fedora 27's glibc-2.26, this program exits successfully:

#include <locale.h>
#include <regex.h>
int
main ()
{
  setlocale (LC_ALL, "en_US.UTF-8");
  regex_t r;
  if (regcomp (&r, "[[=a=]]", 0))
    return 9;
  return regexec (&r, "\303\241" /* á */ , 0, NULL, 0);
}

2.26$ gcc k.c && ./a.out; echo $?            
0
========================
But on Fedora 28 beta (with its glibc-2.27), it exits with status 1:

2.27$ gcc k.c && ./a.out; echo $?            
1
========================

Or demonstrate with grep:

2.26$ echo 'á' | LC_ALL=en_US.UTF-8 grep '[[=a=]]'
á

2.27$ echo 'á' | LC_ALL=en_US.UTF-8 grep '[[=a=]]'
2.27$ [Exit 1]
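
The reproducer above returns the raw regexec result, so an unavailable locale and a genuine non-match are hard to tell apart from the exit status alone. A minimal sketch that separates the three outcomes follows; the helper name match_in_locale and the MATCH_* codes are illustrative assumptions, not part of the original report:

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical status codes, chosen to stay clear of the REG_* values.  */
enum { MATCH_NO_LOCALE = -100, MATCH_BAD_PATTERN = -101 };

/* Hypothetical helper (not from the bug report).
   Returns MATCH_NO_LOCALE if the locale cannot be selected,
   MATCH_BAD_PATTERN if the pattern does not compile, and otherwise
   the regexec result (0 on a match, REG_NOMATCH when nothing matches).  */
int
match_in_locale (const char *locale, const char *pattern, const char *text)
{
  if (setlocale (LC_ALL, locale) == NULL)
    return MATCH_NO_LOCALE;
  regex_t r;
  if (regcomp (&r, pattern, REG_NOSUB) != 0)
    return MATCH_BAD_PATTERN;
  int rc = regexec (&r, text, 0, NULL, 0);
  regfree (&r);
  return rc;
}
```

Calling match_in_locale ("en_US.UTF-8", "[[=a=]]", "\303\241") would then be expected to give 0 on an unaffected system and REG_NOMATCH on an affected one, while MATCH_NO_LOCALE points at missing locale data rather than at the regex code.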

--
You are receiving this mail because:
You are on the CC list for the bug.

Andreas Schwab <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-04-07
     Ever confirmed|0                           |1

--- Comment #1 from Andreas Schwab <[hidden email]> ---
I cannot reproduce that; it works correctly in openSUSE Tumbleweed.


Dmitry V. Levin <ldv at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ldv at sourceware dot org

--- Comment #3 from Dmitry V. Levin <ldv at sourceware dot org> ---
I cannot reproduce this regression on the release/2.27/master branch; it must
be a downstream bug.


Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com,
                   |                            |maiku.fabian at gmail dot com
              Flags|                            |security-

--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
Could be a Fedora-specific bug due to the collation data backport.


--- Comment #4 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Dmitry V. Levin from comment #3)
> I cannot reproduce this regression on the release/2.27/master branch; it must
> be a downstream bug.

I think Florian is right, it has something to do with my collation
data update.


--- Comment #5 from jim at meyering dot net <jim at meyering dot net> ---
Any news? Since we're past the "final freeze" date, will this regression remain
in F28 final?


Andreas Schwab <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Andreas Schwab <[hidden email]> ---
If you want to file a bug against Fedora, this is the wrong place.


Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
            Version|2.27                        |2.28
         Resolution|INVALID                     |---
     Ever confirmed|1                           |0

--- Comment #7 from Florian Weimer <fweimer at redhat dot com> ---
Sorry, this also applies to the upstream master branch (which has the CLDR
change applied).


Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|2018-04-07 00:00:00         |2018-04-26
     Ever confirmed|0                           |1


Dmitry V. Levin <ldv at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|glibc-2.27: regex           |regex equivalence class
                   |equivalence class           |regression
                   |regression                  |


Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.redhat.com
                   |                            |/show_bug.cgi?id=1582224


Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugz
                   |                            |illa/show_bug.cgi?id=23308


Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at sourceware dot org   |fweimer at redhat dot com


--- Comment #8 from jim at meyering dot net <jim at meyering dot net> ---
FYI, this now works for me on Fedora 28:

$ echo 'á' | LC_ALL=en_US.UTF-8 grep '[[=a=]]'
á


--- Comment #14 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to [hidden email] from comment #13)

> $ LC_ALL=en_US.UTF-8 strace -eopenat ./a.out                                
>
> openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
> openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
> 0
> +++ exited with 0 +++

Looks fine to me.  But considering that I still get the no match error, I have
to figure out why this happens for me.  Hopefully this will also shed some
light on the upstream bug.


--- Comment #9 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to [hidden email] from comment #8)
> FYI, this now works for me on Fedora 28:
>
> $ echo 'á' | LC_ALL=en_US.UTF-8 grep '[[=a=]]'
> á

I still get no output for that with:

glibc-all-langpacks-2.27-19.fc28.x86_64
glibc-2.27-19.fc28.i686
grep-3.1-5.fc28.x86_64

And running the slightly clarified reproducer:

#include <err.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>

int
main ()
{
  if (setlocale (LC_ALL, "en_US.UTF-8") == NULL)
    err (1, "setlocale");
  regex_t r;
  if (regcomp (&r, "[[=a=]]", 0))
    errx (1, "regcomp");
  printf ("%d\n", regexec (&r, "\303\241" /* á */ , 0, NULL, 0));
}

results in “1” being printed.  The latter also happens when running under
upstream glibc, against the upstream locale data.
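
For reference, the "1" printed by this reproducer is the numeric value of REG_NOMATCH in glibc's <regex.h>. A small sketch, assuming a readable message is preferred to the raw code, that passes the regexec result through regerror; the helper name describe_regexec is hypothetical:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: write a readable description of a regexec
   result into buf.  A result of 0 means a match; any other value is
   translated into text by regerror.  */
void
describe_regexec (const regex_t *r, int rc, char *buf, size_t size)
{
  if (rc == 0)
    snprintf (buf, size, "match");
  else
    regerror (rc, r, buf, size);
}
```

In the reproducer, the printf line could call this with the value returned by regexec and print buf instead of the number.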


--- Comment #10 from jim at meyering dot net <jim at meyering dot net> ---
Odd. Your program prints "0" for me. Here's what I've just done:
---
$ cat > kk.c
#include <err.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>

int
main ()
{
  if (setlocale (LC_ALL, "en_US.UTF-8") == NULL)
    err (1, "setlocale");
  regex_t r;
  if (regcomp (&r, "[[=a=]]", 0))
    errx (1, "regcomp");
  printf ("%d\n", regexec (&r, "\303\241" /* á */ , 0, NULL, 0));
}
$ gcc kk.c
$ ./a.out
0
$ rpm -qa|grep langpa
glibc-langpack-fr-2.27-19.fc28.x86_64
glibc-langpack-tr-2.27-19.fc28.x86_64
glibc-langpack-en-2.27-19.fc28.x86_64
glibc-langpack-ja-2.27-19.fc28.x86_64
yum-langpacks-0.4.5-8.fc28.noarch
libreoffice-langpack-en-6.0.5.2-1.fc28.x86_64
glibc-langpack-zh-2.27-19.fc28.x86_64
evolution-data-server-langpacks-3.28.3-1.fc28.noarch
$ rpm -q glibc
glibc-2.27-19.fc28.x86_64


--- Comment #11 from jim at meyering dot net <jim at meyering dot net> ---
Note that I noticed grep test failures a day or two ago due to lack of locale
support, so manually installed the few langpacks that were required to make
grep's tests pass.


--- Comment #12 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to [hidden email] from comment #10)
> Odd. Your program prints "0" for me.

Could you run it under strace, to see which locales are loaded?
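
Alongside strace, the program itself can report which locale it ended up with. A rough sketch, assuming the standard setlocale query form (a NULL second argument) and nl_langinfo; the function name report_locale is illustrative and does not appear in the thread:

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: select the requested locale and print the
   LC_COLLATE locale name and the character set now in effect.
   Returns 0 on success, -1 if the locale could not be selected.  */
int
report_locale (const char *requested, FILE *out)
{
  if (setlocale (LC_ALL, requested) == NULL)
    {
      fprintf (out, "setlocale (\"%s\") failed\n", requested);
      return -1;
    }
  fprintf (out, "LC_COLLATE=%s codeset=%s\n",
           setlocale (LC_COLLATE, NULL), nl_langinfo (CODESET));
  return 0;
}
```

This only names the locale in use; it does not show which files were opened, so it complements the strace run rather than replacing it.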


--- Comment #13 from jim at meyering dot net <jim at meyering dot net> ---
$ LC_ALL=en_US.UTF-8 strace -eopenat ./a.out                                    
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/en_US.UTF-8/LC_TIME", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/locale/en_US.utf8/LC_TIME", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = 3
0
+++ exited with 0 +++
