[Bug locale/19519] New: iconv(1) hangs on illegal sequences with "christmas" flags

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] New: iconv(1) hangs on illegal sequences with "christmas" flags

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

            Bug ID: 19519
           Summary: iconv(1) hangs on illegal sequences with "christmas"
                    flags
           Product: glibc
           Version: 2.22
            Status: NEW
          Severity: normal
          Priority: P2
         Component: locale
          Assignee: unassigned at sourceware dot org
          Reporter: jengelh at inai dot de
  Target Milestone: ---

In glibc 2.19 and glibc 2.22, I observe that combining //translit with //ignore
with -c with an unconvertible sequence will hang the iconv program. The order
of //translit and //ignore is significant. The target charset is of no
significance; it could also be -t utf-8//translit//ignore.

 echo -en '\x80' | iconv -f us-ascii -t us-ascii//translit//ignore -c

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) hangs on illegal sequences with "christmas" flags

glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
Thanks for reporting this.

I tested a glibc based on 2.12 upstream, without any of the gconv fixes from
the last years, and the problem is still there.  This does not look like a
regression.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) hangs on illegal sequences with "christmas" flags

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security+

--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
Another reproducer:

echo -en "\x0e\x0e" | /usr/bin/iconv -c -f IBM1364

We received an independent report of this issue, and I think we should treat
this as a minor security bug.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|iconv(1) hangs on illegal   |iconv(1) with -c option
                   |sequences with "christmas"  |hangs on illegal multi-byte
                   |flags                       |sequences

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|iconv(1) with -c option     |iconv(1) with -c option
                   |hangs on illegal multi-byte |hangs on illegal multi-byte
                   |sequences                   |sequences (CVE-2016-10228)
              Alias|                            |CVE-2016-10228

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.gentoo.org/sho
                   |                            |w_bug.cgi?id=611344

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Mike Frysinger <vapier at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.gentoo.org/sho
                   |                            |w_bug.cgi?id=611344

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Robert Schiele <rschiele at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rschiele at gmail dot com

--- Comment #3 from Robert Schiele <rschiele at gmail dot com> ---
Created attachment 9875
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9875&action=edit
attempt to fix CVE-2016-10228

This is my attempt to fix that bug by advancing the buffer whenever an illegal
character is detected by one byte. While this does fix the specific examples in
this bug report I am not sure whether it is a good idea since I am not
particularly deep in that code.

Concerns that come to my mind and should be commented upon by someone that is
more familiar with that code:
1. Is the place I fixed here the only place that needs fixing or did I probably
miss other places.
2. Is advancing by one byte generally a good idea? Some character sets operate
on multiple byte characters. Do we need to consider this here and probably
advance with the character size instead? Is there information available in some
data structure about how many bytes we would need to advance?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Robert Schiele <rschiele at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #9875|0                           |1
        is obsolete|                            |

--- Comment #4 from Robert Schiele <rschiele at gmail dot com> ---
Created attachment 9876
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9876&action=edit
Add missing brackets.

Sorry, the indention style tricked me into missing the brackets before.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #5 from Jan Engelhardt <jengelh at inai dot de> ---
>2. Is advancing by one byte generally a good idea?

In case where each codepoint is known to be encoded as the same number of
bytes, e.g. 4 octets in UTF-32, I would expect it to advance by that stride.

echo -en '\xff\xff\xff\xff' | iconv -f utf-32 -t utf-32//translit//ignore -c

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #6 from Robert Schiele <rschiele at gmail dot com> ---
(In reply to Jan Engelhardt from comment #5)
> >2. Is advancing by one byte generally a good idea?
>
> In case where each codepoint is known to be encoded as the same number of
> bytes, e.g. 4 octets in UTF-32, I would expect it to advance by that stride.
>
> echo -en '\xff\xff\xff\xff' | iconv -f utf-32 -t utf-32//translit//ignore -c

Exactly such a case I had in mind with my concern. Thus next task is to find a
way to determine within that context the size of each code point in bytes. This
might probably be a bit difficult given that this knowledge seems to be totally
encapsulated within the iconv function call.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #7 from Robert Schiele <rschiele at gmail dot com> ---
Ok, I experimented quite a bit to figure out the minimal number of bytes used
to encode a code point but it seems I didn't really fully understand the
underlying data structure and so far didn't find a working solution for all
cases I tested.

Therefore for my own purposes I left the solution advancing one byte, leaving
the risk that for some encodings the output can contain even more garbage than
what was found in the input, following the concept "garbage in, garbage out".
At least it still avoids the tool to hang in those cases.

I leave it to the experts of this code area to improve this fix or provide me a
hint, where in the data structures I can find the information about what is the
smallest number of bytes for one code point in the input data encoding.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Shane Seymour <shane.seymour at hpe dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |shane.seymour at hpe dot com

--- Comment #8 from Shane Seymour <shane.seymour at hpe dot com> ---
I thought I'd chime in since I believe the independent report mentioned by
Florian was from me.

Although it's not a generic fix for the underlying issue for a class of
character sets that use shift in/shift out sequences (e.g. IBM930 and IBM933)
some of them work as expected and some don't (I can't remember which one did
ignore a duplicate shift in or out). For IBM930 the BODY macro looks in part
like:

#define BODY \
  {                                                                           \
    uint32_t ch = *inptr;                                                     \
    uint32_t res;                                                             \
                                                                              \
    if (__builtin_expect (ch, 0) == SO)                                       \
      {                                                                       \
        /* Shift OUT, change to DBCS converter.  */                           \
        if (curcs == db)                                                      \
          {                                                                   \
            result = __GCONV_ILLEGAL_INPUT;                                   \
            break;                                                            \
          }                                                                   \
        curcs = db;                                                           \
        ++inptr;                                                              \
        continue;                                                             \
      }                                                                       \

if this line:

        if (curcs == db)                                                      \

was changed to:

        if (curcs == db && (! ignore_errors_p ()))                            \

Then a duplicate shift out sequence in IBM930 would not cause the iconv command
to loop forever with the -c option (the shift in code needs the same change).

But in general the issue could be resolved up by the code that does the
conversions not returning an error when being told ignore errors. In the above
example while it appears to be illegal to have duplicate shift in/out sequences
when ignoring errors the duplicate should probably just be consumed and
ignored.

Should there be a general rule that the character set conversion macros must do
something with their input when faced with something invalid and they've been
told to ignore errors? Then iconv doesn't need to know anything when set to
ignore errors and it shouldn't loop.

For non-variable size multi-byte character sets the converter at least it knows
based on the character set the number of bytes it should consume so it's easier
for it to skip past any error (e.g. IBM930 knows if it should skip one or two
bytes based on the current shift state).

I thought of these general rules when errors are ignored:

1. Character set converters using a fixed or know number of bytes will skip
forward one character when encountering an invalid character (includes
character sets that are always the one fixed size and character sets using
shift states between different size fixed characters)
2. Invalid structural data will be consumed (e.g. duplicate shift in/out states
in character sets like IBM930)
3. Variable size character sets will consume everything in from the start of a
character up to and including the byte that made the character invalid (e.g. in
SJIS if the sequence of bytes 0x81 0x20 was seen both bytes would be consumed
not just the 0x81 but that could instead easily be advance one byte and start
again).

Those rules are entirely open to discussion though - I've probably missed some
conditions that need to be covered. The downside would be that there are a lot
of macros to review to make sure that they would all follow a defined set of
rules and may it change their behaviour.

Anything following a conversion failure should really be considered undefined
anyway (as someone else said garbage in garbage out) so how bad data gets
treated can follow set rules as long as they are followed consistently by
everything.

Does it make more sense that the character set converters should decide what to
drop, skip or quitely consume when being told to ignore errors vs doing it
somewhere that doesn't really know much about any particular character set?

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #9 from Shane Seymour <shane.seymour at hpe dot com> ---
Sorry I should add that the IBM930 example was just that (an example). That's
an illustration of a downstream issue not upstream since that code has been
changed for that conversion to ignore duplicate shift in shift out characters.
The same general class of issues exists in other things though. Should the
conversion routines be allowed to return errors when told to ignore them (or if
they do they can't return with the input pointer unchanged so forward progress
can always be guaranteed).

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugz
                   |                            |illa/show_bug.cgi?id=13541

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #10 from Florian Weimer <fweimer at redhat dot com> ---
Perhaps we should do the following:

(1) -c should add //IGNORE to the conversions if not already there.

(2) The conversion functions should be adjusted to make progress in //IGNORE.
See bug 13541.

(3) If the iconv function does not make progress, skip over individual input
bytes.  With (2), this is only used as a last resort, to avoid a hang.


We can probably implement (1) and (3) for this bug.  (2) is mostly a
charset-specific quality issue after that.  Even then, the scope of this bug is
a bit larger than what I expected.

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at sourceware dot org   |arjun.is at lostca dot se

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arjun.is at lostca dot se

--
You are receiving this mail because:
You are on the CC list for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug locale/19519] iconv(1) with -c option hangs on illegal multi-byte sequences (CVE-2016-10228)

glaubitz at physik dot fu-berlin.de
In reply to this post by glaubitz at physik dot fu-berlin.de
https://sourceware.org/bugzilla/show_bug.cgi?id=19519

--- Comment #11 from Arjun Shankar <arjun.is at lostca dot se> ---
> echo -en '\x80' | iconv -f us-ascii -t us-ascii//translit//ignore -c
> echo -en "\x0e\x0e" | /usr/bin/iconv -c -f IBM1364

I have not investigated the second reproducer, but I am quite sure that it is
completely unrelated to the first one.

The first one is caused by incorrect parsing of the "//" marked suffixes
(//translit and //ignore). The problem is that the second suffix is being
dropped when parsing. This is why the order matters. The case reported, i.e.:

$ echo -en '\x80' | iconv -f us-ascii -t us-ascii//translit//ignore -c

is one where the "//translit" suffix is actually irrelevant because the input
does not have characters that can/should be transliterated, but its presence as
the first suffix leads to the actually relevant "//ignore" suffix (0x80 is not
valid ASCII) being *ignored* by iconv, leading to a hang.

This is why, these four similar cases:

$ echo -en '\x80' | iconv -f us-ascii -t us-ascii//ignore//translit -c
$ echo -en '\x80' | iconv -f us-ascii -t us-ascii//ignore -c
$ echo -en '\x80' | iconv -f us-ascii -t us-ascii -c
$ echo -en '\x80' | iconv -f us-ascii -t us-ascii//translit -c

do not hang. The '//ignore' suffix being first (or implied via -c) gets picked
up by the program and it correctly outputs nothing, and  the "//translit", if
present, is dropped (incorrectly) - but this has no effect for this particular
input on the program's running because like I mentioned earlier: there is
nothing to be transliterated in the input. Note that '-c' actually appends the
"//ignore" to the target encoding suffix if no suffixes are present, and
",ignore" if one or more suffixes are already present. If two suffixes are
passed during input, then the ',ignore' appended because of '-c' will get
dropped along with the second suffix, which may or may not lead to hang
depending on whether the first suffix is //ignore.

The hang in case of a missing "//ignore" suffix (well, not really missing here
but ignored by the program) afterwards during execution is a separate problem
(that I haven't investigated).

I'm going to write a patch to fix the problem with the parsing at the right
place(s), and then tackle the hangs separately.

--
You are receiving this mail because:
You are on the CC list for the bug.
12