[Patch] multibyte encodings in strings

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[Patch] multibyte encodings in strings

m4tze
Dear binutils team,

I am sending you a patch fixing an issue in the binary 'strings'.
The issue concerned finding multibyte encoded strings at odd offsets.

To test this issue try the following line

    echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | strings -el

Only the second string will be found.

The attached patch fixes this problem by doing only a single byte step if an
invalid character has been found.

Do not hesitate to contact me in case of questions or feedback.

Happy to contribute,

Mathias

strings.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Patch] multibyte encodings in strings

Nick Clifton
Hi Mathias,

> I am sending you a patch fixing an issue in the binary 'strings'.
> The issue concerned finding multibyte encoded strings at odd offsets.

Thanks for the bug report and the patch.

I have decided to consider the patch as "obvious", in the sense that
it does not need an FSF copyright assignment, although I would still
encourage you to apply for such an assignment should you want to
contribute further patches.

I have applied the patch along with a couple of extra updates.  
Specifically I have created a ChangeLog entry for the patch, and a
new test in the binutils testsuite in order to make sure that the
patch continues to work.

There is one issue however that I would like to check with you.
The original (unpatched) decoding of the test binary that you
supplied produces the output: "String2" whereas the patched
version of string produces: "String1" and "tring2".  Is this
correct ?  I was kind of expecting the output to be "String1"
and "String2".

Cheers
  Nick

binutils/ChangeLog
2018-11-09  Mathias   <[hidden email]>

        * strings.c (print_strings): Check for multibyte encodings.
        * binutils-all/strings-1.bin: New file.  Test binary for string decoding.
        * testsuite/binutils-all/strings.exp: New file.  Test the strings program.
        * testsuite/config/default.exp (STRINGS): Define if not provided
        by the environment.
        (STRINGSFLAGS): Likewise.

Reply | Threaded
Open this post in threaded view
|

Re: [Patch] multibyte encodings in strings

m4tze
Hi Nick,

> I have applied the patch along with a couple of extra updates.  
> Specifically I have created a ChangeLog entry for the patch, and a
> new test in the binutils testsuite in order to make sure that the
> patch continues to work.

Thanks.
 
> There is one issue however that I would like to check with you.
> The original (unpatched) decoding of the test binary that you
> supplied produces the output: "String2" whereas the patched
> version of string produces: "String1" and "tring2".  Is this
> correct ?  I was kind of expecting the output to be "String1"
> and "String2".

Yes the output should be "String1" and "String2". As I am getting the expected
output on all my systems it is difficult for me to guess what is going wrong.
It looks as if your system accepts "aa53" as a printable utf-16-le char.
Could you please run

    echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd

to check if an additional character gets printed after "String1" and run

    echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el
    echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el

to check if these yield the expected results?

Kind regards
Mathias
Reply | Threaded
Open this post in threaded view
|

Re: [Patch] multibyte encodings in strings

Nick Clifton
Hi Mathias,

>> correct ?  I was kind of expecting the output to be "String1"
>> and "String2".
>
> Yes the output should be "String1" and "String2". As I am getting the expected
> output on all my systems it is difficult for me to guess what is going wrong.

Ah, so if "strings -el" is working for you, then at least I can be sure that
I applied the patch correctly.

> It looks as if your system accepts "aa53" as a printable utf-16-le char.
> Could you please run
>
>     echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd
>
> to check if an additional character gets printed after "String1"

If by "additional" you mean a period character then yes:

  00000000: 5374 7269 6e67 310a 5374 7269 6e67 320a  String1.String2.

> and run
>
>     echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el

Gives:

String1
String2

>     echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el

Gives the same:

String1
String2
 
> to check if these yield the expected results?

Which I assume is not right.  Presumably this is due to my environment.  In particular
I have LC_CTYPE=C which I think affects character display, right ?

Cheers
  Nick



Reply | Threaded
Open this post in threaded view
|

Re: [Patch] multibyte encodings in strings

m4tze
Hi Nick,

> > Could you please run
> >
> >     echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd
> >
> > to check if an additional character gets printed after "String1"
>
> If by "additional" you mean a period character then yes:
>
>   00000000: 5374 7269 6e67 310a 5374 7269 6e67 320a  String1.String2.

As the period is just a placeholder for the newline (0x0a) the output would be

String1
String2

Which is correct.
 

> >     echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el
>
> Gives:
>
> String1
> String2
>
> >     echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el
>
> Gives the same:
>
> String1
> String2

All commands should give

String1
String2

So to me it looks like everything works just fine. If you can reproduce the
conditions where the output is "String1" and "tring2" please let me know.
Otherwise it should be ok to change the testcase to expect "String1" and
"String2"

Thanks a lot
Mathias