[Bug stdio/24466] New: Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

            Bug ID: 24466
           Summary: Feature request: provide special printf formats for
                    intXX_t
           Product: glibc
           Version: unspecified
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: stdio
          Assignee: unassigned at sourceware dot org
          Reporter: nfxjfg at googlemail dot com
  Target Milestone: ---

Currently, the standard way (and only way if you want to be portable) to format
a value of e.g. int32_t type is int32_t v; printf("%" PRIi32, v);

I think this is very awkward and lowers readability. A dedicated set of printf
format specifiers would be more appropriate (technically, they're called length
modifiers). I propose that glibc should add such qualifiers as a GNU extension.
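
For reference, the portable form described above can be sketched as follows. The function name format_fixed_width is made up for this illustration; the PRIi32 and PRIu64 macros are the standard ones from <inttypes.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a signed 32-bit and an unsigned 64-bit value with the C99 PRI*
   macros.  Each macro expands to a string literal, so it has to be
   pasted between the pieces of the format string.
   Returns whatever snprintf returns. */
int format_fixed_width(char *buf, size_t size, int32_t a, uint64_t b)
{
    assert(buf != NULL);
    return snprintf(buf, size, "a=%" PRIi32 " b=%" PRIu64, a, b);
}
```

The format string is split into three literals here, which is the readability cost the report complains about.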

Ideally, the syntax should encode the full bit width of the type passed to it,
and due to how printf format specifiers work, the sign of the type. It's clear
that it should be relatively similar to how the PRI* macro names are composed.

One idea would be just to base this on Microsoft's printf extensions. Microsoft
uses "I32" for 32 bit values, "I64" for 64 bit values, and "I" for
size_t/ptrdiff_t. I think they had the right idea here. The format specifier
consists of a prefix "I" that is not used by any other standard, followed by
the bit width of the type, followed by the sign.

Maybe it would be nicer if the sign came before the bit width (like in the PRI*
macros), but on the other hand, it helps to disambiguate parsing of the bit
width (e.g. if a number follows the format specifier).

Extended over Microsoft's extension, I suggest the following:

      I64u => uint64_t       (MS)
      I64d =>  int64_t       (MS)
      I32u => uint32_t       (MS)
      I32d =>  int32_t       (MS)
      I16u => uint16_t       (GNU)
      I16d =>  int16_t       (GNU)
      I8u  => uint8_t        (GNU)
      I8d  =>  int8_t        (GNU)
      Iu   => size_t         (redundant with %z)
      Id   => ptrdiff_t      (redundant with %t)

This could cover longer type names added in the future as well:

      I128u => uint128_t
      I12u  => uint12_t      (unlikely to exist, but unambiguous)
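
The proposed grammar (prefix "I", an optional decimal bit width, then the sign letter) could be recognized roughly like this. It is only a sketch of the syntax in the table above: struct width_modifier and parse_width_modifier are hypothetical names, not glibc code, and nothing here is implemented by any printf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of glibc. */
struct width_modifier {
    unsigned bits;   /* 0 means no width was given (size_t/ptrdiff_t) */
    bool is_signed;  /* true for 'd', false for 'u' */
};

/* Parse "I", an optional decimal bit width, and a trailing 'd' or 'u'.
   Returns the number of characters consumed, or 0 if s does not match. */
size_t parse_width_modifier(const char *s, struct width_modifier *out)
{
    assert(s != NULL && out != NULL);
    size_t i = 0;
    if (s[i] != 'I')
        return 0;
    i++;

    unsigned bits = 0;
    while (s[i] >= '0' && s[i] <= '9') {
        if (bits > 1000)  /* reject absurd widths and avoid overflow */
            return 0;
        bits = bits * 10 + (unsigned)(s[i] - '0');
        i++;
    }

    /* The sign letter ends the width, so digits that follow the
       specifier in the format string cannot be mistaken for it. */
    if (s[i] != 'd' && s[i] != 'u')
        return 0;
    out->bits = bits;
    out->is_signed = (s[i] == 'd');
    return i + 1;
}
```

With this sketch, "I64u" yields 64 bits unsigned, "Id" yields width 0 signed, and "I128u" needs no special casing.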

But other approaches which cover all intXX_t types now and in the future would
be appreciated. Anything to reduce ugliness of code, improve readability, and
as a consequence reduce bugs and security issues. It helps to make the intXX_t
types more "first class" as well, since no ugly hacks for printf are needed
anymore.

Note that the standard already has a precedent for such type aliases. For
example "%z" is for size_t, even though it would have been possible to avoid it
by adding a "PRIz" macro or such. This may be a sign that the standard would
not be terribly opposed to such an addition.

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug stdio/24466] Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot o
                   |                            |rg

--- Comment #1 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to nfxjfg from comment #0)
> Currently, the standard way (and only way if you want to be portable) to
> format a value of e.g. int32_t type is int32_t v; printf("%" PRIi32, v);
>
> I think this is very awkward and lowers readability. A dedicated set of
> printf format specifiers would be more appropriate (technically, they're
> called length modifiers). I propose that glibc should add such qualifiers as
> a GNU extension.

I don't have a strong position here, but 'awkward' and 'lowers readability' are
quite vague concepts imho. I think we will need a slightly better reason to
introduce potential GNU extensions that would deviate from the standard and
other implementations.

>
> Ideally, the syntax should encode the full bit width of the type passed to
> it, and due to how printf format specifiers work, the sign of the type. It's
> clear that it should be relatively similar to how the PRI* macro names are
> composed.
>
> One idea would just to base this on Microsoft's printf extensions. Microsoft
> uses "I32" for 32 bit values, "I64" for 64 bit values, and "I" for
> size_t/ptrdiff_t. I think they had the right idea here. The format specifier
> consists of a prefix "I" that is not used by any other standard, followed by
> the bit width of the type, followed by the sign.
>
> Maybe it would be nicer if the sign came before the bit width (like in the
> PRI* macros), but on the other hand, it helps to disambiguate parsing of the
> bit width (e.g. if a number follows the format specifier).
>
> Extended over Microsoft's extension, I suggest the following:
>
>       I64u => uint64_t       (MS)
>       I64d =>  int64_t       (MS)
>       I32u => uint32_t       (MS)
>       I32d =>  int32_t       (MS)
>       I16u => uint16_t       (GNU)
>       I16d =>  int16_t       (GNU)
>       I8u  => uint8_t        (GNU)
>       I8d  =>  int8_t        (GNU)
>       Iu   => size_t         (redundant with %z)
>       Id   => ptrdiff_t      (redundant with %t)
>
> This could cover longer type names added in the future as well:
>
>       I128u => uint128_t
>       I12u  => uint12_t      (unlikely to exist, but unambiguous)

Unfortunately, 'I' is already handled as a flag character: for decimal integer
conversions (i, d, u) the output uses the locale's alternative output digits.
This means that all of the above options are already valid printf inputs (for
instance, I32d will print a number using locale rules, through the
_i18n_number_rewrite function, with 32 characters as the field width).

It means that a potential GNU extension won't be compatible with the MS
extension as is.
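
The clash can be observed with a small sketch like the one below. It assumes a glibc build; format_with_I32d is a name made up for this example, and the -2 return value is an arbitrary marker for non-glibc builds, where this sketch makes no claim about what 'I' means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format value with "%I32d".  On glibc, 'I' is a flag (alternative
   locale digits) and 32 is the minimum field width, so the output is
   padded to 32 characters.  Returns snprintf's result, or -2 when not
   built against glibc. */
int format_with_I32d(char *buf, size_t size, int value)
{
    assert(buf != NULL);
#ifdef __GLIBC__
    return snprintf(buf, size, "%I32d", value);
#else
    (void)size;
    (void)value;
    return -2;
#endif
}
```

In the default "C" locale the alternative digits are the ordinary ASCII digits, so on glibc the value 42 should come out as 30 spaces followed by "42", not as a 32-bit integer in a minimal-width field.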

>
> But other approaches which cover all intXX_t types now and in the future
> would be appreciated. Anything to reduce ugliness of code, improve
> readability, and as consequence to reduce bugs and security issues. It helps
> to make the intXX_t types more "first class" as well, since no ugly hacks
> for printf are needed anymore.

'Reduce bugs and security issues' is a strong reason to add such an extension.
Could you provide examples where current practices are introducing real-world
cases of bugs and security issues?

>
> Note that the standard already has a precedent for such type aliases. For
> example "%z" is for size_t, even though it would have been possible to avoid
> it by adding a "PRIz" macro or such. This may be a sign that the standard
> would not be terribly opposed to such an addition.

My understanding is size_t predates C99 inttypes.h.


[1]
https://docs.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions?view=vs-2019

[Bug stdio/24466] Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

--- Comment #2 from nfxjfg at googlemail dot com ---
Yes, since this is not strictly necessary, the arguments for it are going to be
sort of vague and fuzzy.

I justify better readability with the fact that with this extension, the whole
format specifier is contained within the string literal (purely lexically
speaking). The additional quotes and the PRI prefix add noise and thus lower
readability. Format strings are already a bit hard to read, and interrupting
them with that additional "unusual" looking noise makes it harder.

Claiming fewer bugs and thus a security advantage from higher readability is
probably a bit weak, although it's definitely a real effect. Security bugs due
to format string mistakes do exist in general. But it's a bit hard to argue
here, because the main way to combat them is recommending -Werror=format to
programmers. Still, programmers may pick less appropriate builtin types over
intXX_t just to avoid the ugliness associated with intXX_t, one part of which is
the format specifiers. For example, if a programmer were to cast int64_t to long
just so the shorter "l" format specifier could be used, it may be possible that
this adds overflow issues on 32 bit platforms. In all of these cases the
programmer could be blamed for not following good practices, but my argument is
that such artificial inconveniences are simply one cause of bugs due to
carelessness. Not sure if I should pursue this argument further.
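
The cast shortcut from the paragraph above can be sketched like this. All three function names are invented for the illustration, and what actually happens to an out-of-range value in the cast is implementation-defined, so the sketch only detects the problem and does not predict the output.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Portable: the PRId64 macro always matches int64_t. */
int format_i64(char *buf, size_t size, int64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRId64, v);
}

/* The shortcut: cast to long so that a plain "%ld" can be used.
   Where long is 32 bits wide, values outside its range do not
   survive the cast. */
int format_i64_via_long(char *buf, size_t size, int64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%ld", (long)v);
}

/* Nonzero when the cast in format_i64_via_long would change v. */
int cast_to_long_loses_value(int64_t v)
{
    return v > LONG_MAX || v < LONG_MIN;
}
```

On a platform with a 64-bit long both formatting functions agree for every value, which is why such a shortcut can go unnoticed until the code is built for a 32-bit target.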

Ultimately, I think the C standard should never have added these PRI macros. To
be honest, it sounds like a hack so that they could have useful intXX_t types
without having to argue with the guys who own the printf format specifiers, or
something.

One other thing that I would love to see is if the compiler could warn if the
user is passing types other than intXX_t for these new format specifiers, even
if the type is the same (i.e. base it on whether an intXX_t typedef was used).
This is up to the compiler (and I don't know if gcc could do it or if it would
be willing to do it), and special format specifiers might enable it to do so.

Is there any formal process/requirements involved with GNU extensions? I assume
they were added much more frivolously in the past, while you're trying to be
more careful with not adding potentially pointless things.

[Bug stdio/24466] Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

--- Comment #3 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to nfxjfg from comment #2)

> Yes, since this is not strictly necessary, the arguments for it are going to
> be sort of vague and fuzzy.
>
> I justify better readability with the fact that with this extension, the
> whole format specifier is contained within the string literal (purely
> lexically speaking). The additional quotes and the PRI prefix add noise and
> thus lower readability. Format strings are already a bit hard to read, and
> interrupting them with that additional "unusual" looking noise makes it
> harder.
>
> Claiming less bugs and thus a security advantage from higher readability is
> probably bit weak, although it's definitely a real effect. Security bugs due
> to format string mistakes do exist in general. But it's a bit hard to argue
> here, because the main way to combat them is recommending -Werror=format to
> programmers. Still, programmers may pick less appropriate builtin types over
> intXX_t just to avoid the ugliness associated with intXX_t, one of which are
> format specifiers. For example, if a programmer were to cast int64_t to long
> just so the shorter "l" format specifier could be used, it may be possible
> that this adds overflow issues on 32 bit platforms. In all of these cases
> the programmer could be blamed for not following good practices, but my
> argument is that such artificial inconveniences are simply one cause of bugs
> due to carelessness. Not sure if I should pursue this argument further.

I agree, and GCC has done a nice job of improving the compiler warnings for
mismatched types with both printf and scanf.  I also tend to think that
improving those checks should be the focus, as a more generic solution,
instead of adding more type specifiers to printf/scanf.

Also, in your example, proper compiler support would still be required to warn
that narrowing the range of the variable with a cast might lead to a
mismatched specifier. Adding new specifiers would also require adding
counterpart support to compilers, which might take time (it might be easy
for Microsoft, which ships a complete toolchain, but for glibc it would
require coordinating with the compiler).

>
> Ultimately, I think the C standard should never have added these PRI macros.
> To be honest, it sounds like a hack so that they could have useful intXX_t
> types without having to argue with the guys who own the printf format
> specifiers, or something.

One advantage of the PRI macros is that they are composable: with C99
specifying that both uint64_t and long long are no longer compiler extensions,
printf/scanf should also support the types. It simplifies compiler and runtime
implementation.

>
> One other thing that I would love to see is if the compiler could warn if
> the user is passing types other than intXX_t for these new format
> specifiers, even if the type is the same (i.e. base it on whether an intXX_t
> typedef was used). This is up to the compiler (and I don't know if gcc could
> do it or if it would be willing to do it), and special format specifiers
> might enable it to do so.
>
> Is there any formal process/requirements involved with GNU extensions? I
> assume they were added much more frivolously in the past, while you're
> trying to be more careful with not adding potentially pointless things.

The best course of action is to discuss it on the libc-alpha mailing list.
Usually, bugzilla is not the right place to discuss API extensions.

[Bug stdio/24466] Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

--- Comment #4 from nfxjfg at googlemail dot com ---
> And I also
> tend to see this should be the focus on improving for a more generic
> solution, instead of adding more type specifiers to printf/scanf.

Such as?

> One advantage of the PRI macros is they are composable: with C99 specifying
> that both uint64_t and long long are no longer compiler extension,
> printf/scanf should also support the types. It simplifies compiler and
> runtime implementation.

A truly generic solution would not require hardcoding libc implementation
details in the compiler. Using macros is just a workaround exactly because you
still need to use the builtin compiler checker. If that weren't the case I
could just implement my own printf-like function, instead of trying to compel
all target libcs or the C standard to include it.

[Bug stdio/24466] Feature request: provide special printf formats for intXX_t

claude at 2xlibre dot net
https://sourceware.org/bugzilla/show_bug.cgi?id=24466

--- Comment #5 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to nfxjfg from comment #4)
> > And I also
> > tend to see this should be the focus on improving for a more generic
> > solution, instead of adding more type specifiers to printf/scanf.
>
> Such as?

I mean the very suggestion you are making with this bug report. The new
specifiers would be aliases for already supported types, which would require
additional support from compilers, to enable both warnings and format
attributes, and additional effort from programs to check for and use the
correct specifiers (if they aim to support different libcs, since at least
initially the MS and glibc ones would use different identifiers).

>
> > One advantage of the PRI macros is they are composable: with C99 specifying
> > that both uint64_t and long long are no longer compiler extension,
> > printf/scanf should also support the types. It simplifies compiler and
> > runtime implementation.
>
> A truly generic solution would not require hardcoding libc implementation
> details in the compiler. Using macros is just a workaround exactly because
> you still need to use the builtin compiler checker. If that weren't the case
> I could just implement my own printf-like function, instead of trying to
> compel all target libcs or the C standard to include it.

The generic solution already exists: the inttypes.h types and their PRI*
macros. Compiler support is not needed to enable them; it only improves the
warnings for mismatched or invalid types, either through warning flags or
__attribute__((format)).
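
As an illustration of the attribute mentioned above, a user-defined printf-like function can opt in to the same checks on GCC and Clang. This is a minimal sketch: PRINTF_LIKE and log_format are hypothetical names for this example, and the checker only accepts the specifiers the compiler already knows about.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC and Clang check calls against the format string when a function
   carries this attribute; other compilers see a plain function. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Hypothetical helper: write "log: " followed by the formatted message.
   Returns the length the full text needs, or a negative value on error. */
PRINTF_LIKE(3, 4)
int log_format(char *buf, size_t size, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    int prefix = snprintf(buf, size, "log: ");
    if (prefix < 0)
        return prefix;

    /* Continue after the prefix, or with no room left if it was truncated. */
    size_t used = (size_t)prefix < size ? (size_t)prefix : size;
    va_list ap;
    va_start(ap, fmt);
    int body = vsnprintf(buf + used, size - used, fmt, ap);
    va_end(ap);
    return body < 0 ? body : prefix + body;
}
```

With -Wformat enabled, a call such as log_format(buf, n, "%s", 42) should be diagnosed at compile time, just as it would be for printf itself.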

The question is: should we add *another* printf alias as a GNU-specific
extension? Or would it be better to raise it with the C standard committee, so
that it can be added as a generic identifier?

Again, I still think the best course of action is to discuss it on the
libc-alpha mailing list.
