alloca avoidance patches

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

alloca avoidance patches

Florian Weimer-5
As you might have seen, today a vulnerability was disclosed in
glibc/Linux/GCC, concerning alloca/VLAs, large stack frames, and
aliasing the stack and the heap.

I have posted an immediate stop-gap patch for the dynamic linker to fix
CVE-2017-1000366/swbz#21624.  This should go in ASAP and should be
backported.

I also have non-conforming patches which use NAME_MAX and PATH_MAX for
additional mitigations.  I'll post them shortly, but I expect that only
distributions will pick them up because they do not follow GNU standards
and will not work on the Hurd.

In additional, I have a series of patches which remove alloca from
libintl and vfprintf.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Joseph Myers
It seems to me that we need a clear definition of what stack frame size
glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
unless the compiler can see they are bounded, and use -Wstack-usage= for
building glibc to make sure no function uses too much stack).

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Jeff Law
On 06/19/2017 10:47 AM, Joseph Myers wrote:
> It seems to me that we need a clear definition of what stack frame size
> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
> unless the compiler can see they are bounded, and use -Wstack-usage= for
> building glibc to make sure no function uses too much stack).
>
What we were thinking was a new -fstack-check implementation that
properly probes to avoid these problems (the existing -fstack-check is
fatally flawed for this stuff).

I've just posted an RFC around this issue to gcc-patches.  I've actually
got code here that can be used to protect x86, ppc, s390 and aarch64
from this class of problems.

I'm pretty happy with the x86 and ppc specific bits.  Less so with
aarch64 and s390.  Michael Matz has some ideas on generic checking
that's less efficient, but easy to drop into the other target code
generators.

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Szabolcs Nagy-2
In reply to this post by Joseph Myers
On 19/06/17 17:47, Joseph Myers wrote:
> It seems to me that we need a clear definition of what stack frame size
> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
> unless the compiler can see they are bounded, and use -Wstack-usage= for
> building glibc to make sure no function uses too much stack).

my experience with musl is that the tricky recursive
unbounded stack usage cases are not possible to catch
with -Wstack-usage= (details below) and that it warns
about several seemingly unbounded vlas that are clearly
bounded after some manual analysis, i expect glibc
would be similar.


in musl the rule is that calling into libc will not
use more than 16K stack on any target.
(the actual value in practice is around 9K for
long double conversion functions on some targets)
(and i think there is a less strict rule that
as-safe calls that may appear in signal handlers
should not use more than 6K stack which affect
the SIGSTKSZ definition)

this is complicated by callback apis (when the libc
can be entered several times), and allocation that
is compile time controlled by the caller (execl
copies the argument pointers to an array on the stack,
this is not accounted as libc stack use) and the compiler
can break the guarantee when it inlines and allocates
stack for the sum of disjunct code paths instead of the max.

there are two known cases where the rule is violated:
the recursive implementation of nftw can use more stack
(though it is still bounded by the PATH_MAX limit musl
enforces) and there is a known issue in the tre regcomp
implementation which recursively walks its regex ast
(depth is only limited by a big ast node limit i think)

in case of glibc there are other considerations like
lazy binding increases libc stack use because simd
regs are saved which have to be taken into account for
a musl-style general stack limit guarantee.

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Paul Eggert
In reply to this post by Joseph Myers
On 06/19/2017 09:47 AM, Joseph Myers wrote:
> we need a clear definition of what stack frame size
> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
> unless the compiler can see they are bounded, and use -Wstack-usage= for
> building glibc to make sure no function uses too much stack).

For what it's worth, Gnulib uses the magic number 4032. That is, arrays
containing more than 4032 bytes are supposed to be allocated on the heap
rather than on the stack via alloca. This number was chosen to work on
all platforms Gnulib is intended to be portable to; we didn't bother to
tailor the number to each platform. The basic assumptions in Gnulib are:

* Every stack is guarded by at least 4096 bytes' worth of inaccessible
pages; the 4032 is because we subtract a few bytes for local variables.

* There is at most one such array per stack frame.

* The compiler does not coalesce stack frames for functions that declare
large locals. This is a bit tricky, as a function that allocates a local
buffer via 'char buffer[4000];' is not safe if it calls another function
that allocates a similar local buffer, because the call could be
inlined. In theory I suppose this could be a problem even with alloca,
but in practice I don't recall seeing it happen with alloca.

PS. My first reaction was to ask "how's -fstack-check going?" but I see
Jeff Law beat me to it.

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Szabolcs Nagy-2
On 19/06/17 18:36, Paul Eggert wrote:

> On 06/19/2017 09:47 AM, Joseph Myers wrote:
>> we need a clear definition of what stack frame size
>> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
>> unless the compiler can see they are bounded, and use -Wstack-usage= for
>> building glibc to make sure no function uses too much stack).
>
> For what it's worth, Gnulib uses the magic number 4032. That is, arrays containing more than 4032 bytes are
> supposed to be allocated on the heap rather than on the stack via alloca. This number was chosen to work on all
> platforms Gnulib is intended to be portable to; we didn't bother to tailor the number to each platform. The
> basic assumptions in Gnulib are:
>
> * Every stack is guarded by at least 4096 bytes' worth of inaccessible pages; the 4032 is because we subtract a
> few bytes for local variables.
>
> * There is at most one such array per stack frame.
>
> * The compiler does not coalesce stack frames for functions that declare large locals. This is a bit tricky, as
> a function that allocates a local buffer via 'char buffer[4000];' is not safe if it calls another function that
> allocates a similar local buffer, because the call could be inlined. In theory I suppose this could be a
> problem even with alloca, but in practice I don't recall seeing it happen with alloca.
>

indeed it is not enough to check for large stack
allocations, you need to check for many small
allocations without intermediate stack access too.

in any case i think it's more productive to
fix the stack usage bugs, instead of hardening
for this class of exploitable stack usage bugs,
even if the guard page catches the issue it
is an unwanted crash.

> PS. My first reaction was to ask "how's -fstack-check going?" but I see Jeff Law beat me to it.
>

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Joseph Myers
In reply to this post by Szabolcs Nagy-2
On Mon, 19 Jun 2017, Szabolcs Nagy wrote:

> On 19/06/17 17:47, Joseph Myers wrote:
> > It seems to me that we need a clear definition of what stack frame size
> > glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
> > unless the compiler can see they are bounded, and use -Wstack-usage= for
> > building glibc to make sure no function uses too much stack).
>
> my experience with musl is that the tricky recursive
> unbounded stack usage cases are not possible to catch

Recursive usage is not the issue here, provided that each such recursive
function doesn't allocate too much stack itself, and always accesses the
allocated stack before or as part of calling another function, so ensuring
it's not possible to jump the guard page (or multiple-page guard region,
if the platform ABI requires the kernel fixes enlarging the region to
ensure security).  (Large recursive usage may be a QoI issue or a bug, but
it's not a security issue.)

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Adhemerval Zanella-2
In reply to this post by Szabolcs Nagy-2


On 19/06/2017 14:33, Szabolcs Nagy wrote:

> On 19/06/17 17:47, Joseph Myers wrote:
>> It seems to me that we need a clear definition of what stack frame size
>> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
>> unless the compiler can see they are bounded, and use -Wstack-usage= for
>> building glibc to make sure no function uses too much stack).
>
> my experience with musl is that the tricky recursive
> unbounded stack usage cases are not possible to catch
> with -Wstack-usage= (details below) and that it warns
> about several seemingly unbounded vlas that are clearly
> bounded after some manual analysis, i expect glibc
> would be similar.
>
>
> in musl the rule is that calling into libc will not
> use more than 16K stack on any target.
> (the actual value in practice is around 9K for
> long double conversion functions on some targets)
> (and i think there is a less strict rule that
> as-safe calls that may appear in signal handlers
> should not use more than 6K stack which affect
> the SIGSTKSZ definition)
>
> this is complicated by callback apis (when the libc
> can be entered several times), and allocation that
> is compile time controlled by the caller (execl
> copies the argument pointers to an array on the stack,
> this is not accounted as libc stack use) and the compiler
> can break the guarantee when it inlines and allocates
> stack for the sum of disjunct code paths instead of the max.
>
> there are two known cases where the rule is violated:
> the recursive implementation of nftw can use more stack
> (though it is still bounded by the PATH_MAX limit musl
> enforces) and there is a known issue in the tre regcomp
> implementation which recursively walks its regex ast
> (depth is only limited by a big ast node limit i think)
>
> in case of glibc there are other considerations like
> lazy binding increases libc stack use because simd
> regs are saved which have to be taken into account for
> a musl-style general stack limit guarantee.

On glibc side another issues are glob and fnmatch.  On glob side it is
mainly due to GNU extensions, where GLOB_TILDE expands the search path
to potentially large buffers than PATH_MAX and GLOB_BRACE which
potentially calls glob recursively. GLIBC tries to avoid too much stack
allocation through the stack accountability (through alloca_account), but
it is fragile and hacky seems it requires a lot of boilerplate to manage
buffers. My patchset proposal to refactor it try to use dynarray to
manage buffer, but it does not solve the recursive allocation call
(which in this specific case I think it is a user issue since recursive
is driven the input pattern).

Fnmatch at other side is just subpar algorithm that double allocates the
buffer in wide-character form and it could be avoid with a better
implementation (there are some discussion on BZ#14185).

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Joseph Myers
In reply to this post by Szabolcs Nagy-2
On Mon, 19 Jun 2017, Szabolcs Nagy wrote:

> in any case i think it's more productive to
> fix the stack usage bugs, instead of hardening
> for this class of exploitable stack usage bugs,
> even if the guard page catches the issue it
> is an unwanted crash.

Which gets back to wanting to use appropriate warning options, even if
they don't catch all cases - and to needing an ABI-defined size it's safe
to allocate, possibly more than a page if you rely on kernel fixes.

(I expect strtold has one of the larger static stack allocations in glibc.  
I can see such amounts, possibly more, being needed for fixing cpow{,f,l}
inaccuracy as well, on the assumption we should avoid libm functions
calling malloc.)

--
Joseph S. Myers
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Szabolcs Nagy-2
On 19/06/17 18:58, Joseph Myers wrote:

> On Mon, 19 Jun 2017, Szabolcs Nagy wrote:
>
>> in any case i think it's more productive to
>> fix the stack usage bugs, instead of hardening
>> for this class of exploitable stack usage bugs,
>> even if the guard page catches the issue it
>> is an unwanted crash.
>
> Which gets back to wanting to use appropriate warning options, even if
> they don't catch all cases - and to needing an ABI-defined size it's safe
> to allocate, possibly more than a page if you rely on kernel fixes.
>
> (I expect strtold has one of the larger static stack allocations in glibc.  
> I can see such amounts, possibly more, being needed for fixing cpow{,f,l}
> inaccuracy as well, on the assumption we should avoid libm functions
> calling malloc.)
>

i know at least crypt call in glibc can allocate 128K on the
stack, so glibc has bigger problems than strtold

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Jeff Law
In reply to this post by Joseph Myers
On 06/19/2017 11:58 AM, Joseph Myers wrote:

> On Mon, 19 Jun 2017, Szabolcs Nagy wrote:
>
>> in any case i think it's more productive to
>> fix the stack usage bugs, instead of hardening
>> for this class of exploitable stack usage bugs,
>> even if the guard page catches the issue it
>> is an unwanted crash.
>
> Which gets back to wanting to use appropriate warning options, even if
> they don't catch all cases - and to needing an ABI-defined size it's safe
> to allocate, possibly more than a page if you rely on kernel fixes.
>
> (I expect strtold has one of the larger static stack allocations in glibc.  
> I can see such amounts, possibly more, being needed for fixing cpow{,f,l}
> inaccuracy as well, on the assumption we should avoid libm functions
> calling malloc.)
I don't have the list handy, but the strtol family was in the list of
functions with enough stack space in the prologue to require probing.
Jeff

Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Jeff Law
In reply to this post by Paul Eggert
On 06/19/2017 11:36 AM, Paul Eggert wrote:

> On 06/19/2017 09:47 AM, Joseph Myers wrote:
>> we need a clear definition of what stack frame size
>> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
>> unless the compiler can see they are bounded, and use -Wstack-usage= for
>> building glibc to make sure no function uses too much stack).
>
> For what it's worth, Gnulib uses the magic number 4032. That is, arrays
> containing more than 4032 bytes are supposed to be allocated on the heap
> rather than on the stack via alloca. This number was chosen to work on
> all platforms Gnulib is intended to be portable to; we didn't bother to
> tailor the number to each platform. The basic assumptions in Gnulib are:
>
> * Every stack is guarded by at least 4096 bytes' worth of inaccessible
> pages; the 4032 is because we subtract a few bytes for local variables.
>
> * There is at most one such array per stack frame
>
> * The compiler does not coalesce stack frames for functions that declare
> large locals. This is a bit tricky, as a function that allocates a local
> buffer via 'char buffer[4000];' is not safe if it calls another function
> that allocates a similar local buffer, because the call could be
> inlined. In theory I suppose this could be a problem even with alloca,
> but in practice I don't recall seeing it happen with alloca.The compiler will certainly coalesce frames for inline calls.    But it
will also try to share slots when the lifetimes do not overlap and
aliasing properties of the two objects are "reasonable".  It's a fairly
painful area of GCC to get balanced correctly.

Most of the alloca usages that would lead to these kinds of problems
were fixed in the early 90s when we were still dealing with systems that
didn't have alloca builtins and instead layered alloca on top of malloc.
  With compilers getting more and more aggressive with inlining, I'm a
bit surprised some of the issues haven't come back.

Personally, I wouldn't lose any sleep if alloca just went away and
instead we left it up to the compiler to analyze the code and ultimately
optimize a malloc/free pair into alloca when it's safe (it's almost
always profitable).  But I won't get on my soapbox right now :-)

> PS. My first reaction was to ask "how's -fstack-check going?" but I see
> Jeff Law beat me to it.
:-)

A usable -fstack-check really is the way to go IMHO.    Thus a desire to
have one that is performant enough to just be enabled by default and we
can all get on with more interesting problems.

Jeff


Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Carlos O'Donell-6
In reply to this post by Florian Weimer-5
On 06/19/2017 11:59 AM, Florian Weimer wrote:

> As you might have seen, today a vulnerability was disclosed in
> glibc/Linux/GCC, concerning alloca/VLAs, large stack frames, and
> aliasing the stack and the heap.
>
> I have posted an immediate stop-gap patch for the dynamic linker to fix
> CVE-2017-1000366/swbz#21624.  This should go in ASAP and should be
> backported.
>
> I also have non-conforming patches which use NAME_MAX and PATH_MAX for
> additional mitigations.  I'll post them shortly, but I expect that only
> distributions will pick them up because they do not follow GNU standards
> and will not work on the Hurd.

In the case of NAME_MAX and PATH_MAX we should define something
for use internally in the loader and apply that limit as a
restriction for SUID binaries. The fact that this does not follow
the GNU standards, is unfortunate, but we must act on behalf of our
users.

The safety of our users and protecting them against exploits is
paramount in my opinion, and we should adopt such limits for SUID
binaries where it makes sense, even if we later back them out for
a better solution.

I think that even GNU/Hurd is susceptable to these kinds of attacks
since all modern operating systems follow the same models for stack
and heap usage. Therefore we still need the protections there, even
if temporary, until we have gcc's -fstack-check ready and operational
for compiling all of glibc.

To that end I have reviewed your 3 posted patches and suggested we
apply an arbitrary but small limit for GNU/Hurd SUID binaries.

--
Cheers,
Carlos.
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Florian Weimer-5
On 06/19/2017 10:11 PM, Carlos O'Donell wrote:

> The safety of our users and protecting them against exploits is
> paramount in my opinion, and we should adopt such limits for SUID
> binaries where it makes sense, even if we later back them out for
> a better solution.

Okay, fair enough, I will commit both patches and backport them to 2.25,
2.24, and 2.23, too.

> I think that even GNU/Hurd is susceptable to these kinds of attacks
> since all modern operating systems follow the same models for stack
> and heap usage.

Windows probably gets it right, and this may have been one reason why
they never implemented C99 supported in their C compiler.  They support
alloca as an extension, but it will raise an exception in case of stack
overflow.

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Jeff Law
In reply to this post by Szabolcs Nagy-2
On 06/19/2017 11:33 AM, Szabolcs Nagy wrote:

> On 19/06/17 17:47, Joseph Myers wrote:
>> It seems to me that we need a clear definition of what stack frame size
>> glibc can assume is safe (so that we can aim to eliminate alloca and VLAs
>> unless the compiler can see they are bounded, and use -Wstack-usage= for
>> building glibc to make sure no function uses too much stack).
>
> my experience with musl is that the tricky recursive
> unbounded stack usage cases are not possible to catch
> with -Wstack-usage= (details below) and that it warns
> about several seemingly unbounded vlas that are clearly
> bounded after some manual analysis, i expect glibc
> would be similar.
The total stack usage isn't really the issue here.  Yes, over-use can
contribute to bringing the stack and heap close together.  But to jump
the  guard you need an allocation greater than a page without touching
the page.

My focus is on avoiding the possibility of jumping the guard by using
probes.  It's certainly good to limit unnecessary stack usage, but IMHO
it doesn't address the key choke point we have to mitigate these kinds
of attacks.


Jeff
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Jeff Law
In reply to this post by Szabolcs Nagy-2
On 06/19/2017 12:00 PM, Szabolcs Nagy wrote:

> On 19/06/17 18:58, Joseph Myers wrote:
>> On Mon, 19 Jun 2017, Szabolcs Nagy wrote:
>>
>>> in any case i think it's more productive to
>>> fix the stack usage bugs, instead of hardening
>>> for this class of exploitable stack usage bugs,
>>> even if the guard page catches the issue it
>>> is an unwanted crash.
>>
>> Which gets back to wanting to use appropriate warning options, even if
>> they don't catch all cases - and to needing an ABI-defined size it's safe
>> to allocate, possibly more than a page if you rely on kernel fixes.
>>
>> (I expect strtold has one of the larger static stack allocations in glibc.  
>> I can see such amounts, possibly more, being needed for fixing cpow{,f,l}
>> inaccuracy as well, on the assumption we should avoid libm functions
>> calling malloc.)
>>
>
> i know at least crypt call in glibc can allocate 128K on the
> stack, so glibc has bigger problems than strtold
It's useful to separate out how the allocations occur.

Statically allocated space is allocated and probed by the prologue.
There's just over 2 dozen of these on x86 which allocate a page or more
within glibc.  These are trivially found by disassembly or other tooling
that just looks at large constant allocations.  You'll find workers for
strtold, realpath, printf, tmpfile, tempnam, strxfrm, wcstold, wcsxfrm,
fnmatch, fnwmatch, getwd and  others.


Dynamically allocated pages (eg alloca/vlas) are allocated and probed by
more generic code.  The key difference is these (and their sizes) are
not trivially discovered by just reading the assembly code.  You
actually have to walk through the code which does rounding, alignment,
etc for dynamic stack space.  There's actually a lot more of these and I
believe the crypt code you're referring to falls into this class.



Protecting the former requires a lot more work in GCC than protecting
the latter as the former requires target specific improvements while the
latter is more generic code that runs long before prologue/epilogue
generation.

jeff
Reply | Threaded
Open this post in threaded view
|

Re: alloca avoidance patches

Szabolcs Nagy-2
On 19/06/17 22:37, Jeff Law wrote:

> On 06/19/2017 12:00 PM, Szabolcs Nagy wrote:
>> On 19/06/17 18:58, Joseph Myers wrote:
>>> On Mon, 19 Jun 2017, Szabolcs Nagy wrote:
>>>
>>>> in any case i think it's more productive to
>>>> fix the stack usage bugs, instead of hardening
>>>> for this class of exploitable stack usage bugs,
>>>> even if the guard page catches the issue it
>>>> is an unwanted crash.
>>>
>>> Which gets back to wanting to use appropriate warning options, even if
>>> they don't catch all cases - and to needing an ABI-defined size it's safe
>>> to allocate, possibly more than a page if you rely on kernel fixes.
>>>
>>> (I expect strtold has one of the larger static stack allocations in glibc.  
>>> I can see such amounts, possibly more, being needed for fixing cpow{,f,l}
>>> inaccuracy as well, on the assumption we should avoid libm functions
>>> calling malloc.)
>>>
>>
>> i know at least crypt call in glibc can allocate 128K on the
>> stack, so glibc has bigger problems than strtold
> It's useful to separate out how the allocations occur.
>
> Statically allocated space is allocated and probed by the prologue.
> There's just over 2 dozen of these on x86 which allocate a page or more
> within glibc.  These are trivially found by disassembly or other tooling
> that just looks at large constant allocations.  You'll find workers for
> strtold, realpath, printf, tmpfile, tempnam, strxfrm, wcstold, wcsxfrm,
> fnmatch, fnwmatch, getwd and  others.
>
>
> Dynamically allocated pages (eg alloca/vlas) are allocated and probed by
> more generic code.  The key difference is these (and their sizes) are
> not trivially discovered by just reading the assembly code.  You
> actually have to walk through the code which does rounding, alignment,
> etc for dynamic stack space.  There's actually a lot more of these and I
> believe the crypt code you're referring to falls into this class.
>
>
>
> Protecting the former requires a lot more work in GCC than protecting
> the latter as the former requires target specific improvements while the
> latter is more generic code that runs long before prologue/epilogue
> generation.

i see, so target specific sp adjustments in prologue/epilogue
are hard to deal with in gcc, otherwise the protection is
simple and it should have small overhead in practice.

actually crypt does not have the issue i thought it had.
i remembered that struct crypt_data is big and i saw it being
allocated on the stack before, but it seems that was not in
glibc.. sorry for the noise. (otoh callers of crypt_r likely
have the issue of >128k stack use)