[PATCH] Use malloca instead alloca

classic Classic list List threaded Threaded
37 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

KOSAKI Motohiro-2
> Could you provide a citation for this? I'm pretty sure the kernel
> ensures this does not happen by never allocating maps in the
> RLIMIT_STACK range. On the other hand, I have seen buggy
> vendor-patched kernels which would allocate maps in the stack range
> even when there is no memory pressure.

OK, Maybe I need more long explanation.

1) x86 has special stack care.

generic version.

linux/mm/util.c
-----------------------------------------
#if defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm)
{
  mm->mmap_base = TASK_UNMAPPED_BASE;
  mm->get_unmapped_area = arch_get_unmapped_area;
  mm->unmap_area = arch_unmap_area;
}
#endif


x86 version
linux/arch/x86/mm/mmap.c
-----------------------------------------------------------------
static unsigned long mmap_base(void)
{
   unsigned long gap = rlimit(RLIMIT_STACK);

  if (gap < MIN_GAP)
    gap = MIN_GAP;
  else if (gap > MAX_GAP)
    gap = MAX_GAP;

  return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
}

/*
 * This function, called very early during the creation of a new
 * process VM image, sets up which VM layout function to use:
 */
void arch_pick_mmap_layout(struct mm_struct *mm)
{
  if (mmap_is_legacy()) {
    mm->mmap_base = mmap_legacy_base();
    mm->get_unmapped_area = arch_get_unmapped_area;
    mm->unmap_area = arch_unmap_area;
  } else {
    mm->mmap_base = mmap_base();
    mm->get_unmapped_area = arch_get_unmapped_area_topdown;
   mm->unmap_area = arch_unmap_area_topdown;
  }
}


x86 maintainer is wise. mm->mmap_base (aka area search starting address)
is lower than stack and search order is top to bottom.  however there
are several
bottom to top searching case.

1) no free vm area (x86 only)

x86 has arch specific arch_get_unmapped_area_topdown() and it fall back
bottom-to-top search when top-to-bottom search is failure.

see http://lxr.linux.no/linux+v3.7/arch/x86/kernel/sys_x86_64.c#L266

2) arch generic arch_pick_mmap_layout uses bottom-to-top search.

see http://lxr.linux.no/linux+v3.7/mm/util.c#L292

3) x86 uses bottom-to-top search when mmap_is_legacy() return true.

the code is here.

static int mmap_is_legacy(void)
{
  if (current->personality & ADDR_COMPAT_LAYOUT)
    return 1;

  if (rlimit(RLIMIT_STACK) == RLIM_INFINITY)
    return 1;

  return sysctl_legacy_va_layout;
}

i.e.
 - uses setarch -L or uses personality syscall directly.
 - RLIMIT_STACK == RLIM_INFINITY.
 - /proc/sys/vm/legacy_va_layout = 1.


Is this sufficient pointer?
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

KOSAKI Motohiro-2
In reply to this post by Ondřej Bílka
> As value allocated on stack is value not allocated on heap this would
> delay this condition. If 4mb were allocated on stack then with
> fragmentation ratio 2 you could allocate additional 2mb until this
> happen.

Sorry, I haven't caught your point. Maybe because my english skill is very poor.


>> Also, signal handlers can be run on alternate stacks with sigaltstack,
>> and it's possible to call non-async-signal-safe functions in libc from
>> such signal handlers as long as you can ensure they did not interrupt
>> unsafe functions. A trivial way to do this is raise().
>>
> This is not big problem. We could change bound from size-32768 to size/2.
> Or additionaly keep counter of memory used my malloca and
> allocate until size/2 is hit.

What's happen if RLIMIT_STACK is infinity? When stack size is large,
your approach
makes SEGV even if using overcommit_memory=2.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

KOSAKI Motohiro-2
In reply to this post by KOSAKI Motohiro-2
> x86 version
> linux/arch/x86/mm/mmap.c
> -----------------------------------------------------------------
> static unsigned long mmap_base(void)
> {
>    unsigned long gap = rlimit(RLIMIT_STACK);
>
>   if (gap < MIN_GAP)
>     gap = MIN_GAP;
>   else if (gap > MAX_GAP)
>     gap = MAX_GAP;
>
>   return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
> }

One more thing. mmap_base is calculated at process creation time. However
RLIMIT_STACK is used when actual stack expansion. then weird scenario might
occur when process changes RLIMIT_STACK dynamically.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Roland McGrath-4
In reply to this post by Paul Eggert
Paul's points are valid as a generic thing.  But they aren't the key
points for considering changes to libc.

The entire discussion about maximum usable size is unnecessary
fritter.  We already have __libc_use_alloca, alloca_account,
etc. (include/alloca.h) to govern that decision for libc code.
If people want to discuss changing that logic, they can do that
separately.  But we will certainly not have more than one separate
implementation of such logic in libc.

Extending that internal API in some fashion to make it less work to
use would certainly be welcome.  That would have to be done in some
way that doesn't add overhead when existing uses of __libc_use_alloca
are converted to the new interface.  The simplest way to do that would
be a macro interface that stores to a local bool, which is what the
users of __libc_use_alloca mostly do now.  It would be nice to have an
interface that is completely trivial to use like malloca is, but for
code inside libc that ideal is less important than making sure we do
not degrade the performance (including code size) of any of the
existing uses.

There are a few existing uses of alloca that use their own ad hoc code
instead of __libc_use_alloca (misc/err.c, sunrpc/auth_unix.c, maybe
others).  Those should be converted to use __libc_use_alloca or
whatever nicer interface is figured out.

Then there are the existing uses of alloca that don't use
__libc_use_alloca at all, such as argp/argp-help.c.  Those should
probably be converted as well, though in some situations like the argp
ones it's a bit hard to imagine their really being used with sizes
large enough to matter.


Thanks,
Roland
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Ondřej Bílka
On Mon, Jan 07, 2013 at 05:05:34PM -0800, Roland McGrath wrote:

> Paul's points are valid as a generic thing.  But they aren't the key
> points for considering changes to libc.
>
> The entire discussion about maximum usable size is unnecessary
> fritter.  We already have __libc_use_alloca, alloca_account,
> etc. (include/alloca.h) to govern that decision for libc code.
> If people want to discuss changing that logic, they can do that
> separately.  But we will certainly not have more than one separate
> implementation of such logic in libc.
>
> Extending that internal API in some fashion to make it less work to
> use would certainly be welcome.  That would have to be done in some
> way that doesn't add overhead when existing uses of __libc_use_alloca
> are converted to the new interface.  The simplest way to do that would
> be a macro interface that stores to a local bool, which is what the
> users of __libc_use_alloca mostly do now.  It would be nice to have an
> interface that is completely trivial to use like malloca is, but for
> code inside libc that ideal is less important than making sure we do
> not degrade the performance (including code size) of any of the
> existing uses.

I wrote a possible new interface below.
I added __libc_use_alloca that tests if current stack frame will becomes
larger than __MAX_ALLOCA_CUTOFF. However needs change of nptl __libc_use_alloca.

This makes alloca_account be counted twice so I aliased it with alloca
to be counted only once.

This could be more effective than current state as we do not need to
track counters. (modulo details like that on x64 stackinfo_get_sp
definition causes %rsp be unnecessary copied into %rax.)

>
> There are a few existing uses of alloca that use their own ad hoc code
> instead of __libc_use_alloca (misc/err.c, sunrpc/auth_unix.c, maybe
> others).  Those should be converted to use __libc_use_alloca or
> whatever nicer interface is figured out.
>
> Then there are the existing uses of alloca that don't use
> __libc_use_alloca at all, such as argp/argp-help.c.  Those should
> probably be converted as well, though in some situations like the argp
> ones it's a bit hard to imagine their really being used with sizes
> large enough to matter.


One technical issue is if we want to use STACKINFO_BP_DEF. It would make
getting base pointer more portable but must be added to each function
that uses __libc_use_alloca.

/* TODO: switch to later case when __builtin_frame_address don't work.  */
#if 1
  #define STACKINFO_BP_DEF
  #define stackinfo_get_bp() __builtin_frame_address(0)
#else
  #define STACKINFO_BP_DEF void *__stackinfo_bp = &__stackinfo_bp;
  #define stackinfo_get_bp() __stackinfo_bp
#endif

#ifdef _STACK_GROWS_DOWN
  #define __STACKINFO_UB stackinfo_get_bp ()
  #define __STACKINFO_LB stackinfo_get_sp ()
#endif
#ifdef _STACK_GROWS_UP
  #define __STACKINFO_UB stackinfo_get_sp ()
  #define __STACKINFO_LB stackinfo_get_bp ()
#endif


Then alloca can use following

#define __libc_use_alloca(x) \
  ( __STACKINFO_UB - __STACKINFO_LB + (x) <= __MAX_ALLOCA_CUTOFF )

#define alloca_account(n, var)  alloca(n)
#define extend_alloca_account(buf, len, newlen, avar) \
        extend_alloca        (buf, len, newlen, avar)

And here is new version of malloca.

/* Safe automatic memory allocation.
   Copyright (C) 2012 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, see <http://www.gnu.org/licenses/>.  */


#ifndef _MALLOCA_H
#define _MALLOCA_H

#ifdef HAVE_ALLOCA_H

#include <alloca.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {
#endif

/* malloca(N) is a safe variant of alloca(N).  It allocates N bytes of
   memory allocated on the stack until stack frame has __MAX_ALLOCA_CUTOFF
   bytes and heap otherwise.
   It must be freed using freea() before the function returns.  */

#define malloca(n) ({                             \
  size_t  __n__ = (n);                            \
  void  * __r__ = NULL;                           \
  if (__libc_use_alloca (__n__))                  \
    {                                             \
      __r__ = alloca (__n__);                     \
    }                                             \
  else                                            \
    {                                             \
      __r__ = malloc (__n__);                     \
    }                                             \
  __r__;                                          \
})

/* Maybe it is faster to use unsigned comparison such as
   __r -  __STACKINFO_LB <= __STACKINFO_UB -__STACKINFO_LB */

#define freea(r) do {                             \
  void *__r = (r);                                \
  if ( __r && !( __STACKINFO_LB <= __r &          \
                 __r <= __STACKINFO_UB ))         \
    free (__r);                                   \
} while (0)


#ifdef __cplusplus
}
#endif

#else
#define malloca(x) malloc (x)
#define freea(x)   free (x)
#endif

#endif /* _MALLOCA_H */
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Florian Weimer-5
On 01/08/2013 09:41 PM, Ondřej Bílka wrote:

> One technical issue is if we want to use STACKINFO_BP_DEF. It would make
> getting base pointer more portable but must be added to each function
> that uses __libc_use_alloca.

I'm interested in extending GCC to provide a better alloca, including
one that can fail.  Parts of the split stacks support might reusable for
obtaining the stack boundary.

I'm a bit skeptical about the whole alloca-if-we-have-room concept.  We
must make sure that we keep sufficient stack space for further
operation.  But I'm not sure how to determine the magic number of bytes
to reserve.

--
Florian Weimer / Red Hat Product Security Team
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Rich Felker
On Wed, Jan 09, 2013 at 05:09:23PM +0100, Florian Weimer wrote:
> On 01/08/2013 09:41 PM, Ondřej Bílka wrote:
>
> >One technical issue is if we want to use STACKINFO_BP_DEF. It would make
> >getting base pointer more portable but must be added to each function
> >that uses __libc_use_alloca.
>
> I'm interested in extending GCC to provide a better alloca,
> including one that can fail.  Parts of the split stacks support
> might reusable for obtaining the stack boundary.

We spent a bit of time analyzing whether split stack is useful on the
musl list, and concluded that for the most part it's just harmful. For
64-bit systems, it has no significant advantages over just allocating
large stacks and letting the kernel do overcommit. For 32-bit systems,
you risk overcommitting virtual address space and crashing when your
threads run out of stack because they can't map more, but there are of
course some usage patterns on 32-bit systems where they may make it
possible to do things that would otherwise not be possible without
delicate analysis of stack needs and manual use of thread stack-size
attributes.

Overall, it seems to be an overcomplicated, mostly-harmful feature.

> I'm a bit skeptical about the whole alloca-if-we-have-room concept.
> We must make sure that we keep sufficient stack space for further
> operation.  But I'm not sure how to determine the magic number of
> bytes to reserve.

I agree entirely. It strikes me as just as misguided as when
application developers ask "how to I check how much memory the system
has so I can hoarde it all?" If you're making what you use
proportional to what's available rather than to what you need, you're
doing something wrong.

Honestly, my recommendation would be to ban use of alloca and VLA
entirely and replace them by small fixed-size arrays (like 256-1024
bytes depending on the expected usage), and simply call malloc when
the need is greater than the fixed size. This makes the free logic
easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
improves performance in the common cases (since the compiler can
generate better, simpler code when it does not need to do the
frame-pointer-like logic for managing dynamic-size allocations on the
stack.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Florian Weimer-5
On 01/09/2013 05:44 PM, Rich Felker wrote:

> On Wed, Jan 09, 2013 at 05:09:23PM +0100, Florian Weimer wrote:
>> On 01/08/2013 09:41 PM, Ondřej Bílka wrote:
>>
>>> One technical issue is if we want to use STACKINFO_BP_DEF. It would make
>>> getting base pointer more portable but must be added to each function
>>> that uses __libc_use_alloca.
>>
>> I'm interested in extending GCC to provide a better alloca,
>> including one that can fail.  Parts of the split stacks support
>> might reusable for obtaining the stack boundary.
>
> We spent a bit of time analyzing whether split stack is useful on the
> musl list, and concluded that for the most part it's just harmful. For
> 64-bit systems, it has no significant advantages over just allocating
> large stacks and letting the kernel do overcommit.

This really depends on your environment.  Some systems have lightweight
processes with a baseline cost of less than a page.  I agree that if you
need more than a page for other reasons, split stacks are probably not
that interesting.

Anyway, to clarify, I'm not interested in enabling split stacks per se.
  But the split stack support records the stack boundary in a TLS
variable, and __builtin_alloca could use that to check (reasonably
cheaply, both in terms of code size and run-time cost) that the
allocation attempt doesn't exceed the available space.  It would be a
bit like -fcheck-stack, but hopefully much cheaper because the stack
banging logic isn't needed.

>> I'm a bit skeptical about the whole alloca-if-we-have-room concept.
>> We must make sure that we keep sufficient stack space for further
>> operation.  But I'm not sure how to determine the magic number of
>> bytes to reserve.
>
> I agree entirely. It strikes me as just as misguided as when
> application developers ask "how to I check how much memory the system
> has so I can hoarde it all?" If you're making what you use
> proportional to what's available rather than to what you need, you're
> doing something wrong.

Good analogy.

> Honestly, my recommendation would be to ban use of alloca and VLA
> entirely and replace them by small fixed-size arrays (like 256-1024
> bytes depending on the expected usage), and simply call malloc when
> the need is greater than the fixed size. This makes the free logic
> easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> improves performance in the common cases (since the compiler can
> generate better, simpler code when it does not need to do the
> frame-pointer-like logic for managing dynamic-size allocations on the
> stack.

As long as there's alloca, developers will use it.  It's quite popular
even in new code.  I suspect it's not just the performance aspect, the
implicit free() is likely a factor, too.

I'm not sure if we can change this preference, at least until we have
conservative garbage collection as part of libc (so that heap allocation
and stack allocation are equally simple).

--
Florian Weimer / Red Hat Product Security Team
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Rich Felker
On Wed, Jan 09, 2013 at 05:59:45PM +0100, Florian Weimer wrote:
> Anyway, to clarify, I'm not interested in enabling split stacks per
> se.  But the split stack support records the stack boundary in a TLS
> variable, and __builtin_alloca could use that to check (reasonably

OK, that makes sense. I still don't think it's useful however because
the "use as much as we have available" idiom is broken.

> >Honestly, my recommendation would be to ban use of alloca and VLA
> >entirely and replace them by small fixed-size arrays (like 256-1024
> >bytes depending on the expected usage), and simply call malloc when
> >the need is greater than the fixed size. This makes the free logic
> >easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> >improves performance in the common cases (since the compiler can
> >generate better, simpler code when it does not need to do the
> >frame-pointer-like logic for managing dynamic-size allocations on the
> >stack.
>
> As long as there's alloca, developers will use it.  It's quite
> popular even in new code.

Application developers can use it, but a particular project libc glibc
can impose a policy that it's not allowed in that project's code.

> I suspect it's not just the performance
> aspect, the implicit free() is likely a factor, too.

Implicit free does not help if you have to have an alternate case for
using malloc on large allocations. And performance is worse than the
fixed-size automatic array approach I described due to frame pointer
management.

> I'm not sure if we can change this preference, at least until we
> have conservative garbage collection as part of libc (so that heap
> allocation and stack allocation are equally simple).

Conservative GC is not the solution for that. If you're talking about
libc-internal memory usage, the simplest/cleanest solution would be
something like talloc where a single cleanup call before returning
from libc to the caller can free either (1) all memory allocated as
part of the work in libc-space, or (2) only temp working memory
that was allocated. This would make both success and error returns
trivial to handle.

All conservative GC would do is add lots of complexity and vulns where
memory fails to be freed under attacker-controlled conditions.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Ondřej Bílka
On Wed, Jan 09, 2013 at 12:43:30PM -0500, Rich Felker wrote:

> On Wed, Jan 09, 2013 at 05:59:45PM +0100, Florian Weimer wrote:
> > Anyway, to clarify, I'm not interested in enabling split stacks per
> > se.  But the split stack support records the stack boundary in a TLS
> > variable, and __builtin_alloca could use that to check (reasonably
>
> OK, that makes sense. I still don't think it's useful however because
> the "use as much as we have available" idiom is broken.
>
> > >Honestly, my recommendation would be to ban use of alloca and VLA
> > >entirely and replace them by small fixed-size arrays (like 256-1024
> > >bytes depending on the expected usage), and simply call malloc when
> > >the need is greater than the fixed size. This makes the free logic
> > >easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> > >improves performance in the common cases (since the compiler can
> > >generate better, simpler code when it does not need to do the
> > >frame-pointer-like logic for managing dynamic-size allocations on the
> > >stack.
> >

Where do you want to place these arrays?
Global?       You cannot do it due race conditions.
Thread local? You cannot do it in recursive functions.  
              Also signal handlers break this.
              Performance is also worse due of extra indirection.
On stack?     In better case gcc will translate free check into
              comparison with frame pointer.
              In worse case it won't and you have increased register
              pressure. This makes code slower.
              Also with recursive functions you allocate more stack that
              is needed.

> > As long as there's alloca, developers will use it.  It's quite
> > popular even in new code.
>
> Application developers can use it, but a particular project libc glibc
> can impose a policy that it's not allowed in that project's code.
>
> > I suspect it's not just the performance
> > aspect, the implicit free() is likely a factor, too.
>
> Implicit free does not help if you have to have an alternate case for
> using malloc on large allocations. And performance is worse than the
> fixed-size automatic array approach I described due to frame pointer
> management.

See above

>
> > I'm not sure if we can change this preference, at least until we
> > have conservative garbage collection as part of libc (so that heap
> > allocation and stack allocation are equally simple).
>
> Conservative GC is not the solution for that. If you're talking about
> libc-internal memory usage, the simplest/cleanest solution would be
> something like talloc where a single cleanup call before returning
> from libc to the caller can free either (1) all memory allocated as
> part of the work in libc-space, or (2) only temp working memory
> that was allocated. This would make both success and error returns
> trivial to handle.
>
As you mentioned GC then determining when it is possible to allocate on
stack and do it is hot topic. It make GC faster, with smaller memory
footprint, more cache friendly etc as he does not to consider objects on
stack. So you cannot avoid solving this question.

> All conservative GC would do is add lots of complexity and vulns where
> memory fails to be freed under attacker-controlled conditions.
>
> Rich

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Rich Felker
On Fri, Jan 11, 2013 at 01:53:00PM +0100, Ondřej Bílka wrote:

> On Wed, Jan 09, 2013 at 12:43:30PM -0500, Rich Felker wrote:
> > On Wed, Jan 09, 2013 at 05:59:45PM +0100, Florian Weimer wrote:
> > > Anyway, to clarify, I'm not interested in enabling split stacks per
> > > se.  But the split stack support records the stack boundary in a TLS
> > > variable, and __builtin_alloca could use that to check (reasonably
> >
> > OK, that makes sense. I still don't think it's useful however because
> > the "use as much as we have available" idiom is broken.
> >
> > > >Honestly, my recommendation would be to ban use of alloca and VLA
> > > >entirely and replace them by small fixed-size arrays (like 256-1024
> > > >bytes depending on the expected usage), and simply call malloc when
> > > >the need is greater than the fixed size. This makes the free logic
> > > >easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> > > >improves performance in the common cases (since the compiler can
> > > >generate better, simpler code when it does not need to do the
> > > >frame-pointer-like logic for managing dynamic-size allocations on the
> > > >stack.
> > >
>
> Where do you want to place these arrays?

Wherever the compiler places automatic variables (i.e. in the real
world, on the stack). Of course there is no other option. Why would
you even ask this?

> On stack?     In better case gcc will translate free check into
>               comparison with frame pointer.
>               In worse case it won't and you have increased register
>               pressure. This makes code slower.
>               Also with recursive functions you allocate more stack that
>               is needed.

I already explained why it makes code smaller and faster. If you don't
believe me, write some test cases.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Ondřej Bílka
On Fri, Jan 11, 2013 at 08:34:42AM -0500, Rich Felker wrote:

> On Fri, Jan 11, 2013 at 01:53:00PM +0100, Ondřej Bílka wrote:
> > On Wed, Jan 09, 2013 at 12:43:30PM -0500, Rich Felker wrote:
> > > On Wed, Jan 09, 2013 at 05:59:45PM +0100, Florian Weimer wrote:
> > > > Anyway, to clarify, I'm not interested in enabling split stacks per
> > > > se.  But the split stack support records the stack boundary in a TLS
> > > > variable, and __builtin_alloca could use that to check (reasonably
> > >
> > > OK, that makes sense. I still don't think it's useful however because
> > > the "use as much as we have available" idiom is broken.
> > >
> > > > >Honestly, my recommendation would be to ban use of alloca and VLA
> > > > >entirely and replace them by small fixed-size arrays (like 256-1024
> > > > >bytes depending on the expected usage), and simply call malloc when
> > > > >the need is greater than the fixed size. This makes the free logic
> > > > >easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> > > > >improves performance in the common cases (since the compiler can
> > > > >generate better, simpler code when it does not need to do the
> > > > >frame-pointer-like logic for managing dynamic-size allocations on the
> > > > >stack.
> > > >
> >
> > Where do you want to place these arrays?
>
> Wherever the compiler places automatic variables (i.e. in the real
> world, on the stack). Of course there is no other option. Why would
> you even ask this?
>
> > On stack?     In better case gcc will translate free check into
> >               comparison with frame pointer.
> >               In worse case it won't and you have increased register
> >               pressure. This makes code slower.
> >               Also with recursive functions you allocate more stack that
> >               is needed.
>
> I already explained why it makes code smaller and faster. If you don't
> believe me, write some test cases.
There are two separate issues. First is gcc does not optimize alloca
well so type[constant] is faster than alloca(constant). This is bug in
gcc and can be fixed. Or if speed is such issue you can allocate by
({ char buf[128]; buf;})

Second is freeing.
Your aproach (ptr!=fixed_buf) adds mainteinance burden, register
pressure and is slower than having linked list of malloced arrays and
testing if(list) free_all(list).
See my testcase http://www.kam.mff.cuni.cz/~ondra/alloc_bench.tar.bz2.
that shows that on sandy bridge there are better aproaches.
>
> Rich


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Rich Felker
On Sat, Jan 12, 2013 at 05:18:25PM +0100, Ondřej Bílka wrote:

> On Fri, Jan 11, 2013 at 08:34:42AM -0500, Rich Felker wrote:
> > On Fri, Jan 11, 2013 at 01:53:00PM +0100, Ondřej Bílka wrote:
> > > On Wed, Jan 09, 2013 at 12:43:30PM -0500, Rich Felker wrote:
> > > > On Wed, Jan 09, 2013 at 05:59:45PM +0100, Florian Weimer wrote:
> > > > > Anyway, to clarify, I'm not interested in enabling split stacks per
> > > > > se.  But the split stack support records the stack boundary in a TLS
> > > > > variable, and __builtin_alloca could use that to check (reasonably
> > > >
> > > > OK, that makes sense. I still don't think it's useful however because
> > > > the "use as much as we have available" idiom is broken.
> > > >
> > > > > >Honestly, my recommendation would be to ban use of alloca and VLA
> > > > > >entirely and replace them by small fixed-size arrays (like 256-1024
> > > > > >bytes depending on the expected usage), and simply call malloc when
> > > > > >the need is greater than the fixed size. This makes the free logic
> > > > > >easy (if (ptr!=fixed_buf) free(ptr);) and reduces the code size and
> > > > > >improves performance in the common cases (since the compiler can
> > > > > >generate better, simpler code when it does not need to do the
> > > > > >frame-pointer-like logic for managing dynamic-size allocations on the
> > > > > >stack.
> > > > >
> > >
> > > Where do you want to place these arrays?
> >
> > Wherever the compiler places automatic variables (i.e. in the real
> > world, on the stack). Of course there is no other option. Why would
> > you even ask this?
> >
> > > On stack?     In better case gcc will translate free check into
> > >               comparison with frame pointer.
> > >               In worse case it won't and you have increased register
> > >               pressure. This makes code slower.
> > >               Also with recursive functions you allocate more stack that
> > >               is needed.
> >
> > I already explained why it makes code smaller and faster. If you don't
> > believe me, write some test cases.
> There are two separate issues. First is gcc does not optimize alloca
> well so type[constant] is faster than alloca(constant). This is bug in
> gcc and can be fixed.

No, this _cannot_ be fixed. If any non-static-sized objects exist in
the stack frame, then local variables cannot be addressed sp-relative.
Instead, they need a frame pointer (whether you call it a frame
pointer or not, and whether you use a register for it or a slot on the
stack, it's a frame pointer), and that's fundamentally more expensive.

> Or if speed is such issue you can allocate by
> ({ char buf[128]; buf;})

This has exactly the same issue if the amount to be allocated cannot
be determined statically (at compile time); it precludes sp-relative
addressing.

> Second is freeing.
> Your aproach (ptr!=fixed_buf) adds mainteinance burden, register
> pressure and is slower than having linked list of malloced arrays and
> testing if(list) free_all(list).

If you have more than one or two, it's almost surely more than you
should be putting on the stack.

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Paul Eggert
On 01/12/2013 02:24 PM, Rich Felker wrote:
>> gcc does not optimize alloca
>> > well so type[constant] is faster than alloca(constant). This is bug in
>> > gcc and can be fixed.
> No, this _cannot_ be fixed. If any non-static-sized objects

Surely GCC can be fixed so that a function that does this first:

   char *p = alloca (100);

is executed with code that's no slower than if the function had
done this first:

   char buf[100];
   char *p = buf;

Currently, GCC doesn't do this optimization, but I don't see any
reason why it couldn't.

There are other optimizations involving alloca (constant) that
GCC couldn't do, but the ones that Ondřej are talking about
seem doable.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Ondřej Bílka
On Sat, Jan 12, 2013 at 02:38:57PM -0800, Paul Eggert wrote:

> On 01/12/2013 02:24 PM, Rich Felker wrote:
> >> gcc does not optimize alloca
> >> > well so type[constant] is faster than alloca(constant). This is bug in
> >> > gcc and can be fixed.
> > No, this _cannot_ be fixed. If any non-static-sized objects
>
> Surely GCC can be fixed so that a function that does this first:
>
>    char *p = alloca (100);
>
> is executed with code that's no slower than if the function had
> done this first:
>
>    char buf[100];
>    char *p = buf;
>
> Currently, GCC doesn't do this optimization, but I don't see any
> reason why it couldn't.
>
> There are other optimizations involving alloca (constant) that
> GCC couldn't do, but the ones that Ondřej are talking about
> seem doable.

Debian llvm-gcc-4.2 does that.

I found this by using c++ code that was something like this.

inline int bar(int x, int no){
  ary = alloca(no);
  stuff(x,ary);  
}

int foo(int x){
  bar(x,64);
}

Problem with this code is that gcc does not inline functions containing alloca.
icc does inline this but also inlines bar(x,1<<30);

I rewrote bar as a macro as I did not know how do these optimizations in
general.
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Rich Felker
In reply to this post by Paul Eggert
On Sat, Jan 12, 2013 at 02:38:57PM -0800, Paul Eggert wrote:

> On 01/12/2013 02:24 PM, Rich Felker wrote:
> >> gcc does not optimize alloca
> >> > well so type[constant] is faster than alloca(constant). This is bug in
> >> > gcc and can be fixed.
> > No, this _cannot_ be fixed. If any non-static-sized objects
>
> Surely GCC can be fixed so that a function that does this first:
>
>    char *p = alloca (100);
>
> is executed with code that's no slower than if the function had
> done this first:
>
>    char buf[100];
>    char *p = buf;
>
> Currently, GCC doesn't do this optimization, but I don't see any
> reason why it couldn't.

We're not talking about this situation. We're talking about things of
the form:

T foo(size_t n)
{
    char *p;
    if (n<K) p = alloca(n);
    else p = malloc(n);
    /* ... */
}

This is FUNDAMENTALLY more expensive than:

T foo(size_t n)
{
    char *p, buf[K];
    if (n<K) p = buf;
    else p = malloc(n);
    /* ... */
}

How is that hard to understand?

Rich
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Use malloca instead alloca

Paul Eggert
On 01/12/2013 06:59 PM, Rich Felker wrote:

> We're not talking about this situation.

Perhaps you weren't talking about it, but I was,
and I think Ondřej was; otherwise, why would he
have mentioned the topic?

For example, I've written code with a large buffer
of size K that is a compile-time constant whose value
depends on the architecture, where alloca (K)
is not necessarily safe because K might be too large.
For that code, malloca (K) should turn into alloca (K)
on some architectures, and into malloc (K) on others.
In the alloca (K) case it'd be nice if GCC did the
optimization in question.  And this case would
not be fundamentally expensive; it wouldn't require
a separate frame pointer, for example.

12