PowerPC malloc alignment


PowerPC malloc alignment

Daniel Jacobowitz-2
The malloc alignment on PowerPC (32-bit) was bumped up to 16 bytes
just before glibc 2.4 and then immediately reverted.  I eventually
found the discussion of the reversion here:

  https://bugzilla.redhat.com/show_bug.cgi?id=183895

Jakub, were there more lurking problems than just the one you fixed in
the patch attached to that bug?

The current status breaks, among other things, GDB when built with
"-maltivec -mabi=altivec -mlong-double-128".  A 16-byte union
including a long double is allocated in the heap and copied using
lvx / stvx, which require a 16-byte aligned address.
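
(For concreteness, a minimal sketch of that failure mode, assuming a GCC/AltiVec
toolchain; this is illustrative, not GDB's actual code.)

#include <stdio.h>
#include <stdlib.h>

/* With -maltivec -mabi=altivec -mlong-double-128 a 16-byte copy like the
   one below may be compiled to lvx/stvx.  Those instructions silently
   ignore the low four address bits, so a block from malloc() that is
   only 8-byte aligned gets accessed at the wrong address.  */
union value
{
  long double ld;               /* 16-byte type under -mlong-double-128 */
  unsigned char raw[16];
};

int
main (void)
{
  union value *heap = malloc (sizeof *heap);  /* may be only 8-byte aligned on ppc32 */
  union value local;

  if (heap == NULL)
    return 1;
  local.ld = 1.0L;
  printf ("misalignment: %lu\n",
          (unsigned long) heap % __alignof__ (union value));
  *heap = local;                /* 16-byte copy; candidate for lvx/stvx */
  free (heap);
  return 0;
}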

--
Daniel Jacobowitz
CodeSourcery

Re: PowerPC malloc alignment

Jakub Jelinek
On Wed, Oct 31, 2007 at 04:52:40PM -0400, Daniel Jacobowitz wrote:

> The malloc alignment on PowerPC (32-bit) was bumped up to 16 bytes
> just before glibc 2.4 and then immediately reverted.  I eventually
> found the discussion of the reversion here:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=183895
>
> Jakub, were there more lurking problems than just the one you fixed in
> the patch attached to that bug?
>
> The current status breaks, among other things, GDB when built with
> "-maltivec -mabi=altivec -mlong-double-128".  A 16-byte union
> including a long double is allocated in the heap and copied using
> lvx / stvx, which require a 16-byte aligned address.

The main problem is emacs (aka the only user of
malloc_set_state/malloc_get_state).  Changing the alignment is really an ABI
change for these interfaces, unless it does some very ugly hacks in it.

        Jakub

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
On Wed, Oct 31, 2007 at 10:23:16PM +0100, Jakub Jelinek wrote:
> The main problem is emacs (aka the only user of
> malloc_set_state/malloc_get_state).  Changing the alignment is really an ABI
> change for these interfaces, unless it does some very ugly hacks in it.

Only if it saves in a dump with one and restores with the other, isn't
it?  Actually, I can't see how it matters at all.  I may have to
grovel through the emacs dumper to try to figure it out.

Meanwhile, mallocing anything containing a long double or vector
generates quietly wrong code unless you use posix_memalign :-(
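
(As a workaround sketch, with a hypothetical helper name: request the
alignment explicitly rather than relying on malloc().)

#include <stdlib.h>

/* Return a 16-byte-aligned block of at least SIZE bytes, or NULL.  */
static void *
xmalloc_align16 (size_t size)
{
  void *p = NULL;

  if (posix_memalign (&p, 16, size) != 0)
    return NULL;
  return p;
}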

--
Daniel Jacobowitz
CodeSourcery

Re: PowerPC malloc alignment

Roland McGrath
In reply to this post by Jakub Jelinek
> The main problem is emacs (aka the only user of
> malloc_set_state/malloc_get_state).  Changing the alignment is really an ABI
> change for these interfaces, unless it does some very ugly hacks in it.

That can at least be made easy to detect gracefully via magic/version.  
Is it pathologically difficult to cope with an old dumped state?
(I'm sure we went into this at the time, but I'm hazy on the details now.)
Ideally, it would be doable enough just to handle the existing misaligned
allocations but always align the new ones.  That copes even if libc itself
or some other new-but-compatible library (X, whatever) has newly-compiled
code that manages to care about the alignment of its own allocations.  It
is probably sufficient in practice just to have the alignment parameterized
by the malloc_save_state so malloc_set_state from an old dumped emacs
reverts the behavior to the smaller alignment.
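
(A hypothetical shape of that magic/version check; the field names and
constants below are illustrative, not glibc's actual malloc_save_state
layout.)

#define MS_MAGIC            0x444c4541L  /* assumed marker value */
#define MS_VERSION_ALIGN16  3            /* assumed: first version dumped with 16-byte alignment */

struct saved_malloc_state
{
  long magic;
  long version;
  /* ... saved arena fields would follow ...  */
};

/* Nonzero if the dump predates the alignment bump; blocks described by
   such a dump must still be treated as only 8-byte aligned.  */
static int
dump_uses_old_alignment (const struct saved_malloc_state *ms)
{
  return ms->magic == MS_MAGIC && ms->version < MS_VERSION_ALIGN16;
}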


Thanks,
Roland

Re: PowerPC malloc alignment

Steve Munroe

Roland McGrath <[hidden email]> wrote on 10/31/2007 04:36:48 PM:

> > The main problem is emacs (aka the only user of
> > malloc_set_state/malloc_get_state).  Changing the alignment is really an ABI
> > change for these interfaces, unless it does some very ugly hacks in it.
>
> That can at least be made easy to detect gracefully via magic/version.
> Is it pathologically difficult to cope with an old dumped state?
> (I'm sure we went into this at the time, but I'm hazy on the details now.)
> Ideally, it would be doable enough just to handle the existing misaligned
> allocations but always align the new ones.  That copes even if libc itself
> or some other new-but-compatible library (X, whatever) has newly-compiled
> code that manages to care about the alignment of its own allocations.  It
> is probably sufficient in practice just to have the alignment parameterized
> by the malloc_save_state so malloc_set_state from an old dumped emacs
> reverts the behavior to the smaller alignment.
>
This thing has been a problem for many moons. Anything we can do to help?

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center


Re: PowerPC malloc alignment

Wolfram Gloger
In reply to this post by Roland McGrath
> From: Roland McGrath <[hidden email]>
>
> That can at least be made easy to detect gracefully via magic/version.  
> Is it pathologically difficult to cope with an old dumped state?
> (I'm sure we went into this at the time, but I'm hazy on the details now.)
> Ideally, it would be doable enough just to handle the existing misaligned
> allocations but always align the new ones.  That copes even if libc itself
> or some other new-but-compatible library (X, whatever) has newly-compiled
> code that manages to care about the alignment of its own allocations.  It
> is probably sufficient in practice just to have the alignment parameterized
> by the malloc_save_state so malloc_set_state from an old dumped emacs
> reverts the behavior to the smaller alignment.

The alignment is supposed to be a _constant_ in the malloc
implementation, so you couldn't easily parameterize it.  The only way
I would see w/o sacrificing massive amounts of performance would be to
have two different versions of malloc compiled into libc, one with the
old and one with the new alignment.  This could be selected via hooks,
and when malloc_set_state was called on an old heap the old version
would be used.  Yes, quite ugly.
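
(Roughly, and only as a sketch under those assumptions: the old_align_*
functions below stand for a hypothetical second build of malloc/free with
the old alignment, installed through glibc's __malloc_hook/__free_hook
interface.)

#include <malloc.h>
#include <stddef.h>

extern void *old_align_malloc (size_t size);   /* hypothetical 8-byte-aligned build */
extern void old_align_free (void *ptr);        /* hypothetical */

static void *
compat_malloc_hook (size_t size, const void *caller)
{
  (void) caller;
  return old_align_malloc (size);
}

static void
compat_free_hook (void *ptr, const void *caller)
{
  (void) caller;
  old_align_free (ptr);
}

/* Would be called from malloc_set_state() when the saved state came from
   the old, 8-byte-aligned malloc.  realloc/memalign would need the same
   treatment.  */
static void
select_old_alignment_malloc (void)
{
  __malloc_hook = compat_malloc_hook;
  __free_hook = compat_free_hook;
}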

But how did these long double types come about?  Weren't they
available in old PowerPC versions (if yes, then the alignment should
have been 16 from the very start), or only with "-mlong-double-128"?

IOW, why does this problem suddenly appear?

Regards,
Wolfram.

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
On Thu, Nov 01, 2007 at 04:17:45PM -0000, Wolfram Gloger wrote:
> But how did these long double types come about?  Wheren't they
> available in old PowerPC versions (if yes, then the alignment should
> have been 16 from the very start), or only with "-mlong-double-128"?
>
> IOW, why does this problem suddenly appear?

The introduction of 128-bit long double is a relatively recent ABI
change (within the past two years, and not entirely universal yet).
16-byte vectors are also relatively new, though I suspect people
writing vector code have been trained to use posix_memalign to avoid
trouble.  Before either of those, 64-bit was clearly the right
alignment.

--
Daniel Jacobowitz
CodeSourcery

Re: PowerPC malloc alignment

Wolfram Gloger
Hi,

> The introduction of 128-bit long double is a relatively recent ABI
> change (within the past two years, and not entirely universal yet).

Ok.  Changing the ABI should really entail a new libc, or at least
people shouldn't be surprised by problems when mixing old and new
binaries.

> 16-byte vectors are also relatively new, though I suspect people
> writing vector code have been trained to use posix_memalign to avoid
> trouble.

Those aren't standard C types, so IMHO using posix_memalign is the way
to go.  But long double is different of course.

Regards,
Wolfram.

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
In reply to this post by Roland McGrath
On Wed, Oct 31, 2007 at 02:36:48PM -0700, Roland McGrath wrote:
> > The main problem is emacs (aka the only user of
> > malloc_set_state/malloc_get_state).  Changing the alignment is really an ABI
> > change for these interfaces, unless it does some very ugly hacks in it.
>
> That can at least be made easy to detect gracefully via magic/version.  
> Is it pathologically difficult to cope with an old dumped state?

Tricky, at least.  The alignment has to be a constant or we're going
to penalize performance all across the board.  And it is used
indirectly to size some arrays, e.g. via NFASTBINS.
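
(A simplified illustration, not glibc's actual definitions, of how the
compile-time alignment feeds statically sized tables:)

#define SIZE_SZ           4                    /* 32-bit */
#define MALLOC_ALIGNMENT  (2 * SIZE_SZ)        /* would become 16 with the change */
#define MAX_FAST_SIZE     80

/* Roughly one fastbin per alignment-sized step up to MAX_FAST_SIZE;
   the array length is fixed when libc is compiled.  */
#define NFASTBINS         (MAX_FAST_SIZE / MALLOC_ALIGNMENT + 1)

struct malloc_state_sketch
{
  void *fastbins[NFASTBINS];
};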

I tried just ignoring malloc_set_state, to see if emacs does something
graceful.  It does not check for failure and immediately frees the
saved state, so it crashes.  If I forcibly skip both the call to
malloc_set_state and the following free, things seem to work OK but
malloc checking reveals it will happily realloc() pointers from the
saved heap.  So that won't help either.

I see two ways to increase the alignment.

  - Build a second set of malloc routines with the old alignment.  At
    malloc_set_state time, install them as hooks if necessary.

  - Tell people to rebuild emacs.  Improve the crash by adding a
    debugging message out of malloc_set_state when it detects the
    problem.

You can guess which one of these I like better.  I doubt that anything
besides Emacs uses malloc_set_state.  Some older librep may have, but
nowadays it has only the vestigial configure check.  The only other
references I could find were programs using malloc_get_state to peek
at statistics, and they won't be affected anyway.

Let me know if you think we need the hooks approach.

--
Daniel Jacobowitz
CodeSourcery

Re: PowerPC malloc alignment

Wolfram Gloger
> From: Daniel Jacobowitz <[hidden email]>

> I see two ways to increase the alignment.
>
>   - Build a second set of malloc routines with the old alignment.  At
>     malloc_set_state time, install them as hooks if necessary.
>
>   - Tell people to rebuild emacs.  Improve the crash by adding a
>     debugging message out of malloc_set_state when it detects the
>     problem.

I hate to suggest it (because you've surely considered it already),
but the obvious alternatives:

 - bump libc major version

 - put a version tag on malloc et al (don't know if that's feasible)

> Let me know if you think we need the hooks approach.

Let me know too, I may find some time to look at it.

Regards,
Wolfram.

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
On Sat, Nov 03, 2007 at 11:52:47AM -0000, Wolfram Gloger wrote:
> I hate to suggest it (because you've surely considered it already),
> but the obvious alternatives:
>
>  - bump libc major version
>
>  - put a version tag on malloc et al (don't know if that's feasible)

We managed every other libc interface affected by the long double
change without having to bump the soname; it's so incredibly and
astoundingly painful that I will go to whatever lengths necessary
to avoid it.

I thought about versioning malloc, but it doesn't work out.  glibc and
other new libraries will call the malloc with new alignment.  You'd
have to have both versions of malloc and give them separate arenas,
I suppose...

--
Daniel Jacobowitz
CodeSourcery

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
In reply to this post by Daniel Jacobowitz-2
On Fri, Nov 02, 2007 at 03:40:11PM -0400, Daniel Jacobowitz wrote:

> I see two ways to increase the alignment.
>
>   - Build a second set of malloc routines with the old alignment.  At
>     malloc_set_state time, install them as hooks if necessary.
>
>   - Tell people to rebuild emacs.  Improve the crash by adding a
>     debugging message out of malloc_set_state when it detects the
>     problem.
>
> You can guess which one of these I like better.  I doubt that anything
> besides Emacs uses malloc_set_state.  Some older librep may have, but
> nowadays it has only the vestigial configure check.  The only other
> references I could find were programs using malloc_get_state to peek
> at statistics, and they won't be affected anyway.
>
> Let me know if you think we need the hooks approach.

Here is a patch which does neither of those extras.  Both extras are
for the sole benefit of existing emacs binaries; probably one of them
should accompany this patch, but at least this is a starting point.

--
Daniel Jacobowitz
CodeSourcery

2007-11-05  Daniel Jacobowitz  <[hidden email]>

        * malloc/malloc.c (MALLOC_ALIGNMENT): Use __alignof__ (long double).
        (SMALLBIN_CORRECTION): New.
        (MIN_LARGE_SIZE, smallbin_index): Use it to handle 16-byte alignment.
        (largebin_index_32_big): New.
        (largebin_index): Use it for 16-byte alignment.
        (sYSMALLOc): Handle MALLOC_ALIGNMENT > 2 * SIZE_SZ.

---
 malloc/malloc.c |   61 ++++++++++++++++++++++++++++++++------------------------
 1 files changed, 35 insertions(+), 26 deletions(-)

Index: glibc-2.5.90/malloc/malloc.c
===================================================================
--- glibc-2.5.90.orig/malloc/malloc.c 2007-11-02 12:43:15.000000000 -0700
+++ glibc-2.5.90/malloc/malloc.c 2007-11-02 17:23:03.000000000 -0700
@@ -378,16 +378,8 @@ extern "C" {
 
 
 #ifndef MALLOC_ALIGNMENT
-/* XXX This is the correct definition.  It differs from 2*SIZE_SZ only on
-   powerpc32.  For the time being, changing this is causing more
-   compatibility problems due to malloc_get_state/malloc_set_state than
-   will returning blocks not adequately aligned for long double objects
-   under -mlong-double-128.
-
 #define MALLOC_ALIGNMENT       (2 * SIZE_SZ < __alignof__ (long double) \
  ? __alignof__ (long double) : 2 * SIZE_SZ)
-*/
-#define MALLOC_ALIGNMENT       (2 * SIZE_SZ)
 #endif
 
 /* The corresponding bit mask value */
@@ -2121,18 +2113,23 @@ typedef struct malloc_chunk* mbinptr;
 
     The bins top out around 1MB because we expect to service large
     requests via mmap.
+
+    Bin 0 does not exist.  Bin 1 is the unordered list; if that would be
+    a valid chunk size the small bins are bumped up one.
 */
 
 #define NBINS             128
 #define NSMALLBINS         64
 #define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
-#define MIN_LARGE_SIZE    (NSMALLBINS * SMALLBIN_WIDTH)
+#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
+#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)
 
 #define in_smallbin_range(sz)  \
   ((unsigned long)(sz) < (unsigned long)MIN_LARGE_SIZE)
 
 #define smallbin_index(sz) \
-  (SMALLBIN_WIDTH == 16 ? (((unsigned)(sz)) >> 4) : (((unsigned)(sz)) >> 3))
+  ((SMALLBIN_WIDTH == 16 ? (((unsigned)(sz)) >> 4) : (((unsigned)(sz)) >> 3)) \
+   + SMALLBIN_CORRECTION)
 
 #define largebin_index_32(sz)                                                \
 (((((unsigned long)(sz)) >>  6) <= 38)?  56 + (((unsigned long)(sz)) >>  6): \
@@ -2142,6 +2139,14 @@ typedef struct malloc_chunk* mbinptr;
  ((((unsigned long)(sz)) >> 18) <=  2)? 124 + (((unsigned long)(sz)) >> 18): \
                                         126)
 
+#define largebin_index_32_big(sz)                                            \
+(((((unsigned long)(sz)) >>  6) <= 45)?  49 + (((unsigned long)(sz)) >>  6): \
+ ((((unsigned long)(sz)) >>  9) <= 20)?  91 + (((unsigned long)(sz)) >>  9): \
+ ((((unsigned long)(sz)) >> 12) <= 10)? 110 + (((unsigned long)(sz)) >> 12): \
+ ((((unsigned long)(sz)) >> 15) <=  4)? 119 + (((unsigned long)(sz)) >> 15): \
+ ((((unsigned long)(sz)) >> 18) <=  2)? 124 + (((unsigned long)(sz)) >> 18): \
+                                        126)
+
 // XXX It remains to be seen whether it is good to keep the widths of
 // XXX the buckets the same or whether it should be scaled by a factor
 // XXX of two as well.
@@ -2154,7 +2159,9 @@ typedef struct malloc_chunk* mbinptr;
                                         126)
 
 #define largebin_index(sz) \
-  (SIZE_SZ == 8 ? largebin_index_64 (sz) : largebin_index_32 (sz))
+  (SIZE_SZ == 8 ? largebin_index_64 (sz)                                     \
+   : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz)                     \
+   : largebin_index_32 (sz))
 
 #define bin_index(sz) \
  ((in_smallbin_range(sz)) ? smallbin_index(sz) : largebin_index(sz))
@@ -2951,14 +2958,14 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
       Round up size to nearest page.  For mmapped chunks, the overhead
       is one SIZE_SZ unit larger than for normal chunks, because there
       is no following chunk whose prev_size field could be used.
+
+      See the front_misalign handling below, for glibc there is no
+      need for further alignments unless we have have high alignment.
     */
-#if 1
-    /* See the front_misalign handling below, for glibc there is no
-       need for further alignments.  */
-    size = (nb + SIZE_SZ + pagemask) & ~pagemask;
-#else
-    size = (nb + SIZE_SZ + MALLOC_ALIGN_MASK + pagemask) & ~pagemask;
-#endif
+    if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
+      size = (nb + SIZE_SZ + pagemask) & ~pagemask;
+    else
+      size = (nb + SIZE_SZ + MALLOC_ALIGN_MASK + pagemask) & ~pagemask;
     tried_mmap = true;
 
     /* Don't try if size wraps around 0 */
@@ -2976,13 +2983,16 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
           address argument for later munmap in free() and realloc().
         */
 
-#if 1
- /* For glibc, chunk2mem increases the address by 2*SIZE_SZ and
-   MALLOC_ALIGN_MASK is 2*SIZE_SZ-1.  Each mmap'ed area is page
-   aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
-        assert (((INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK) == 0);
-#else
-        front_misalign = (INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK;
+ if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
+  {
+    /* For glibc, chunk2mem increases the address by 2*SIZE_SZ and
+       MALLOC_ALIGN_MASK is 2*SIZE_SZ-1.  Each mmap'ed area is page
+       aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
+    assert (((INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK) == 0);
+    front_misalign = 0;
+  }
+ else
+  front_misalign = (INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK;
         if (front_misalign > 0) {
           correction = MALLOC_ALIGNMENT - front_misalign;
           p = (mchunkptr)(mm + correction);
@@ -2990,7 +3000,6 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
           set_head(p, (size - correction) |IS_MMAPPED);
         }
         else
-#endif
   {
     p = (mchunkptr)mm;
     set_head(p, size|IS_MMAPPED);

Re: PowerPC malloc alignment

Daniel Jacobowitz-2
On Mon, Nov 05, 2007 at 01:09:27PM -0500, Daniel Jacobowitz wrote:
> Here is a patch which does neither of those extras.  Both extras are
> for the sole benefit of existing emacs binaries; probably one of them
> should accompany this patch, but at least this is a starting point.

Here's a revised version.  Corey Minyard at MontaVista discovered that
there was another place which assumed MALLOC_ALIGNMENT == 2 * SIZE_SZ;
it should be treated just like the others.

--
Daniel Jacobowitz
CodeSourcery

2007-11-30  Daniel Jacobowitz  <[hidden email]>

        * malloc/malloc.c (MALLOC_ALIGNMENT): Use __alignof__ (long double).
        (SMALLBIN_CORRECTION): New.
        (MIN_LARGE_SIZE, smallbin_index): Use it to handle 16-byte alignment.
        (largebin_index_32_big): New.
        (largebin_index): Use it for 16-byte alignment.
        (sYSMALLOc): Handle MALLOC_ALIGNMENT > 2 * SIZE_SZ.

Index: malloc/malloc.c
===================================================================
RCS file: /cvs/glibc/libc/malloc/malloc.c,v
retrieving revision 1.181
diff -u -p -r1.181 malloc.c
--- malloc/malloc.c 2 Oct 2007 03:52:03 -0000 1.181
+++ malloc/malloc.c 30 Nov 2007 17:10:43 -0000
@@ -378,16 +378,8 @@ extern "C" {
 
 
 #ifndef MALLOC_ALIGNMENT
-/* XXX This is the correct definition.  It differs from 2*SIZE_SZ only on
-   powerpc32.  For the time being, changing this is causing more
-   compatibility problems due to malloc_get_state/malloc_set_state than
-   will returning blocks not adequately aligned for long double objects
-   under -mlong-double-128.
-
 #define MALLOC_ALIGNMENT       (2 * SIZE_SZ < __alignof__ (long double) \
  ? __alignof__ (long double) : 2 * SIZE_SZ)
-*/
-#define MALLOC_ALIGNMENT       (2 * SIZE_SZ)
 #endif
 
 /* The corresponding bit mask value */
@@ -2121,18 +2113,23 @@ typedef struct malloc_chunk* mbinptr;
 
     The bins top out around 1MB because we expect to service large
     requests via mmap.
+
+    Bin 0 does not exist.  Bin 1 is the unordered list; if that would be
+    a valid chunk size the small bins are bumped up one.
 */
 
 #define NBINS             128
 #define NSMALLBINS         64
 #define SMALLBIN_WIDTH    MALLOC_ALIGNMENT
-#define MIN_LARGE_SIZE    (NSMALLBINS * SMALLBIN_WIDTH)
+#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > 2 * SIZE_SZ)
+#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)
 
 #define in_smallbin_range(sz)  \
   ((unsigned long)(sz) < (unsigned long)MIN_LARGE_SIZE)
 
 #define smallbin_index(sz) \
-  (SMALLBIN_WIDTH == 16 ? (((unsigned)(sz)) >> 4) : (((unsigned)(sz)) >> 3))
+  ((SMALLBIN_WIDTH == 16 ? (((unsigned)(sz)) >> 4) : (((unsigned)(sz)) >> 3)) \
+   + SMALLBIN_CORRECTION)
 
 #define largebin_index_32(sz)                                                \
 (((((unsigned long)(sz)) >>  6) <= 38)?  56 + (((unsigned long)(sz)) >>  6): \
@@ -2142,6 +2139,14 @@ typedef struct malloc_chunk* mbinptr;
  ((((unsigned long)(sz)) >> 18) <=  2)? 124 + (((unsigned long)(sz)) >> 18): \
                                         126)
 
+#define largebin_index_32_big(sz)                                            \
+(((((unsigned long)(sz)) >>  6) <= 45)?  49 + (((unsigned long)(sz)) >>  6): \
+ ((((unsigned long)(sz)) >>  9) <= 20)?  91 + (((unsigned long)(sz)) >>  9): \
+ ((((unsigned long)(sz)) >> 12) <= 10)? 110 + (((unsigned long)(sz)) >> 12): \
+ ((((unsigned long)(sz)) >> 15) <=  4)? 119 + (((unsigned long)(sz)) >> 15): \
+ ((((unsigned long)(sz)) >> 18) <=  2)? 124 + (((unsigned long)(sz)) >> 18): \
+                                        126)
+
 // XXX It remains to be seen whether it is good to keep the widths of
 // XXX the buckets the same or whether it should be scaled by a factor
 // XXX of two as well.
@@ -2154,7 +2159,9 @@ typedef struct malloc_chunk* mbinptr;
                                         126)
 
 #define largebin_index(sz) \
-  (SIZE_SZ == 8 ? largebin_index_64 (sz) : largebin_index_32 (sz))
+  (SIZE_SZ == 8 ? largebin_index_64 (sz)                                     \
+   : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz)                     \
+   : largebin_index_32 (sz))
 
 #define bin_index(sz) \
  ((in_smallbin_range(sz)) ? smallbin_index(sz) : largebin_index(sz))
@@ -2951,14 +2958,14 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
       Round up size to nearest page.  For mmapped chunks, the overhead
       is one SIZE_SZ unit larger than for normal chunks, because there
       is no following chunk whose prev_size field could be used.
+
+      See the front_misalign handling below, for glibc there is no
+      need for further alignments unless we have have high alignment.
     */
-#if 1
-    /* See the front_misalign handling below, for glibc there is no
-       need for further alignments.  */
-    size = (nb + SIZE_SZ + pagemask) & ~pagemask;
-#else
-    size = (nb + SIZE_SZ + MALLOC_ALIGN_MASK + pagemask) & ~pagemask;
-#endif
+    if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
+      size = (nb + SIZE_SZ + pagemask) & ~pagemask;
+    else
+      size = (nb + SIZE_SZ + MALLOC_ALIGN_MASK + pagemask) & ~pagemask;
     tried_mmap = true;
 
     /* Don't try if size wraps around 0 */
@@ -2976,13 +2983,16 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
           address argument for later munmap in free() and realloc().
         */
 
-#if 1
- /* For glibc, chunk2mem increases the address by 2*SIZE_SZ and
-   MALLOC_ALIGN_MASK is 2*SIZE_SZ-1.  Each mmap'ed area is page
-   aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
-        assert (((INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK) == 0);
-#else
-        front_misalign = (INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK;
+ if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
+  {
+    /* For glibc, chunk2mem increases the address by 2*SIZE_SZ and
+       MALLOC_ALIGN_MASK is 2*SIZE_SZ-1.  Each mmap'ed area is page
+       aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
+    assert (((INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK) == 0);
+    front_misalign = 0;
+  }
+ else
+  front_misalign = (INTERNAL_SIZE_T)chunk2mem(mm) & MALLOC_ALIGN_MASK;
         if (front_misalign > 0) {
           correction = MALLOC_ALIGNMENT - front_misalign;
           p = (mchunkptr)(mm + correction);
@@ -2990,7 +3000,6 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
           set_head(p, (size - correction) |IS_MMAPPED);
         }
         else
-#endif
   {
     p = (mchunkptr)mm;
     set_head(p, size|IS_MMAPPED);
@@ -3278,8 +3287,25 @@ static Void_t* sYSMALLOc(nb, av) INTERNA
 
       /* handle non-contiguous cases */
       else {
-        /* MORECORE/mmap must correctly align */
-        assert(((unsigned long)chunk2mem(brk) & MALLOC_ALIGN_MASK) == 0);
+ if (MALLOC_ALIGNMENT == 2 * SIZE_SZ)
+  /* MORECORE/mmap must correctly align */
+  assert(((unsigned long)chunk2mem(brk) & MALLOC_ALIGN_MASK) == 0);
+ else {
+  front_misalign = (INTERNAL_SIZE_T)chunk2mem(brk) & MALLOC_ALIGN_MASK;
+  if (front_misalign > 0) {
+
+    /*
+      Skip over some bytes to arrive at an aligned position.
+      We don't need to specially mark these wasted front bytes.
+      They will never be accessed anyway because
+      prev_inuse of av->top (and any chunk created from its start)
+      is always true after initialization.
+    */
+
+    correction = MALLOC_ALIGNMENT - front_misalign;
+    aligned_brk += correction;
+  }
+ }
 
         /* Find out current end of memory */
         if (snd_brk == (char*)(MORECORE_FAILURE)) {