[Proposed patch] Huge performance regression in ld -r since binutils >= 2.21



Romain Geissler
Hi,

While migrating all the Amadeus C++ code base from binutils 2.20 to
binutils 2.25.1, we discovered a strong performance regression in partial
linking, starting from binutils 2.21. The commit that introduced the
regression is:

commit 0672748ac053cc4f8159af0d21ca88ae8b3778e6
Author: H.J. Lu <[hidden email]>
Date:   Fri Apr 30 18:27:32 2010 +0000

    Remove relocation against discarded sections for relocatable link.

Mailing list archive link:
https://sourceware.org/ml/binutils/2010-04/msg00479.html


The purpose of the patch was to completely remove debug relocations that
refer to discarded sections, to avoid filling the resulting file with
empty relocs. However, to do that, relocs are removed one by one from a
contiguous array using memmove. When you are handling a debug section
with millions of relocs, many of which must be discarded, this results in
millions of memmoves over the tail of the relocation array.
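The quadratic cost is easy to see in a toy model (a hypothetical,
simplified sketch; the function name and the int array are stand-ins,
not the actual BFD code): deleting entries one by one with memmove
shifts the entire tail on every deletion, so k deletions from an
n-element array cost O(k*n) element copies.

```c
#include <string.h>
#include <stddef.h>

/* Remove all zero entries from arr[0..n), memmove style.  Each removal
   shifts the whole remaining tail down by one slot, so k removals cost
   O(k * n) element copies in total.  Returns the new element count.  */
static size_t
remove_zeros_memmove (int *arr, size_t n)
{
  for (size_t i = 0; i < n; )
    {
      if (arr[i] == 0)
        {
          /* Shift the tail down; slot i now holds the next entry.  */
          memmove (&arr[i], &arr[i + 1], (n - i - 1) * sizeof *arr);
          n--;
        }
      else
        i++;
    }
  return n;
}
```

With millions of entries and many deletions, these tail shifts dominate
the running time, which is exactly the behaviour reported above.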

This is the scenario that happens in our case. We have some complex C++
applications, and for some components we package them into .o files
generated with "ld -r". We use partial linking at several levels: we have
module1.o (which embeds all .o files from module 1, via ld -r), module2.o,
and from these two we build module3.o (using ld -r on module1.o +
module2.o + other module3-specific .o files). Since all these modules
reference the same common base classes, whose methods are inlined, g++
emits a lot of per-method .text sections, which ld then discards because
many .o files embed the same sections. So the memmove above is called
many times.

When we create the first-level module1.o and module2.o, everything works
fine: ld -r lasts less than 10s. However, that creates huge .o files
with huge debug sections containing millions of relocs. Later, when we
build module3.o from module1.o and module2.o, the memmove happens on
these huge reloc arrays, and ld ends up consuming 100% CPU for minutes.
With binutils 2.25.1 a single link operation lasts more than 5 minutes,
compared to 10s with ld 2.20. With gold we have never had any issue,
whatever the version.

I cannot provide you our proprietary code, but I have created a
reproducer. The attached Makefile (just run it with "make -j N") will
generate A0.o ... A150.o, link the first 50 .o files into first.o, the
next 50 into second.o, and then link the last 50 plus first.o and
second.o into all.o. That last link step takes tens of minutes instead
of tens of seconds.

I have a proposed patch to fix this, see the attachment. Note that I
know it only works for x86/x86-64 and that it will break other targets;
I just want to make sure you agree with the idea of the fix first. To
fix this performance issue, I chose to use two iterators:
 - next_rel: points just past the last relocation that we have handled
and kept
 - rel: points to the relocation that we are currently processing.
When a relocation must be discarded, rather than memmove'ing the whole
tail of the array, we just increment rel but not next_rel. At the end of
each iteration, if next_rel no longer matches rel (the iterators are out
of sync because one or more relocs were discarded), we copy the contents
of rel into next_rel. So in the worst case, each element of the array is
copied only once.
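The two-iterator idea can be sketched on a toy array (a hypothetical,
simplified illustration with stand-in names; the real patch operates on
Elf_Internal_Rela entries inside the relocation loop):

```c
#include <stddef.h>

/* Two-iterator compaction: "rel" reads every entry, "next_rel" writes
   the kept ones.  Discarding an entry just skips the next_rel
   increment, so each element is copied at most once and the whole pass
   is O(n) instead of O(n^2).  Returns the new element count.  */
static size_t
remove_zeros_compact (int *arr, size_t n)
{
  int *rel = arr, *next_rel = arr, *relend = arr + n;

  for (; rel < relend; rel++)
    {
      if (*rel == 0)
        continue;            /* discard: advance rel only */

      if (next_rel != rel)
        *next_rel = *rel;    /* copy the kept entry down */
      next_rel++;
    }
  return next_rel - arr;
}
```

The entries between next_rel and rel are garbage after the pass, which
is why the patch also decrements the section's reloc count for each
discarded entry.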

This patch was tested without regression on x86-64. It generates
byte-for-byte identical binaries, and linking time is back to tens of
seconds again.

Here is the proposed patch. Please note that I don't really like the use
of "goto" and the fact that we have to modify all the "continue"
statements. If you have better suggestions, please raise them and I'll
adapt my patch.

Cheers,
Romain

From 50ff8e7d9c8d3f979c69a65ea131b8d7182f082f Mon Sep 17 00:00:00 2001
From: Romain Geissler <[hidden email]>
Date: Wed, 4 Nov 2015 16:44:51 +0100
Subject: [PATCH] Avoid massive memmove when skipping a debug relocation with
 ld -r

---
 bfd/elf-bfd.h      | 23 ++++++++++------------
 bfd/elf32-i386.c   | 56 ++++++++++++++++++++++++++++++++++++------------------
 bfd/elf64-x86-64.c | 52 +++++++++++++++++++++++++++++++++-----------------
 3 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 6a87d71..8fb8f16 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -2491,10 +2491,9 @@ extern asection _bfd_elf_large_com_section;
    link, we remove such relocations.  Otherwise, we just want the
    section contents zeroed and avoid any special processing.  */
 #define RELOC_AGAINST_DISCARDED_SECTION(info, input_bfd, input_section, \
- rel, count, relend, \
+ rel, next_rel, relend, \
  howto, index, contents) \
   { \
-    int i_; \
     _bfd_clear_contents (howto, input_bfd, input_section, \
  contents + rel[index].r_offset); \
  \
@@ -2514,22 +2513,20 @@ extern asection _bfd_elf_large_com_section;
     rel_hdr = _bfd_elf_single_rel_hdr (input_section); \
     rel_hdr->sh_size -= rel_hdr->sh_entsize; \
  \
-    memmove (rel, rel + count, \
-     (relend - rel - count) * sizeof (*rel)); \
- \
     input_section->reloc_count--; \
-    relend -= count; \
-    rel--; \
+        next_rel--; \
     continue; \
   } \
       } \
  \
-    for (i_ = 0; i_ < count; i_++) \
-      { \
- rel[i_].r_info = 0; \
- rel[i_].r_addend = 0; \
-      } \
-    rel += count - 1; \
+    rel->r_info = 0; \
+    rel->r_addend = 0; \
+ \
+    if (next_rel != rel) \
+    { \
+      *next_rel = *rel; \
+    } \
+ \
     continue; \
   }

diff --git a/bfd/elf32-i386.c b/bfd/elf32-i386.c
index 69c0b54..ef5ee15 100644
--- a/bfd/elf32-i386.c
+++ b/bfd/elf32-i386.c
@@ -3151,6 +3151,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   bfd_vma *local_got_offsets;
   bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
+  Elf_Internal_Rela *next_rel;
   Elf_Internal_Rela *relend;
   bfd_boolean is_vxworks_tls;
   unsigned plt_entry_size;
@@ -3176,8 +3177,9 @@ elf_i386_relocate_section (bfd *output_bfd,
   plt_entry_size = GET_PLT_ENTRY_SIZE (output_bfd);

   rel = relocs;
+  next_rel = rel;
   relend = relocs + input_section->reloc_count;
-  for (; rel < relend; rel++)
+  for (; rel < relend; rel++, next_rel++)
     {
       unsigned int r_type;
       reloc_howto_type *howto;
@@ -3196,7 +3198,7 @@ elf_i386_relocate_section (bfd *output_bfd,
       r_type = ELF32_R_TYPE (rel->r_info);
       if (r_type == R_386_GNU_VTINHERIT
   || r_type == R_386_GNU_VTENTRY)
- continue;
+        goto copy_rel_and_continue;

       if ((indx = r_type) >= R_386_standard
   && ((indx = r_type - R_386_ext_offset) - R_386_standard
@@ -3323,10 +3325,10 @@ elf_i386_relocate_section (bfd *output_bfd,

       if (sec != NULL && discarded_section (sec))
  RELOC_AGAINST_DISCARDED_SECTION (info, input_bfd, input_section,
- rel, 1, relend, howto, 0, contents);
+ rel, next_rel, relend, howto, 0, contents);

       if (info->relocatable)
- continue;
+        goto copy_rel_and_continue;

       /* Since STT_GNU_IFUNC symbol must go through PLT, we handle
  it here if it is defined in a non-shared object.  */
@@ -3419,7 +3421,7 @@ elf_i386_relocate_section (bfd *output_bfd,
      we need to include the symbol value so that it
      becomes an addend for the dynamic reloc.  For an
      internal symbol, we have updated addend.  */
-  continue;
+          goto copy_rel_and_continue;
  }
       /* FALLTHROUGH */
     case R_386_PC32:
@@ -3759,7 +3761,7 @@ elf_i386_relocate_section (bfd *output_bfd,
  need to include the symbol value so that it becomes
  an addend for the dynamic reloc.  */
       if (! relocate)
- continue;
+            goto copy_rel_and_continue;
     }
   break;

@@ -3833,8 +3835,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   bfd_put_32 (output_bfd, elf_i386_tpoff (info, relocation),
       contents + roff);
   /* Skip R_386_PC32/R_386_PLT32.  */
-  rel++;
-  continue;
+          goto skip_next_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
  {
@@ -3859,7 +3860,7 @@ elf_i386_relocate_section (bfd *output_bfd,
      contents + roff - 1);
   bfd_put_32 (output_bfd, -elf_i386_tpoff (info, relocation),
       contents + roff);
-  continue;
+          goto copy_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
  {
@@ -3874,7 +3875,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   roff = rel->r_offset;
   bfd_put_8 (output_bfd, 0x66, contents + roff);
   bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
-  continue;
+          goto copy_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_IE)
  {
@@ -3927,7 +3928,7 @@ elf_i386_relocate_section (bfd *output_bfd,
     }
   bfd_put_32 (output_bfd, -elf_i386_tpoff (info, relocation),
       contents + rel->r_offset);
-  continue;
+          goto copy_rel_and_continue;
  }
       else
  {
@@ -3976,7 +3977,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   else
     bfd_put_32 (output_bfd, elf_i386_tpoff (info, relocation),
  contents + rel->r_offset);
-  continue;
+          goto copy_rel_and_continue;
  }
     }

@@ -4172,8 +4173,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   - htab->elf.sgotplt->output_offset,
   contents + roff + 8);
       /* Skip R_386_PLT32.  */
-      rel++;
-      continue;
+          goto skip_next_rel_and_continue;
     }
   else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
     {
@@ -4212,7 +4212,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   - htab->elf.sgotplt->output_section->vma
   - htab->elf.sgotplt->output_offset,
   contents + roff);
-      continue;
+          goto copy_rel_and_continue;
     }
   else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
     {
@@ -4245,7 +4245,7 @@ elf_i386_relocate_section (bfd *output_bfd,
   bfd_put_8 (output_bfd, 0xd8, contents + roff + 1);
  }

-      continue;
+          goto copy_rel_and_continue;
     }
   else
     BFD_ASSERT (FALSE);
@@ -4269,8 +4269,7 @@ elf_i386_relocate_section (bfd *output_bfd,
       memcpy (contents + rel->r_offset - 2,
       "\x65\xa1\0\0\0\0\x90\x8d\x74\x26", 11);
       /* Skip R_386_PC32/R_386_PLT32.  */
-      rel++;
-      continue;
+          goto skip_next_rel_and_continue;
     }

   if (htab->elf.sgot == NULL)
@@ -4335,7 +4334,7 @@ elf_i386_relocate_section (bfd *output_bfd,
  abort ();
       elf_append_rel (output_bfd, sreloc, &outrel);
       if (indx)
- continue;
+          goto copy_rel_and_continue;
       else if (r_type == R_386_TLS_LE_32)
  relocation = elf_i386_dtpoff_base (info) - relocation;
       else
@@ -4410,6 +4409,25 @@ check_relocation_error:
       return FALSE;
     }
  }
+
+copy_rel_and_continue:
+      if (next_rel != rel)
+      {
+        *next_rel = *rel;
+      }
+
+      continue;
+
+skip_next_rel_and_continue:
+      if (next_rel != rel)
+      {
+        *next_rel = *rel;
+      }
+
+      rel++;
+      next_rel++;
+
+      continue;
     }

   return TRUE;
diff --git a/bfd/elf64-x86-64.c b/bfd/elf64-x86-64.c
index 18983e8..9dd5f13 100644
--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -3439,6 +3439,7 @@ elf_x86_64_relocate_section (bfd *output_bfd,
   bfd_vma *local_got_offsets;
   bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
+  Elf_Internal_Rela *next_rel;
   Elf_Internal_Rela *relend;
   const unsigned int plt_entry_size = GET_PLT_ENTRY_SIZE (info->output_bfd);

@@ -3455,8 +3456,9 @@ elf_x86_64_relocate_section (bfd *output_bfd,
   elf_x86_64_set_tls_module_base (info);

   rel = relocs;
+  next_rel = rel;
   relend = relocs + input_section->reloc_count;
-  for (; rel < relend; rel++)
+  for (; rel < relend; rel++, next_rel++)
     {
       unsigned int r_type;
       reloc_howto_type *howto;
@@ -3476,7 +3478,7 @@ elf_x86_64_relocate_section (bfd *output_bfd,
       r_type = ELF32_R_TYPE (rel->r_info);
       if (r_type == (int) R_X86_64_GNU_VTINHERIT
   || r_type == (int) R_X86_64_GNU_VTENTRY)
- continue;
+        goto copy_rel_and_continue;

       if (r_type >= (int) R_X86_64_standard)
  {
@@ -3535,10 +3537,10 @@ elf_x86_64_relocate_section (bfd *output_bfd,

       if (sec != NULL && discarded_section (sec))
  RELOC_AGAINST_DISCARDED_SECTION (info, input_bfd, input_section,
- rel, 1, relend, howto, 0, contents);
+ rel, next_rel, relend, howto, 0, contents);

       if (info->relocatable)
- continue;
+        goto copy_rel_and_continue;

       if (rel->r_addend == 0 && !ABI_64_P (output_bfd))
  {
@@ -3682,7 +3684,7 @@ elf_x86_64_relocate_section (bfd *output_bfd,
      we need to include the symbol value so that it
      becomes an addend for the dynamic reloc.  For an
      internal symbol, we have updated addend.  */
-  continue;
+          goto copy_rel_and_continue;
  }
       /* FALLTHROUGH */
     case R_X86_64_PC32:
@@ -4223,7 +4225,7 @@ direct:
  need to include the symbol value so that it becomes
  an addend for the dynamic reloc.  */
       if (! relocate)
- continue;
+            goto copy_rel_and_continue;
     }

   break;
@@ -4295,8 +4297,7 @@ direct:
       elf_x86_64_tpoff (info, relocation),
       contents + roff + 8 + largepic);
   /* Skip R_X86_64_PC32/R_X86_64_PLT32/R_X86_64_PLTOFF64.  */
-  rel++;
-  continue;
+          goto skip_next_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
  {
@@ -4319,7 +4320,7 @@ direct:
   bfd_put_32 (output_bfd,
       elf_x86_64_tpoff (info, relocation),
       contents + roff);
-  continue;
+  goto copy_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
  {
@@ -4330,7 +4331,7 @@ direct:
      xchg %ax,%ax.  */
   bfd_put_8 (output_bfd, 0x66, contents + roff);
   bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
-  continue;
+  goto copy_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_X86_64_GOTTPOFF)
  {
@@ -4405,7 +4406,7 @@ direct:
   bfd_put_32 (output_bfd,
       elf_x86_64_tpoff (info, relocation),
       contents + roff);
-  continue;
+  goto copy_rel_and_continue;
  }
       else
  BFD_ASSERT (FALSE);
@@ -4577,8 +4578,7 @@ direct:
   bfd_put_32 (output_bfd, relocation,
       contents + roff + 8 + largepic);
   /* Skip R_X86_64_PLT32/R_X86_64_PLTOFF64.  */
-  rel++;
-  continue;
+          goto skip_next_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
  {
@@ -4603,7 +4603,7 @@ direct:
       - input_section->output_offset
       - 4,
       contents + roff);
-  continue;
+  goto copy_rel_and_continue;
  }
       else if (ELF32_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
  {
@@ -4616,7 +4616,7 @@ direct:

   bfd_put_8 (output_bfd, 0x66, contents + roff);
   bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
-  continue;
+  goto copy_rel_and_continue;
  }
       else
  BFD_ASSERT (FALSE);
@@ -4661,8 +4661,7 @@ direct:
  memcpy (contents + rel->r_offset - 3,
  "\x0f\x1f\x40\x00\x64\x8b\x04\x25\0\0\0", 12);
       /* Skip R_X86_64_PC32/R_X86_64_PLT32/R_X86_64_PLTOFF64.  */
-      rel++;
-      continue;
+          goto skip_next_rel_and_continue;
     }

   if (htab->elf.sgot == NULL)
@@ -4777,6 +4776,25 @@ check_relocation_error:
       return FALSE;
     }
  }
+
+copy_rel_and_continue:
+      if (next_rel != rel)
+      {
+        *next_rel = *rel;
+      }
+
+      continue;
+
+skip_next_rel_and_continue:
+      if (next_rel != rel)
+      {
+        *next_rel = *rel;
+      }
+
+      rel++;
+      next_rel++;
+
+      continue;
     }

   return TRUE;
--
2.3.0



Reproducer Makefile: just run "make -j N" and look at the time of the
last link step

CXXFLAGS=-std=gnu++11 -O2 -ggdb3

MAX=150
MAX_ONE_THIRD=$(shell echo $$((${MAX} / 3)))
MAX_ONE_THIRD_PLUS_ONE=$(shell echo $$((${MAX_ONE_THIRD} + 1)))
MAX_TWO_THIRD=$(shell echo $$((2 * ${MAX} / 3)))
MAX_TWO_THIRD_PLUS_ONE=$(shell echo $$((${MAX_TWO_THIRD} + 1)))

define A0_H_FILE_CONTENT
#ifndef HEADER_GUARD_A0
#define HEADER_GUARD_A0

#include <string>

struct A0
{
    A0() {}

    ~A0()
    {
        relocatedCall2();
    }

    A0(const std::string& value)
        :_value(value)
    {}

    std::string inlinedCall() const
    {
        return _value + relocatedCall1();
    }

    std::string relocatedCall1() const;

    void relocatedCall2();

    std::string _value;
};

#endif
endef

define A0_CPP_FILE_CONTENT
#include "A0.h"

std::string A0::relocatedCall1() const
{
    return "relocatedCall1";
}

void A0::relocatedCall2() {}
endef

#$1 = number N
#$2 = N - 1
define AXXX_H_FILE_CONTENT
#ifndef HEADER_GUARD_A$1
#define HEADER_GUARD_A$1

#include <string>
#include <map>
#include "A$2.h"

struct A$1
{
    A$1() {}

    A$1(const std::string& key, const A$2& value)
    {
        A$2 oldValueByInlinedCopy = _map[key];
        _map[key] = value;
        _map[oldValueByInlinedCopy.inlinedCall()] = oldValueByInlinedCopy;
    }

    A$1(const std::map<std::string, A$2>& map)
        :_map(map)
    {}

    std::string inlinedCall()
    {
        return "inlinedCall" + relocatedCall();
    }

    std::map<std::string, A$2> _map;

    std::string relocatedCall() const;
};

#endif
endef

#$1 = number N
define AXXX_CPP_FILE_CONTENT
#include "A$1.h"
#include "A0.h"

std::string A$1::relocatedCall() const
{
    std::string s;

    for (auto pairByInlinedCopy : _map)
    {
        s += pairByInlinedCopy.first + " = {" + pairByInlinedCopy.second.inlinedCall() + "}, ";
    }

    A0 inlinedA0("A$1");

    return s + inlinedA0.inlinedCall();
}
endef

all:all.o

first.o second.o all.o:
        time ${LD} -r -o $@ $^

first.o:$(shell echo A{0..${MAX_ONE_THIRD}}.o)
second.o:$(shell echo A{${MAX_ONE_THIRD_PLUS_ONE}..${MAX_TWO_THIRD}}.o)
all.o:$(shell echo A{${MAX_TWO_THIRD_PLUS_ONE}..${MAX}}.o) first.o second.o

$(shell echo A{0..${MAX}}.o):|$(shell echo A{0..${MAX}}.{h,cpp})
        $(COMPILE.cc) $(OUTPUT_OPTION) ${@:%.o=%.cpp}

$(shell echo A{0..${MAX}}.{h,cpp}):
        cat>$@<<<"$${FILE_CONTENT}"

A0.h:export FILE_CONTENT=${A0_H_FILE_CONTENT}
A0.cpp:export FILE_CONTENT=${A0_CPP_FILE_CONTENT}
MINUS_ONE_SEQUENCE:=$(shell echo {0..${MAX}})
MINUS_ONE=$(word $1,${MINUS_ONE_SEQUENCE})
A%.h:export FILE_CONTENT=$(call AXXX_H_FILE_CONTENT,${*:A%=%},$(call MINUS_ONE,${*:A%=%}))
A%.cpp:export FILE_CONTENT=$(call AXXX_CPP_FILE_CONTENT,${*:A%=%})

clean:
        ${RM} *.h *.cpp *.o

soft-clean:
        ${RM} all.o first.o second.o

.PHONY:all clean soft-clean
.PRECIOUS:$(shell echo A{0..${MAX}}.{h,cpp,o})

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Alan Modra
On Fri, Nov 06, 2015 at 05:22:56PM +0100, Romain Geissler wrote:
> When we create the first-level module1.o and module2.o, everything works
> fine: ld -r lasts less than 10s. However, that creates huge .o files
> with huge debug sections containing millions of relocs. Later, when we
> build module3.o from module1.o and module2.o, the memmove happens on
> these huge reloc arrays, and ld ends up consuming 100% CPU for minutes.
> With binutils 2.25.1 a single link operation lasts more than 5 minutes,
> compared to 10s with ld 2.20.

I've been telling people for at least 10 years not to use ld -r as a
packaging tool, so I'm tempted to call this a good result..

> I have a proposal of patch to fix that, see the patch attached. Note that
> I know it only works for x86/x64 and that it will break other targets, I
> just want to make sure you agree on the idea of the fix. To fix this
> performance issue, I choose to have two iterators:

The idea is good, but the implementation is horrible due to trying to
keep RELOC_AGAINST_DISCARDED_SECTION.  I think you should just throw
that away and start over.

Here, for instance, is what I'm testing on powerpc64, one of the more
complex targets to change, given that powerpc emits relocs that match
the edited code (TLS sequences and others) rather than just leaving the
original relocs in place.  I've replaced "rel" with "wrel" and "rrel".
The rules are fairly simple:  Copy rrel[0] to wrel[0] at the start of
the loop.  Use "wrel" as the pointer for zero or negative array index
dereferencing, "rrel" for positive.  If restarting current reloc
processing, as done by powerpc after changing the symbol, then copy
wrel[0] back to rrel[0] (but optimised a little in most cases by
changing rrel[0] in the first place).
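The loop skeleton those rules describe can be sketched as follows (a
hypothetical, simplified model on an int array; the real loop operates
on Elf_Internal_Rela and performs the actual relocation work):

```c
#include <stddef.h>

/* wrel/rrel skeleton: copy rrel[0] to wrel[0] unconditionally at the
   start of each iteration; deleting the current entry is then just
   "wrel--", so the next kept entry overwrites this slot.  Returns the
   new entry count (wrel - relocs).  */
static size_t
filter_relocs (int *relocs, size_t count)
{
  int *rrel = relocs, *wrel = relocs, *relend = relocs + count;

  for (; rrel < relend; wrel++, rrel++)
    {
      if (wrel != rrel)
        *wrel = *rrel;       /* copy read slot to write slot */

      if (*wrel == 0)        /* stand-in for "against a discarded section" */
        {
          wrel--;            /* delete: next iteration overwrites this slot */
          continue;
        }
      /* ... process the reloc through wrel ... */
    }
  return wrel - relocs;      /* adjusted reloc count */
}
```

Restarting the current reloc after changing its symbol maps onto
"wrel--, rrel--" plus writing the change into rrel[0], so the
start-of-loop copy picks it up again on the next iteration.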

We should probably look into removing the "???" code below.  If that
was just there to pass the ld-elf/linkonce1 test, I think the test
should be updated and an empty reloc section written.  Also, it would be
nice to discard relocs for more than just debug sections, but that would
require changes to elf-eh-frame.c.

        * elf64-ppc.c (ppc64_elf_relocate_section): Use read and write
        pointers to reloc array, rather than memmove when deleting a
        reloc.  Don't use RELOC_AGAINST_DISCARDED_SECTION.  Adjust
        reloc counts at end of loop.

diff --git a/bfd/elf64-ppc.c b/bfd/elf64-ppc.c
index cda8e59..e6a3983 100644
--- a/bfd/elf64-ppc.c
+++ b/bfd/elf64-ppc.c
@@ -13161,7 +13161,8 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   struct ppc_link_hash_table *htab;
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
-  Elf_Internal_Rela *rel;
+  Elf_Internal_Rela *rrel;
+  Elf_Internal_Rela *wrel;
   Elf_Internal_Rela *relend;
   Elf_Internal_Rela outrel;
   bfd_byte *loc;
@@ -13193,9 +13194,9 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   sym_hashes = elf_sym_hashes (input_bfd);
   is_opd = ppc64_elf_section_data (input_section)->sec_type == sec_opd;
 
-  rel = relocs;
+  rrel = wrel = relocs;
   relend = relocs + input_section->reloc_count;
-  for (; rel < relend; rel++)
+  for (; rrel < relend; wrel++, rrel++)
     {
       enum elf_ppc64_reloc_type r_type;
       bfd_vma addend;
@@ -13219,21 +13220,24 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       struct ppc_stub_hash_entry *stub_entry;
       bfd_vma max_br_offset;
       bfd_vma from;
-      const Elf_Internal_Rela orig_rel = *rel;
+      const Elf_Internal_Rela orig_rel = *rrel;
       reloc_howto_type *howto;
       struct reloc_howto_struct alt_howto;
 
-      r_type = ELF64_R_TYPE (rel->r_info);
-      r_symndx = ELF64_R_SYM (rel->r_info);
+      if (wrel != rrel)
+ *wrel = *rrel;
+
+      r_type = ELF64_R_TYPE (rrel->r_info);
+      r_symndx = ELF64_R_SYM (rrel->r_info);
 
       /* For old style R_PPC64_TOC relocs with a zero symbol, use the
  symbol of the previous ADDR64 reloc.  The symbol gives us the
  proper TOC base to use.  */
-      if (rel->r_info == ELF64_R_INFO (0, R_PPC64_TOC)
-  && rel != relocs
-  && ELF64_R_TYPE (rel[-1].r_info) == R_PPC64_ADDR64
+      if (wrel->r_info == ELF64_R_INFO (0, R_PPC64_TOC)
+  && wrel != relocs
+  && ELF64_R_TYPE (wrel[-1].r_info) == R_PPC64_ADDR64
   && is_opd)
- r_symndx = ELF64_R_SYM (rel[-1].r_info);
+ r_symndx = ELF64_R_SYM (wrel[-1].r_info);
 
       sym = NULL;
       sec = NULL;
@@ -13251,12 +13255,12 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   sec = local_sections[r_symndx];
   sym_name = bfd_elf_sym_name (input_bfd, symtab_hdr, sym, sec);
   sym_type = ELF64_ST_TYPE (sym->st_info);
-  relocation = _bfd_elf_rela_local_sym (output_bfd, sym, &sec, rel);
+  relocation = _bfd_elf_rela_local_sym (output_bfd, sym, &sec, wrel);
   opd = get_opd_info (sec);
   if (opd != NULL && opd->adjust != NULL)
     {
       long adjust = opd->adjust[OPD_NDX (sym->st_value
- + rel->r_addend)];
+ + wrel->r_addend)];
       if (adjust == -1)
  relocation = 0;
       else
@@ -13267,7 +13271,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
      If it is a reloc against some other .opd symbol,
      then the symbol value will be adjusted later.  */
   if (ELF_ST_TYPE (sym->st_info) == STT_SECTION)
-    rel->r_addend += adjust;
+    wrel->r_addend += adjust;
   else
     relocation += adjust;
  }
@@ -13277,7 +13281,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  {
   bfd_boolean ignored;
 
-  RELOC_FOR_GLOBAL_SYMBOL (info, input_bfd, input_section, rel,
+  RELOC_FOR_GLOBAL_SYMBOL (info, input_bfd, input_section, wrel,
    r_symndx, symtab_hdr, sym_hashes,
    h_elf, sec, relocation,
    unresolved_reloc, warned, ignored);
@@ -13314,10 +13318,23 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       h = (struct ppc_link_hash_entry *) h_elf;
 
       if (sec != NULL && discarded_section (sec))
- RELOC_AGAINST_DISCARDED_SECTION (info, input_bfd, input_section,
- rel, 1, relend,
- ppc64_elf_howto_table[r_type], 0,
- contents);
+ {
+  _bfd_clear_contents (ppc64_elf_howto_table[r_type],
+       input_bfd, input_section,
+       contents + wrel->r_offset);
+  wrel->r_info = 0;
+  wrel->r_addend = 0;
+
+  /* For ld -r, remove relocations in debug sections against
+     sections defined in discarded sections.  Not done for
+     non-debug to preserve relocs in .eh_frame which the
+     eh_frame editing code expects to be present.  */
+  if (bfd_link_relocatable (info)
+      && (input_section->flags & SEC_DEBUGGING))
+    wrel--;
+
+  continue;
+ }
 
       if (bfd_link_relocatable (info))
  continue;
@@ -13355,7 +13372,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   unsigned char *toc_tls;
 
   if (!get_tls_mask (&toc_tls, &toc_symndx, &toc_addend,
-     &local_syms, rel, input_bfd))
+     &local_syms, wrel, input_bfd))
     return FALSE;
 
   if (toc_tls)
@@ -13385,7 +13402,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       (!IS_PPC64_TLS_RELOC (r_type)
        ? _("%P: %H: %s used with TLS symbol `%T'\n")
        : _("%P: %H: %s used with non-TLS symbol `%T'\n"),
-       input_bfd, input_section, rel->r_offset,
+       input_bfd, input_section, wrel->r_offset,
        ppc64_elf_howto_table[r_type]->name,
        sym_name);
  }
@@ -13409,13 +13426,13 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   break;
 
  case R_PPC64_LO_DS_OPT:
-  insn = bfd_get_32 (output_bfd, contents + rel->r_offset - d_offset);
+  insn = bfd_get_32 (output_bfd, contents + wrel->r_offset - d_offset);
   if ((insn & (0x3f << 26)) != 58u << 26)
     abort ();
   insn += (14u << 26) - (58u << 26);
-  bfd_put_32 (output_bfd, insn, contents + rel->r_offset - d_offset);
+  bfd_put_32 (output_bfd, insn, contents + wrel->r_offset - d_offset);
   r_type = R_PPC64_TOC16_LO;
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
   break;
 
  case R_PPC64_TOC16:
@@ -13428,7 +13445,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     int retval;
 
     retval = get_tls_mask (&toc_tls, &toc_symndx, &toc_addend,
-   &local_syms, rel, input_bfd);
+   &local_syms, wrel, input_bfd);
     if (retval == 0)
       return FALSE;
 
@@ -13467,10 +13484,10 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (tls_mask != 0
       && (tls_mask & TLS_TPREL) == 0)
     {
-      rel->r_offset -= d_offset;
-      bfd_put_32 (output_bfd, NOP, contents + rel->r_offset);
+      wrel->r_offset -= d_offset;
+      bfd_put_32 (output_bfd, NOP, contents + wrel->r_offset);
       r_type = R_PPC64_NONE;
-      rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
     }
   break;
 
@@ -13480,22 +13497,24 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       && (tls_mask & TLS_TPREL) == 0)
     {
     toctprel:
-      insn = bfd_get_32 (output_bfd, contents + rel->r_offset - d_offset);
+      insn = bfd_get_32 (output_bfd,
+ contents + wrel->r_offset - d_offset);
       insn &= 31 << 21;
       insn |= 0x3c0d0000; /* addis 0,13,0 */
-      bfd_put_32 (output_bfd, insn, contents + rel->r_offset - d_offset);
+      bfd_put_32 (output_bfd, insn,
+  contents + wrel->r_offset - d_offset);
       r_type = R_PPC64_TPREL16_HA;
       if (toc_symndx != 0)
  {
-  rel->r_info = ELF64_R_INFO (toc_symndx, r_type);
-  rel->r_addend = toc_addend;
+  rrel->r_info = ELF64_R_INFO (toc_symndx, r_type);
+  rrel->r_addend = toc_addend;
   /* We changed the symbol.  Start over in order to
      get h, sym, sec etc. right.  */
-  rel--;
+  wrel--, rrel--;
   continue;
  }
       else
- rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+ wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
     }
   break;
 
@@ -13503,26 +13522,27 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (tls_mask != 0
       && (tls_mask & TLS_TPREL) == 0)
     {
-      insn = bfd_get_32 (output_bfd, contents + rel->r_offset);
+      insn = bfd_get_32 (output_bfd, contents + wrel->r_offset);
       insn = _bfd_elf_ppc_at_tls_transform (insn, 13);
       if (insn == 0)
  abort ();
-      bfd_put_32 (output_bfd, insn, contents + rel->r_offset);
+      bfd_put_32 (output_bfd, insn, contents + wrel->r_offset);
       /* Was PPC64_TLS which sits on insn boundary, now
  PPC64_TPREL16_LO which is at low-order half-word.  */
-      rel->r_offset += d_offset;
+      wrel->r_offset += d_offset;
       r_type = R_PPC64_TPREL16_LO;
       if (toc_symndx != 0)
  {
-  rel->r_info = ELF64_R_INFO (toc_symndx, r_type);
-  rel->r_addend = toc_addend;
+  rrel->r_offset = wrel->r_offset;
+  rrel->r_info = ELF64_R_INFO (toc_symndx, r_type);
+  rrel->r_addend = toc_addend;
   /* We changed the symbol.  Start over in order to
      get h, sym, sec etc. right.  */
-  rel--;
+  wrel--, rrel--;
   continue;
  }
       else
- rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+ wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
     }
   break;
 
@@ -13543,11 +13563,11 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   + R_PPC64_GOT_TPREL16_DS);
       else
  {
-  rel->r_offset -= d_offset;
-  bfd_put_32 (output_bfd, NOP, contents + rel->r_offset);
+  wrel->r_offset -= d_offset;
+  bfd_put_32 (output_bfd, NOP, contents + wrel->r_offset);
   r_type = R_PPC64_NONE;
  }
-      rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
     }
   break;
 
@@ -13573,17 +13593,17 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  reloc is the __tls_get_addr call associated with
  the current reloc.  Edit both insns.  */
       if (input_section->has_tls_get_addr_call
-  && rel + 1 < relend
-  && branch_reloc_hash_match (input_bfd, rel + 1,
+  && rrel + 1 < relend
+  && branch_reloc_hash_match (input_bfd, rrel + 1,
       htab->tls_get_addr,
       htab->tls_get_addr_fd))
- offset = rel[1].r_offset;
+ offset = rrel[1].r_offset;
       /* We read the low GOT_TLS (or TOC16) insn because we
  need to keep the destination reg.  It may be
  something other than the usual r3, and moved to r3
  before the call by intervening code.  */
       insn1 = bfd_get_32 (output_bfd,
-  contents + rel->r_offset - d_offset);
+  contents + wrel->r_offset - d_offset);
       if ((tls_mask & tls_gd) != 0)
  {
   /* IE */
@@ -13591,13 +13611,13 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   insn1 |= 58 << 26; /* ld */
   insn2 = 0x7c636a14; /* add 3,3,13 */
   if (offset != (bfd_vma) -1)
-    rel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
+    rrel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
   if ((tls_mask & TLS_EXPLICIT) == 0)
     r_type = (((r_type - (R_PPC64_GOT_TLSGD16 & 3)) & 3)
       + R_PPC64_GOT_TPREL16_DS);
   else
     r_type += R_PPC64_TOC16_DS - R_PPC64_TOC16;
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
  }
       else
  {
@@ -13617,29 +13637,32 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   break;
       if (r_symndx >= symtab_hdr->sh_info)
  r_symndx = STN_UNDEF;
-      rel->r_addend = htab->elf.tls_sec->vma + DTP_OFFSET;
+      rrel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      rrel->r_addend = htab->elf.tls_sec->vma + DTP_OFFSET;
       if (r_symndx != STN_UNDEF)
- rel->r_addend -= (local_syms[r_symndx].st_value
-  + sec->output_offset
-  + sec->output_section->vma);
+ rrel->r_addend -= (local_syms[r_symndx].st_value
+   + sec->output_offset
+   + sec->output_section->vma);
     }
   else if (toc_symndx != 0)
     {
       r_symndx = toc_symndx;
-      rel->r_addend = toc_addend;
+      rrel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      rrel->r_addend = toc_addend;
     }
   r_type = R_PPC64_TPREL16_HA;
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  rrel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_info = rrel->r_info;
   if (offset != (bfd_vma) -1)
     {
-      rel[1].r_info = ELF64_R_INFO (r_symndx,
-    R_PPC64_TPREL16_LO);
-      rel[1].r_offset = offset + d_offset;
-      rel[1].r_addend = rel->r_addend;
+      rrel[1].r_info = ELF64_R_INFO (r_symndx,
+     R_PPC64_TPREL16_LO);
+      rrel[1].r_offset = offset + d_offset;
+      rrel[1].r_addend = rrel->r_addend;
     }
  }
       bfd_put_32 (output_bfd, insn1,
-  contents + rel->r_offset - d_offset);
+  contents + wrel->r_offset - d_offset);
       if (offset != (bfd_vma) -1)
  {
   insn3 = bfd_get_32 (output_bfd,
@@ -13647,7 +13670,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (insn3 == NOP
       || insn3 == CROR_151515 || insn3 == CROR_313131)
     {
-      rel[1].r_offset += 4;
+      rrel[1].r_offset += 4;
       bfd_put_32 (output_bfd, insn2, contents + offset + 4);
       insn2 = NOP;
     }
@@ -13658,7 +13681,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  {
   /* We changed the symbol.  Start over in order
      to get h, sym, sec etc. right.  */
-  rel--;
+  wrel--, rrel--;
   continue;
  }
     }
@@ -13668,7 +13691,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (tls_mask != 0 && (tls_mask & TLS_GD) == 0)
     {
       unsigned int insn2, insn3;
-      bfd_vma offset = rel->r_offset;
+      bfd_vma offset = wrel->r_offset;
 
       if ((tls_mask & TLS_TPRELGD) != 0)
  {
@@ -13682,29 +13705,31 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (toc_symndx != 0)
     {
       r_symndx = toc_symndx;
-      rel->r_addend = toc_addend;
+      rrel->r_addend = toc_addend;
     }
   r_type = R_PPC64_TPREL16_LO;
-  rel->r_offset = offset + d_offset;
+  wrel->r_offset = offset + d_offset;
   insn2 = 0x38630000; /* addi 3,3,0 */
  }
-      rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
       /* Zap the reloc on the _tls_get_addr call too.  */
-      BFD_ASSERT (offset == rel[1].r_offset);
-      rel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
+      BFD_ASSERT (offset == rrel[1].r_offset);
+      rrel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
       insn3 = bfd_get_32 (output_bfd,
   contents + offset + 4);
       if (insn3 == NOP
   || insn3 == CROR_151515 || insn3 == CROR_313131)
  {
-  rel->r_offset += 4;
+  wrel->r_offset += 4;
   bfd_put_32 (output_bfd, insn2, contents + offset + 4);
   insn2 = NOP;
  }
       bfd_put_32 (output_bfd, insn2, contents + offset);
       if ((tls_mask & TLS_TPRELGD) == 0 && toc_symndx != 0)
  {
-  rel--;
+  rrel->r_offset = wrel->r_offset;
+  rrel->r_info = wrel->r_info;
+  wrel--, rrel--;
   continue;
  }
     }
@@ -13714,7 +13739,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (tls_mask != 0 && (tls_mask & TLS_LD) == 0)
     {
       unsigned int insn2, insn3;
-      bfd_vma offset = rel->r_offset;
+      bfd_vma offset = wrel->r_offset;
 
       if (toc_symndx)
  sec = local_sections[toc_symndx];
@@ -13725,59 +13750,59 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   break;
       if (r_symndx >= symtab_hdr->sh_info)
  r_symndx = STN_UNDEF;
-      rel->r_addend = htab->elf.tls_sec->vma + DTP_OFFSET;
+      rrel->r_addend = htab->elf.tls_sec->vma + DTP_OFFSET;
       if (r_symndx != STN_UNDEF)
- rel->r_addend -= (local_syms[r_symndx].st_value
-  + sec->output_offset
-  + sec->output_section->vma);
+ rrel->r_addend -= (local_syms[r_symndx].st_value
+   + sec->output_offset
+   + sec->output_section->vma);
 
       r_type = R_PPC64_TPREL16_LO;
-      rel->r_info = ELF64_R_INFO (r_symndx, r_type);
-      rel->r_offset = offset + d_offset;
+      rrel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      rrel->r_offset = offset + d_offset;
       /* Zap the reloc on the _tls_get_addr call too.  */
-      BFD_ASSERT (offset == rel[1].r_offset);
-      rel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
+      BFD_ASSERT (offset == rrel[1].r_offset);
+      rrel[1].r_info = ELF64_R_INFO (STN_UNDEF, R_PPC64_NONE);
       insn2 = 0x38630000; /* addi 3,3,0 */
       insn3 = bfd_get_32 (output_bfd,
   contents + offset + 4);
       if (insn3 == NOP
   || insn3 == CROR_151515 || insn3 == CROR_313131)
  {
-  rel->r_offset += 4;
+  rrel->r_offset += 4;
   bfd_put_32 (output_bfd, insn2, contents + offset + 4);
   insn2 = NOP;
  }
       bfd_put_32 (output_bfd, insn2, contents + offset);
-      rel--;
+      wrel--, rrel--;
       continue;
     }
   break;
 
  case R_PPC64_DTPMOD64:
-  if (rel + 1 < relend
-      && rel[1].r_info == ELF64_R_INFO (r_symndx, R_PPC64_DTPREL64)
-      && rel[1].r_offset == rel->r_offset + 8)
+  if (rrel + 1 < relend
+      && rrel[1].r_info == ELF64_R_INFO (r_symndx, R_PPC64_DTPREL64)
+      && rrel[1].r_offset == wrel->r_offset + 8)
     {
       if ((tls_mask & TLS_GD) == 0)
  {
-  rel[1].r_info = ELF64_R_INFO (r_symndx, R_PPC64_NONE);
+  rrel[1].r_info = ELF64_R_INFO (r_symndx, R_PPC64_NONE);
   if ((tls_mask & TLS_TPRELGD) != 0)
     r_type = R_PPC64_TPREL64;
   else
     {
-      bfd_put_64 (output_bfd, 1, contents + rel->r_offset);
+      bfd_put_64 (output_bfd, 1, contents + wrel->r_offset);
       r_type = R_PPC64_NONE;
     }
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
  }
     }
   else
     {
       if ((tls_mask & TLS_LD) == 0)
  {
-  bfd_put_64 (output_bfd, 1, contents + rel->r_offset);
+  bfd_put_64 (output_bfd, 1, contents + wrel->r_offset);
   r_type = R_PPC64_NONE;
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
  }
     }
   break;
@@ -13786,7 +13811,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if ((tls_mask & TLS_TPREL) == 0)
     {
       r_type = R_PPC64_NONE;
-      rel->r_info = ELF64_R_INFO (r_symndx, r_type);
+      wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
     }
   break;
 
@@ -13801,24 +13826,24 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (!bfd_link_pic (info)
       && !info->traditional_format
       && h != NULL && &h->elf == htab->elf.hgot
-      && rel + 1 < relend
-      && rel[1].r_info == ELF64_R_INFO (r_symndx, R_PPC64_REL16_LO)
-      && rel[1].r_offset == rel->r_offset + 4
-      && rel[1].r_addend == rel->r_addend + 4
+      && rrel + 1 < relend
+      && rrel[1].r_info == ELF64_R_INFO (r_symndx, R_PPC64_REL16_LO)
+      && rrel[1].r_offset == wrel->r_offset + 4
+      && rrel[1].r_addend == wrel->r_addend + 4
       && relocation + 0x80008000 <= 0xffffffff)
     {
       unsigned int insn1, insn2;
-      bfd_vma offset = rel->r_offset - d_offset;
+      bfd_vma offset = wrel->r_offset - d_offset;
       insn1 = bfd_get_32 (output_bfd, contents + offset);
       insn2 = bfd_get_32 (output_bfd, contents + offset + 4);
       if ((insn1 & 0xffff0000) == 0x3c4c0000 /* addis 2,12 */
   && (insn2 & 0xffff0000) == 0x38420000 /* addi 2,2 */)
  {
   r_type = R_PPC64_ADDR16_HA;
-  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
-  rel->r_addend -= d_offset;
-  rel[1].r_info = ELF64_R_INFO (r_symndx, R_PPC64_ADDR16_LO);
-  rel[1].r_addend -= d_offset + 4;
+  wrel->r_info = ELF64_R_INFO (r_symndx, r_type);
+  wrel->r_addend -= d_offset;
+  rrel[1].r_info = ELF64_R_INFO (r_symndx, R_PPC64_ADDR16_LO);
+  rrel[1].r_addend -= d_offset + 4;
   bfd_put_32 (output_bfd, 0x3c400000, contents + offset);
  }
     }
@@ -13828,7 +13853,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       /* Handle other relocations that tweak non-addend part of insn.  */
       insn = 0;
       max_br_offset = 1 << 25;
-      addend = rel->r_addend;
+      addend = wrel->r_addend;
       reloc_dest = DEST_NORMAL;
       switch (r_type)
  {
@@ -13836,18 +13861,18 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   break;
 
  case R_PPC64_TOCSAVE:
-  if (relocation + addend == (rel->r_offset
+  if (relocation + addend == (wrel->r_offset
       + input_section->output_offset
       + input_section->output_section->vma)
       && tocsave_find (htab, NO_INSERT,
-       &local_syms, rel, input_bfd))
+       &local_syms, wrel, input_bfd))
     {
-      insn = bfd_get_32 (input_bfd, contents + rel->r_offset);
+      insn = bfd_get_32 (input_bfd, contents + wrel->r_offset);
       if (insn == NOP
   || insn == CROR_151515 || insn == CROR_313131)
  bfd_put_32 (input_bfd,
     STD_R2_0R1 + STK_TOC (htab),
-    contents + rel->r_offset);
+    contents + wrel->r_offset);
     }
   break;
 
@@ -13861,7 +13886,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  case R_PPC64_ADDR14_BRNTAKEN:
  case R_PPC64_REL14_BRNTAKEN:
   insn |= bfd_get_32 (output_bfd,
-      contents + rel->r_offset) & ~(0x01 << 21);
+      contents + wrel->r_offset) & ~(0x01 << 21);
   /* Fall thru.  */
 
  case R_PPC64_REL14:
@@ -13893,18 +13918,18 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       /* All of these stubs will modify r2, so there must be a
  branch and link followed by a nop.  The nop is
  replaced by an insn to restore r2.  */
-      if (rel->r_offset + 8 <= input_section->size)
+      if (wrel->r_offset + 8 <= input_section->size)
  {
   unsigned long br;
 
   br = bfd_get_32 (input_bfd,
-   contents + rel->r_offset);
+   contents + wrel->r_offset);
   if ((br & 1) != 0)
     {
       unsigned long nop;
 
       nop = bfd_get_32 (input_bfd,
- contents + rel->r_offset + 4);
+ contents + wrel->r_offset + 4);
       if (nop == NOP
   || nop == CROR_151515 || nop == CROR_313131)
  {
@@ -13918,7 +13943,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   else
     bfd_put_32 (input_bfd,
  LD_R2_0R1 + STK_TOC (htab),
- contents + rel->r_offset + 4);
+ contents + wrel->r_offset + 4);
   can_plt_call = TRUE;
  }
     }
@@ -13972,12 +13997,12 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     info->callbacks->einfo
       (_("%P: %H: call to `%T' lacks nop, can't restore toc; "
  "recompile with -fPIC\n"),
-       input_bfd, input_section, rel->r_offset, sym_name);
+       input_bfd, input_section, wrel->r_offset, sym_name);
   else
     info->callbacks->einfo
       (_("%P: %H: call to `%T' lacks nop, can't restore toc; "
  "(-mcmodel=small toc adjust stub)\n"),
-       input_bfd, input_section, rel->r_offset, sym_name);
+       input_bfd, input_section, wrel->r_offset, sym_name);
 
   bfd_set_error (bfd_error_bad_value);
   ret = FALSE;
@@ -14009,7 +14034,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
 
   /* If the branch is out of reach we ought to have a long
      branch stub.  */
-  from = (rel->r_offset
+  from = (wrel->r_offset
   + input_section->output_offset
   + input_section->output_section->vma);
 
@@ -14050,9 +14075,9 @@ ppc64_elf_relocate_section (bfd *output_bfd,
    || stub_entry->stub_type == ppc_stub_plt_call_r2save)
   && (ALWAYS_EMIT_R2SAVE
       || stub_entry->stub_type == ppc_stub_plt_call_r2save)
-  && rel + 1 < relend
-  && rel[1].r_offset == rel->r_offset + 4
-  && ELF64_R_TYPE (rel[1].r_info) == R_PPC64_TOCSAVE)
+  && rrel + 1 < relend
+  && rrel[1].r_offset == wrel->r_offset + 4
+  && ELF64_R_TYPE (rrel[1].r_info) == R_PPC64_TOCSAVE)
  relocation += 4;
     }
 
@@ -14077,7 +14102,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     insn ^= 0x01 << 21;
  }
 
-      bfd_put_32 (output_bfd, insn, contents + rel->r_offset);
+      bfd_put_32 (output_bfd, insn, contents + wrel->r_offset);
     }
 
   /* NOP out calls to undefined weak functions.
@@ -14090,7 +14115,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
    && relocation == 0
    && addend == 0)
     {
-      bfd_put_32 (output_bfd, NOP, contents + rel->r_offset);
+      bfd_put_32 (output_bfd, NOP, contents + wrel->r_offset);
       continue;
     }
   break;
@@ -14445,7 +14470,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  resolve to zero.  This is really just a tweak, since
  code using weak externs ought to check that they are
  defined before using them.  */
-      bfd_byte *p = contents + rel->r_offset - d_offset;
+      bfd_byte *p = contents + wrel->r_offset - d_offset;
 
       insn = bfd_get_32 (output_bfd, p);
       insn = _bfd_elf_ppc_at_tprel_transform (insn, 13);
@@ -14561,7 +14586,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       relocate = FALSE;
 
       out_off = _bfd_elf_section_offset (output_bfd, info,
- input_section, rel->r_offset);
+ input_section, wrel->r_offset);
       if (out_off == (bfd_vma) -1)
  skip = TRUE;
       else if (out_off == (bfd_vma) -2)
@@ -14569,7 +14594,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       out_off += (input_section->output_section->vma
   + input_section->output_offset);
       outrel.r_offset = out_off;
-      outrel.r_addend = rel->r_addend;
+      outrel.r_addend = wrel->r_addend;
 
       /* Optimize unaligned reloc use.  */
       if ((r_type == R_PPC64_ADDR64 && (out_off & 7) != 0)
@@ -14641,7 +14666,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   info->callbacks->einfo
     (_("%P: %H: %s for indirect "
        "function `%T' unsupported\n"),
-     input_bfd, input_section, rel->r_offset,
+     input_bfd, input_section, wrel->r_offset,
      ppc64_elf_howto_table[r_type]->name,
      sym_name);
   ret = FALSE;
@@ -14721,7 +14746,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   else if (ppc64_elf_howto_table[r_type]->pc_relative)
     addend = (input_section->output_section->vma
       + input_section->output_offset
-      + rel->r_offset);
+      + wrel->r_offset);
  }
     }
   break;
@@ -14787,7 +14812,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (htab->do_toc_opt && relocation + addend + 0x8000 < 0x10000
       && !ppc64_elf_tdata (input_bfd)->unexpected_toc_insn)
     {
-      bfd_byte *p = contents + (rel->r_offset & ~3);
+      bfd_byte *p = contents + (wrel->r_offset & ~3);
       bfd_put_32 (input_bfd, NOP, p);
     }
   break;
@@ -14803,7 +14828,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (htab->do_toc_opt && relocation + addend + 0x8000 < 0x10000
       && !ppc64_elf_tdata (input_bfd)->unexpected_toc_insn)
     {
-      bfd_byte *p = contents + (rel->r_offset & ~3);
+      bfd_byte *p = contents + (wrel->r_offset & ~3);
       insn = bfd_get_32 (input_bfd, p);
       if ((insn & (0x3f << 26)) == 12u << 26 /* addic */)
  {
@@ -14882,7 +14907,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  case R_PPC64_TPREL16_LO_DS:
  case R_PPC64_DTPREL16_DS:
  case R_PPC64_DTPREL16_LO_DS:
-  insn = bfd_get_32 (input_bfd, contents + (rel->r_offset & ~3));
+  insn = bfd_get_32 (input_bfd, contents + (wrel->r_offset & ~3));
   mask = 3;
   /* If this reloc is against an lq insn, then the value must be
      a multiple of 16.  This is somewhat of a hack, but the
@@ -14896,7 +14921,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     {
       info->callbacks->einfo
  (_("%P: %H: error: %s not a multiple of %u\n"),
- input_bfd, input_section, rel->r_offset,
+ input_bfd, input_section, wrel->r_offset,
  howto->name,
  mask + 1);
       bfd_set_error (bfd_error_bad_value);
@@ -14913,11 +14938,11 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   && !((input_section->flags & SEC_DEBUGGING) != 0
        && h->elf.def_dynamic)
   && _bfd_elf_section_offset (output_bfd, info, input_section,
-      rel->r_offset) != (bfd_vma) -1)
+      wrel->r_offset) != (bfd_vma) -1)
  {
   info->callbacks->einfo
     (_("%P: %H: unresolvable %s against `%T'\n"),
-     input_bfd, input_section, rel->r_offset,
+     input_bfd, input_section, wrel->r_offset,
      howto->name,
      h->elf.root.root.string);
   ret = FALSE;
@@ -14932,7 +14957,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  {
   enum complain_overflow complain = complain_overflow_signed;
 
-  insn = bfd_get_32 (input_bfd, contents + (rel->r_offset & ~3));
+  insn = bfd_get_32 (input_bfd, contents + (wrel->r_offset & ~3));
   if ((insn & (0x3f << 26)) == 10u << 26 /* cmpli */)
     complain = complain_overflow_bitfield;
   else if (howto->rightshift == 0
@@ -14952,7 +14977,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  }
 
       r = _bfd_final_link_relocate (howto, input_bfd, input_section, contents,
-    rel->r_offset, relocation, addend);
+    wrel->r_offset, relocation, addend);
 
       if (r != bfd_reloc_ok)
  {
@@ -14985,7 +15010,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (!((*info->callbacks->reloc_overflow)
  (info, &h->elf.root, sym_name,
  reloc_name, orig_rel.r_addend,
- input_bfd, input_section, rel->r_offset)))
+ input_bfd, input_section, wrel->r_offset)))
     return FALSE;
  }
     }
@@ -14993,7 +15018,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     {
       info->callbacks->einfo
  (_("%P: %H: %s against `%T': error %d\n"),
- input_bfd, input_section, rel->r_offset,
+ input_bfd, input_section, wrel->r_offset,
  reloc_name, sym_name, (int) r);
       ret = FALSE;
     }
@@ -15002,6 +15027,26 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  }
     }
 
+  if (wrel != rrel)
+    {
+      Elf_Internal_Shdr *rel_hdr;
+      size_t deleted = rrel - wrel;
+
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section->output_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      if (rel_hdr->sh_size == 0)
+ {
+  /* It is too late to remove an empty reloc section.  Leave
+     one NONE reloc.
+     ??? What is wrong with an empty section???  */
+  rel_hdr->sh_size = rel_hdr->sh_entsize;
+  deleted -= 1;
+ }
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      input_section->reloc_count -= deleted;
+    }
+
   /* If we're emitting relocations, then shortly after this function
      returns, reloc offsets and addends for this section will be
      adjusted.  Worse, reloc symbol indices will be for the output
@@ -15011,12 +15056,12 @@ ppc64_elf_relocate_section (bfd *output_bfd,
     {
       bfd_size_type amt;
       amt = input_section->reloc_count * sizeof (Elf_Internal_Rela);
-      rel = bfd_alloc (input_bfd, amt);
+      wrel = bfd_alloc (input_bfd, amt);
       BFD_ASSERT (ppc64_elf_tdata (input_bfd)->opd.relocs == NULL);
-      ppc64_elf_tdata (input_bfd)->opd.relocs = rel;
-      if (rel == NULL)
+      ppc64_elf_tdata (input_bfd)->opd.relocs = wrel;
+      if (wrel == NULL)
  return FALSE;
-      memcpy (rel, relocs, amt);
+      memcpy (wrel, relocs, amt);
     }
   return ret;
 }
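For readers skimming the diff: the read/write pointer idea above can be sketched in isolation. The struct and function below are a simplified, hypothetical stand-in for BFD's `Elf_Internal_Rela` and the relocation loop, not the real API; the point is that surviving relocs are copied down over the holes left by discarded ones in a single pass, so no tail of the array is ever memmoved per deletion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Elf_Internal_Rela: only the
   fields this sketch needs.  For the sketch, r_info == 0 marks a
   reloc against a discarded section.  */
struct rela
{
  unsigned long r_offset;
  unsigned long r_info;
};

/* One pass with a read pointer (rel) and a write pointer (wrel).
   Kept relocs are copied down over the holes left by discarded ones,
   so the whole pass is O(n); the caller then shrinks the reloc count
   once, by (rel - wrel), as the patch does with sh_size and
   reloc_count after its loop.  Returns the new count.  */
static size_t
compact_relocs (struct rela *relocs, size_t count)
{
  struct rela *rel = relocs;
  struct rela *wrel = relocs;
  struct rela *relend = relocs + count;

  for (; rel < relend; rel++)
    {
      if (rel->r_info == 0)
        continue;               /* discard: wrel does not advance */
      if (wrel != rel)
        *wrel = *rel;
      wrel++;
    }
  return wrel - relocs;         /* new reloc count */
}
```

The real patch additionally adjusts the input and output sections' `sh_size` and `reloc_count` once, after the loop, rather than paying any per-deletion cost.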

--
Alan Modra
Australia Development Lab, IBM

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Romain Geissler
Hi,

On Mon, 9 Nov 2015, Alan Modra wrote:

> On Fri, Nov 06, 2015 at 05:22:56PM +0100, Romain Geissler wrote:
> > When we create the first level module1.o and module2.o, everything works
> > fine. ld -r last less than 10s. However, that creates huges .o files,
> > having huge debug sections with millions of relocs. Later when we build
> > module3.o from module1.o and module2.o, since the memmove happens on huge
> > relocs array, ld end up consuming 100% CPU for minutes. With binutils
> > 2.25.1 one single link operation lasts more than 5 minutes, compared to
> > 10s with ld 2.20.
>
> I've been telling people for at least 10 years not to use ld -r as a
> packaging tool, so I'm tempted to say this as a good result..

Well, indeed there was recently a regression in ld -r affecting kernel
people (the stable sort patch), and you replied exactly that. I know you
advocate against using it, but may I ask why? What's the problem with
partial linking? Is it officially deprecated?

I have no problem with changing it for all my company's projects; using
static libraries linked with --whole-archive should be equivalent. I am
just curious about the motivations.

> > I have a proposal of patch to fix that, see the patch attached. Note that
> > I know it only works for x86/x64 and that it will break other targets, I
> > just want to make sure you agree on the idea of the fix. To fix this
> > performance issue, I choose to have two iterators:
>
> The idea is good, but the implementation horrible due to trying to
> keep RELOC_AGAINST_DISCARDED_SECTION.  I think you should just throw
> that away and start over.

Indeed, that's a much better idea. I will use your patch and apply it to
x86/x64 as well.

Cheers,
Romain

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Alan Modra-3
On Mon, Nov 09, 2015 at 10:04:42AM +0100, Romain Geissler wrote:
> What's the problem with partial linking ?

Section coalescing.  What is joined together is not easily separated.

- Architectures with limited branch offsets may find text sections
  become so large that a branch can't reach to a section boundary,
  which is where trampolines are usually placed.

- Architectures with limited data addressing may find that tables of
  constants and/or addresses are too large, and schemes that cope with
  this by making multiple tables, fail when one section is too large
  all by itself.

- Sections are combined in ways that lose code locality, e.g. with
  -ffunction-sections, all C static functions called "setup" will be
  placed together.

- -gc-sections works on a section basis.  Finer granularity allows
  more junk to be discarded.

--
Alan Modra
Australia Development Lab, IBM

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Alan Modra-3
In reply to this post by Romain Geissler
On Mon, Nov 09, 2015 at 10:04:42AM +0100, Romain Geissler wrote:
> Indeed, much better idea. I will use your patch and apply it to x86/64 as
> well.

On checking I found I'd made a mistake in the previous patch, so
decided to go with the safer approach of copying relocs at the end of
the loop.  Also, I'm not renaming "rel" to "rrel".

        * elf64-ppc.c (ppc64_elf_relocate_section): Use read and write
        pointers to reloc array, rather than memmove when deleting a
        reloc.  Don't use RELOC_AGAINST_DISCARDED_SECTION.  Adjust
        reloc counts at end of loop.
        * elf32-ppc.c (ppc_elf_relocate_section): Likewise.
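To see why the log entry insists on read and write pointers rather than memmove, compare the two deletion strategies directly. This is a minimal, self-contained sketch, not BFD code (the struct is a hypothetical stand-in for `Elf_Internal_Rela`): deleting k relocs out of n with a per-deletion memmove shifts O(n·k) bytes, which is what made the ld -r link quadratic, while a single compaction pass is O(n).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified reloc record; r_info == 0 means "delete".  */
struct rela { unsigned long r_offset, r_info; };

/* Old approach (pre-patch): delete each unwanted reloc where it is
   found, shifting the entire tail down with memmove.  With k deletions
   in an n-entry array this moves O(n*k) bytes -- millions of tail
   moves on a large debug section.  */
static size_t
delete_with_memmove (struct rela *r, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      if (r[i].r_info == 0)
        {
          memmove (&r[i], &r[i] + 1, (n - i - 1) * sizeof (*r));
          n--;
        }
      else
        i++;
    }
  return n;
}

/* New approach (this patch): one pass, copy survivors down, adjust the
   count once at the end.  O(n) total, same result.  */
static size_t
delete_with_compaction (struct rela *r, size_t n)
{
  size_t w = 0;
  for (size_t i = 0; i < n; i++)
    if (r[i].r_info != 0)
      r[w++] = r[i];
  return w;
}
```

Both produce identical arrays; only the amount of copying differs, which is the whole of the performance fix.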

diff --git a/bfd/elf32-ppc.c b/bfd/elf32-ppc.c
index 708076d..5c26077 100644
--- a/bfd/elf32-ppc.c
+++ b/bfd/elf32-ppc.c
@@ -7650,6 +7650,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
   struct elf_link_hash_entry **sym_hashes;
   struct ppc_elf_link_hash_table *htab;
   Elf_Internal_Rela *rel;
+  Elf_Internal_Rela *wrel;
   Elf_Internal_Rela *relend;
   Elf_Internal_Rela outrel;
   asection *got2;
@@ -7685,9 +7686,9 @@ ppc_elf_relocate_section (bfd *output_bfd,
  ".tls_vars"));
   if (input_section->sec_info_type == SEC_INFO_TYPE_TARGET)
     relax_info = elf_section_data (input_section)->sec_info;
-  rel = relocs;
+  rel = wrel = relocs;
   relend = relocs + input_section->reloc_count;
-  for (; rel < relend; rel++)
+  for (; rel < relend; wrel++, rel++)
     {
       enum elf_ppc_reloc_type r_type;
       bfd_vma addend;
@@ -7706,6 +7707,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
       struct plt_entry **ifunc;
       struct reloc_howto_struct alt_howto;
 
+    again:
       r_type = ELF32_R_TYPE (rel->r_info);
       sym = NULL;
       sec = NULL;
@@ -7742,8 +7744,22 @@ ppc_elf_relocate_section (bfd *output_bfd,
   howto = NULL;
   if (r_type < R_PPC_max)
     howto = ppc_elf_howto_table[r_type];
-  RELOC_AGAINST_DISCARDED_SECTION (info, input_bfd, input_section,
-   rel, 1, relend, howto, 0, contents);
+
+  _bfd_clear_contents (howto, input_bfd, input_section,
+       contents + rel->r_offset);
+  wrel->r_offset = rel->r_offset;
+  wrel->r_info = 0;
+  wrel->r_addend = 0;
+
+  /* For ld -r, remove relocations in debug sections against
+     sections defined in discarded sections.  Not done for
+     non-debug to preserve relocs in .eh_frame which the
+     eh_frame editing code expects to be present.  */
+  if (bfd_link_relocatable (info)
+      && (input_section->flags & SEC_DEBUGGING))
+    wrel--;
+
+  continue;
  }
 
       if (bfd_link_relocatable (info))
@@ -7759,7 +7775,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
   if (r_type != R_PPC_RELAX_PLT
       && r_type != R_PPC_RELAX_PLTREL24
       && r_type != R_PPC_RELAX)
-    continue;
+    goto copy_reloc;
  }
 
       /* TLS optimizations.  Replace instruction sequences and relocs
@@ -7802,10 +7818,12 @@ ppc_elf_relocate_section (bfd *output_bfd,
     {
       bfd_vma insn;
 
-      insn = bfd_get_32 (output_bfd, contents + rel->r_offset - d_offset);
+      insn = bfd_get_32 (output_bfd,
+ contents + rel->r_offset - d_offset);
       insn &= 31 << 21;
       insn |= 0x3c020000; /* addis 0,2,0 */
-      bfd_put_32 (output_bfd, insn, contents + rel->r_offset - d_offset);
+      bfd_put_32 (output_bfd, insn,
+  contents + rel->r_offset - d_offset);
       r_type = R_PPC_TPREL16_HA;
       rel->r_info = ELF32_R_INFO (r_symndx, r_type);
     }
@@ -7941,8 +7959,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
  {
   /* We changed the symbol on an LD reloc.  Start over
      in order to get h, sym, sec etc. right.  */
-  rel--;
-  continue;
+  goto again;
  }
     }
   break;
@@ -8000,8 +8017,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
       /* Zap the reloc on the _tls_get_addr call too.  */
       BFD_ASSERT (rel->r_offset - d_offset == rel[1].r_offset);
       rel[1].r_info = ELF32_R_INFO (STN_UNDEF, R_PPC_NONE);
-      rel--;
-      continue;
+      goto again;
     }
   break;
  }
@@ -8080,9 +8096,9 @@ ppc_elf_relocate_section (bfd *output_bfd,
   got_addr = (htab->got->output_section->vma
       + htab->got->output_offset
       + (h->got.offset & ~1));
-  rel->r_info = ELF32_R_INFO (0, R_PPC_ADDR16_HA);
-  rel->r_addend = got_addr;
-  rel->r_offset = (p - contents) + d_offset;
+  wrel->r_offset = (p - contents) + d_offset;
+  wrel->r_info = ELF32_R_INFO (0, R_PPC_ADDR16_HA);
+  wrel->r_addend = got_addr;
   insn &= ~0xffff;
   insn |= ((unsigned int )(got_addr + 0x8000) >> 16) & 0xffff;
   bfd_put_32 (output_bfd, insn, p);
@@ -8100,9 +8116,10 @@ ppc_elf_relocate_section (bfd *output_bfd,
   /* Use one of the spare relocs, so --emit-relocs
      output is reasonable.  */
   memmove (rel + 1, rel, (relend - rel - 1) * sizeof (*rel));
-  rel++;
+  wrel++, rel++;
+  rel->r_offset = wrel[-1].r_offset + 4;
   rel->r_info = ELF32_R_INFO (0, R_PPC_ADDR16_LO);
-  rel->r_offset += 4;
+  rel->r_addend = wrel[-1].r_addend;
 
   /* Continue on as if we had a got reloc, to output
      dynamic reloc.  */
@@ -8236,7 +8253,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
 
   bfd_set_error (bfd_error_bad_value);
   ret = FALSE;
-  continue;
+  goto copy_reloc;
 
  case R_PPC_NONE:
  case R_PPC_TLS:
@@ -8245,7 +8262,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
  case R_PPC_EMB_MRKREF:
  case R_PPC_GNU_VTINHERIT:
  case R_PPC_GNU_VTENTRY:
-  continue;
+  goto copy_reloc;
 
   /* GOT16 relocations.  Like an ADDR16 using the symbol's
      address in the GOT as relocation value instead of the
@@ -8496,7 +8513,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
 
     /* If here for a picfixup, we're done.  */
     if (r_type != ELF32_R_TYPE (rel->r_info))
-      continue;
+      goto copy_reloc;
 
     relocation = (htab->got->output_section->vma
   + htab->got->output_offset
@@ -8529,7 +8546,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
   rel->r_offset,
   TRUE))
  return FALSE;
-      continue;
+      goto copy_reloc;
     }
   break;
 
@@ -8768,7 +8785,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
       bfd_elf32_swap_reloca_out (output_bfd, &outrel, loc);
 
       if (skip == -1)
- continue;
+ goto copy_reloc;
 
       /* This reloc will be computed at runtime.  We clear the memory
  so that it contains predictable value.  */
@@ -8861,12 +8878,13 @@ ppc_elf_relocate_section (bfd *output_bfd,
        relocs to describe this relocation.  */
     BFD_ASSERT (ELF32_R_TYPE (relend[-1].r_info) == R_PPC_NONE);
     /* The relocs are at the bottom 2 bytes */
-    rel[0].r_offset += d_offset;
-    memmove (rel + 1, rel, (relend - rel - 1) * sizeof (*rel));
-    rel[0].r_info = ELF32_R_INFO (r_symndx, R_PPC_ADDR16_HA);
-    rel[1].r_offset += 4;
-    rel[1].r_info = ELF32_R_INFO (r_symndx, R_PPC_ADDR16_LO);
-    rel++;
+    wrel->r_offset = rel->r_offset + d_offset;
+    wrel->r_info = ELF32_R_INFO (r_symndx, R_PPC_ADDR16_HA);
+    wrel->r_addend = rel->r_addend;
+    memmove (wrel + 1, wrel, (relend - wrel - 1) * sizeof (*wrel));
+    wrel++, rel++;
+    wrel->r_offset += 4;
+    wrel->r_info = ELF32_R_INFO (r_symndx, R_PPC_ADDR16_LO);
   }
   continue;
 
@@ -9014,37 +9032,37 @@ ppc_elf_relocate_section (bfd *output_bfd,
   relocation = relocation + addend;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16a_type);
-  continue;
+  goto copy_reloc;
 
  case R_PPC_VLE_LO16D:
   relocation = relocation + addend;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16d_type);
-  continue;
+  goto copy_reloc;
 
  case R_PPC_VLE_HI16A:
   relocation = (relocation + addend) >> 16;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16a_type);
-  continue;
+  goto copy_reloc;
 
  case R_PPC_VLE_HI16D:
   relocation = (relocation + addend) >> 16;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16d_type);
-  continue;
+  goto copy_reloc;
 
  case R_PPC_VLE_HA16A:
   relocation = (relocation + addend + 0x8000) >> 16;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16a_type);
-  continue;
+  goto copy_reloc;
 
  case R_PPC_VLE_HA16D:
   relocation = (relocation + addend + 0x8000) >> 16;
   ppc_elf_vle_split16 (output_bfd, contents + rel->r_offset,
        relocation, split16d_type);
-  continue;
+  goto copy_reloc;
 
   /* Relocate against either _SDA_BASE_, _SDA2_BASE_, or 0.  */
  case R_PPC_EMB_SDA21:
@@ -9093,7 +9111,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
 
  bfd_set_error (bfd_error_bad_value);
  ret = FALSE;
- continue;
+ goto copy_reloc;
       }
 
     if (sda != NULL)
@@ -9131,7 +9149,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
  if (r_type == R_PPC_VLE_SDA21
     && ((relocation + 0x80000) & 0xffffffff) > 0x100000)
   goto overflow;
- continue;
+ goto copy_reloc;
       }
     else if (r_type == R_PPC_EMB_SDA21
      || r_type == R_PPC_VLE_SDA21
@@ -9187,7 +9205,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
 
  bfd_set_error (bfd_error_bad_value);
  ret = FALSE;
- continue;
+ goto copy_reloc;
       }
 
     if (sda != NULL)
@@ -9234,7 +9252,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
      value, split16d_type);
       }
   }
-  continue;
+  goto copy_reloc;
 
   /* Relocate against the beginning of the section.  */
  case R_PPC_SECTOFF:
@@ -9282,7 +9300,7 @@ ppc_elf_relocate_section (bfd *output_bfd,
 
   bfd_set_error (bfd_error_invalid_operation);
   ret = FALSE;
-  continue;
+  goto copy_reloc;
  }
 
       /* Do any further special processing.  */
@@ -9342,7 +9360,8 @@ ppc_elf_relocate_section (bfd *output_bfd,
        that make up part of the insn opcode.  */
     unsigned int insn, mask, lobit;
 
-    insn = bfd_get_32 (output_bfd, contents + rel->r_offset - d_offset);
+    insn = bfd_get_32 (output_bfd,
+       contents + rel->r_offset - d_offset);
     mask = 0;
     if (is_insn_ds_form (insn))
       mask = 3;
@@ -9452,6 +9471,31 @@ ppc_elf_relocate_section (bfd *output_bfd,
       ret = FALSE;
     }
  }
+    copy_reloc:
+      if (wrel != rel)
+ *wrel = *rel;
+    }
+
+  if (wrel != rel)
+    {
+      Elf_Internal_Shdr *rel_hdr;
+      size_t deleted = rel - wrel;
+
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section->output_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      if (rel_hdr->sh_size == 0)
+ {
+  /* It is too late to remove an empty reloc section.  Leave
+     one NONE reloc.
+     ??? What is wrong with an empty section???  */
+  rel_hdr->sh_size = rel_hdr->sh_entsize;
+  deleted -= 1;
+  wrel++;
+ }
+      relend = wrel;
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      input_section->reloc_count -= deleted;
     }
 
 #ifdef DEBUG
diff --git a/bfd/elf64-ppc.c b/bfd/elf64-ppc.c
index 0a85ab8..f491a09 100644
--- a/bfd/elf64-ppc.c
+++ b/bfd/elf64-ppc.c
@@ -13162,6 +13162,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   Elf_Internal_Rela *rel;
+  Elf_Internal_Rela *wrel;
   Elf_Internal_Rela *relend;
   Elf_Internal_Rela outrel;
   bfd_byte *loc;
@@ -13193,9 +13194,9 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   sym_hashes = elf_sym_hashes (input_bfd);
   is_opd = ppc64_elf_section_data (input_section)->sec_type == sec_opd;
 
-  rel = relocs;
+  rel = wrel = relocs;
   relend = relocs + input_section->reloc_count;
-  for (; rel < relend; rel++)
+  for (; rel < relend; wrel++, rel++)
     {
       enum elf_ppc64_reloc_type r_type;
       bfd_vma addend;
@@ -13219,10 +13220,13 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       struct ppc_stub_hash_entry *stub_entry;
       bfd_vma max_br_offset;
       bfd_vma from;
-      const Elf_Internal_Rela orig_rel = *rel;
+      Elf_Internal_Rela orig_rel;
       reloc_howto_type *howto;
       struct reloc_howto_struct alt_howto;
 
+    again:
+      orig_rel = *rel;
+
       r_type = ELF64_R_TYPE (rel->r_info);
       r_symndx = ELF64_R_SYM (rel->r_info);
 
@@ -13230,10 +13234,10 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  symbol of the previous ADDR64 reloc.  The symbol gives us the
  proper TOC base to use.  */
       if (rel->r_info == ELF64_R_INFO (0, R_PPC64_TOC)
-  && rel != relocs
-  && ELF64_R_TYPE (rel[-1].r_info) == R_PPC64_ADDR64
+  && wrel != relocs
+  && ELF64_R_TYPE (wrel[-1].r_info) == R_PPC64_ADDR64
   && is_opd)
- r_symndx = ELF64_R_SYM (rel[-1].r_info);
+ r_symndx = ELF64_R_SYM (wrel[-1].r_info);
 
       sym = NULL;
       sec = NULL;
@@ -13314,13 +13318,27 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       h = (struct ppc_link_hash_entry *) h_elf;
 
       if (sec != NULL && discarded_section (sec))
- RELOC_AGAINST_DISCARDED_SECTION (info, input_bfd, input_section,
- rel, 1, relend,
- ppc64_elf_howto_table[r_type], 0,
- contents);
+ {
+  _bfd_clear_contents (ppc64_elf_howto_table[r_type],
+       input_bfd, input_section,
+       contents + rel->r_offset);
+  wrel->r_offset = rel->r_offset;
+  wrel->r_info = 0;
+  wrel->r_addend = 0;
+
+  /* For ld -r, remove relocations in debug sections against
+     sections defined in discarded sections.  Not done for
+     non-debug to preserve relocs in .eh_frame which the
+     eh_frame editing code expects to be present.  */
+  if (bfd_link_relocatable (info)
+      && (input_section->flags & SEC_DEBUGGING))
+    wrel--;
+
+  continue;
+ }
 
       if (bfd_link_relocatable (info))
- continue;
+ goto copy_reloc;
 
       if (h != NULL && &h->elf == htab->elf.hgot)
  {
@@ -13480,10 +13498,12 @@ ppc64_elf_relocate_section (bfd *output_bfd,
       && (tls_mask & TLS_TPREL) == 0)
     {
     toctprel:
-      insn = bfd_get_32 (output_bfd, contents + rel->r_offset - d_offset);
+      insn = bfd_get_32 (output_bfd,
+ contents + rel->r_offset - d_offset);
       insn &= 31 << 21;
       insn |= 0x3c0d0000; /* addis 0,13,0 */
-      bfd_put_32 (output_bfd, insn, contents + rel->r_offset - d_offset);
+      bfd_put_32 (output_bfd, insn,
+  contents + rel->r_offset - d_offset);
       r_type = R_PPC64_TPREL16_HA;
       if (toc_symndx != 0)
  {
@@ -13491,8 +13511,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   rel->r_addend = toc_addend;
   /* We changed the symbol.  Start over in order to
      get h, sym, sec etc. right.  */
-  rel--;
-  continue;
+  goto again;
  }
       else
  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
@@ -13518,8 +13537,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   rel->r_addend = toc_addend;
   /* We changed the symbol.  Start over in order to
      get h, sym, sec etc. right.  */
-  rel--;
-  continue;
+  goto again;
  }
       else
  rel->r_info = ELF64_R_INFO (r_symndx, r_type);
@@ -13658,8 +13676,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  {
   /* We changed the symbol.  Start over in order
      to get h, sym, sec etc. right.  */
-  rel--;
-  continue;
+  goto again;
  }
     }
   break;
@@ -13703,10 +13720,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  }
       bfd_put_32 (output_bfd, insn2, contents + offset);
       if ((tls_mask & TLS_TPRELGD) == 0 && toc_symndx != 0)
- {
-  rel--;
-  continue;
- }
+ goto again;
     }
   break;
 
@@ -13748,8 +13762,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   insn2 = NOP;
  }
       bfd_put_32 (output_bfd, insn2, contents + offset);
-      rel--;
-      continue;
+      goto again;
     }
   break;
 
@@ -14091,7 +14104,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
    && addend == 0)
     {
       bfd_put_32 (output_bfd, NOP, contents + rel->r_offset);
-      continue;
+      goto copy_reloc;
     }
   break;
  }
@@ -14107,7 +14120,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
 
   bfd_set_error (bfd_error_bad_value);
   ret = FALSE;
-  continue;
+  goto copy_reloc;
 
  case R_PPC64_NONE:
  case R_PPC64_TLS:
@@ -14116,7 +14129,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  case R_PPC64_TOCSAVE:
  case R_PPC64_GNU_VTINHERIT:
  case R_PPC64_GNU_VTENTRY:
-  continue;
+  goto copy_reloc;
 
   /* GOT16 relocations.  Like an ADDR16 using the symbol's
      address in the GOT as relocation value instead of the
@@ -14752,7 +14765,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
 
   bfd_set_error (bfd_error_invalid_operation);
   ret = FALSE;
-  continue;
+  goto copy_reloc;
  }
 
       /* Multi-instruction sequences that access the TOC can be
@@ -14901,7 +14914,7 @@ ppc64_elf_relocate_section (bfd *output_bfd,
  mask + 1);
       bfd_set_error (bfd_error_bad_value);
       ret = FALSE;
-      continue;
+      goto copy_reloc;
     }
   break;
  }
@@ -15000,6 +15013,29 @@ ppc64_elf_relocate_section (bfd *output_bfd,
   if (more_info != NULL)
     free (more_info);
  }
+    copy_reloc:
+      if (wrel != rel)
+ *wrel = *rel;
+    }
+
+  if (wrel != rel)
+    {
+      Elf_Internal_Shdr *rel_hdr;
+      size_t deleted = rel - wrel;
+
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section->output_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      if (rel_hdr->sh_size == 0)
+ {
+  /* It is too late to remove an empty reloc section.  Leave
+     one NONE reloc.
+     ??? What is wrong with an empty section???  */
+  rel_hdr->sh_size = rel_hdr->sh_entsize;
+  deleted -= 1;
+ }
+      rel_hdr = _bfd_elf_single_rel_hdr (input_section);
+      rel_hdr->sh_size -= rel_hdr->sh_entsize * deleted;
+      input_section->reloc_count -= deleted;
     }
 
   /* If we're emitting relocations, then shortly after this function

--
Alan Modra
Australia Development Lab, IBM


Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

H.J. Lu-30
On Mon, Nov 9, 2015 at 10:51 PM, Alan Modra <[hidden email]> wrote:

> On Mon, Nov 09, 2015 at 10:04:42AM +0100, Romain Geissler wrote:
>> Indeed, much better idea. I will use your patch and apply it to x86/64 as
>> well.
>
> On checking I found I'd made a mistake in the previous patch, so
> decided to go with the safer approach of copying relocs at the end of
> the loop.  Also, I'm not renaming "rel" to "rrel".
>
>         * elf64-ppc.c (ppc64_elf_relocate_section): Use read and write
>         pointers to reloc array, rather than memmove when deleting a
>         reloc.  Don't use RELOC_AGAINST_DISCARDED_SECTION.  Adjust
>         reloc counts at end of loop.
>         * elf32-ppc.c (ppc_elf_relocate_section): Likewise.
>

I checked in a similar, but a little different, patch to fix i386 and
x86-64.


--
H.J.

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Alan Modra-3
On Tue, Nov 10, 2015 at 12:24:32AM -0800, H.J. Lu wrote:

> On Mon, Nov 9, 2015 at 10:51 PM, Alan Modra <[hidden email]> wrote:
> > On Mon, Nov 09, 2015 at 10:04:42AM +0100, Romain Geissler wrote:
> >> Indeed, much better idea. I will use your patch and apply it to x86/64 as
> >> well.
> >
> > On checking I found I'd made a mistake in the previous patch, so
> > decided to go with the safer approach of copying relocs at the end of
> > the loop.  Also, I'm not renaming "rel" to "rrel".
> >
> >         * elf64-ppc.c (ppc64_elf_relocate_section): Use read and write
> >         pointers to reloc array, rather than memmove when deleting a
> >         reloc.  Don't use RELOC_AGAINST_DISCARDED_SECTION.  Adjust
> >         reloc counts at end of loop.
> >         * elf32-ppc.c (ppc_elf_relocate_section): Likewise.
> >
>
> I checked in a similar, but a little different, patch to fix i386 and
> x86-64.

You deleted a comment line, and need to look at all the "continue"
statements.  I see one that must copy relocs, and more that should
copy *or* you should remove the copying code at the end of the loop.

To clarify, I decided to add what is dead code to the powerpc backend
in case we ever remove relocs when !relocatable.  The only copying
that is really needed is this one:

      if (bfd_link_relocatable (info))
        {
          if (wrel != rel)
            *wrel = *rel;
          continue;
        }

and you need to add the missing VTINHERIT/VTENTRY copy.

--
Alan Modra
Australia Development Lab, IBM

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

H.J. Lu-30
On Tue, Nov 10, 2015 at 1:17 AM, Alan Modra <[hidden email]> wrote:

> On Tue, Nov 10, 2015 at 12:24:32AM -0800, H.J. Lu wrote:
>> On Mon, Nov 9, 2015 at 10:51 PM, Alan Modra <[hidden email]> wrote:
>> > On Mon, Nov 09, 2015 at 10:04:42AM +0100, Romain Geissler wrote:
>> >> Indeed, much better idea. I will use your patch and apply it to x86/64 as
>> >> well.
>> >
>> > On checking I found I'd made a mistake in the previous patch, so
>> > decided to go with the safer approach of copying relocs at the end of
>> > the loop.  Also, I'm not renaming "rel" to "rrel".
>> >
>> >         * elf64-ppc.c (ppc64_elf_relocate_section): Use read and write
>> >         pointers to reloc array, rather than memmove when deleting a
>> >         reloc.  Don't use RELOC_AGAINST_DISCARDED_SECTION.  Adjust
>> >         reloc counts at end of loop.
>> >         * elf32-ppc.c (ppc_elf_relocate_section): Likewise.
>> >
>>
>> I checked in a similar, but a little different, patch to fix i386 and
>> x86-64.
>
> You deleted a comment line, and need to look at all the "continue"
> statements.  I see one that must copy relocs, and more that should
> copy *or* you should remove the copying code at the end of the loop.
>
> To clarify, I decided to add what is dead code to the powerpc backend
> in case we ever remove relocs when !relocatable.  The only copying
> that is really needed is this one:
>
>       if (bfd_link_relocatable (info))
>         {
>           if (wrel != rel)
>             *wrel = *rel;
>           continue;
>         }

I have checked a patch into the x86 backends.

> and you need to add the missing VTINHERIT/VTENTRY copy.
>

Done,

Thanks,

--
H.J.

Re: [Proposed patch] Huge performance regression in ld -r since binutils >= 2.21

Rich Felker-2
In reply to this post by Alan Modra-3
On Mon, Nov 09, 2015 at 09:11:07PM +1030, Alan Modra wrote:
> - Sections are combined in ways that lose code locality.  eg. with
>   -ffunction-sections, all C static functions called "setup" will be
>   placed together.

I actually wondered/worried what happens when static functions with
the same name from multiple files are used with -ffunction-sections
and --gc-sections, when some are used but others are not. Would it
perhaps make sense for gcc to attach some sort of hash based on things
like the filename and function contents to the section name so that
these section names are unique? That would probably solve a lot of the
problems with ld -r, too, assuming -ffunction-sections is used.

Of course ld -r could also have an option to create _new_ sections
like this rather than merging sections.

Rich