[PATCH] Optimize reading ELF objects with many group sections

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Optimize reading ELF objects with many group sections

Jens Widell
When processing a section that is a member of a group, the group
that contains it is looked up using a linear search. The resulting
O(n^2) complexity causes significant performance issues when
dealing with object files with very many groups.

By remembering the index of the last found group and restarting
the next search from that index, the search instead becomes O(n)
in common cases.

Signed-off-by: Jens Widell <[hidden email]>
---
Background: This turned out to be a significant issue when building
Chromium with debug fission, which means clang calls objcopy twice at
the end of a compilation to extract debug information into a separate
file, and unity building, which combines many source files into a
single compilation unit to reduce overall build time, thus procuding
some very large object files with many sections.

 bfd/elf-bfd.h |  1 +
 bfd/elf.c     | 15 ++++++++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 9c9b0c7..07879e5 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -1908,6 +1908,7 @@ struct elf_obj_tdata
 
   Elf_Internal_Shdr **group_sect_ptr;
   int num_group;
+  unsigned int group_search_offset;
 
   unsigned int symtab_section, dynsymtab_section;
   unsigned int dynversym_section, dynverdef_section, dynverref_section;
diff --git a/bfd/elf.c b/bfd/elf.c
index 9f44ff9..d7d084e 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -737,10 +737,17 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 
   if (num_group != (unsigned) -1)
     {
-      unsigned int i;
+      unsigned int search_offset = elf_tdata (abfd)->group_search_offset;
+      unsigned int j;
 
-      for (i = 0; i < num_group; i++)
+      for (j = 0; j < num_group; j++)
  {
+  /* Begin search from previous found group. */
+  unsigned i = j + search_offset;
+
+  if (i >= num_group)
+    i -= num_group;
+
   Elf_Internal_Shdr *shdr = elf_tdata (abfd)->group_sect_ptr[i];
   Elf_Internal_Group *idx;
   bfd_size_type n_elt;
@@ -803,7 +810,9 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
  if (shdr->bfd_section != NULL)
   elf_next_in_group (shdr->bfd_section) = newsect;
 
- i = num_group - 1;
+ elf_tdata (abfd)->group_search_offset = i;
+
+ j = num_group - 1;
  break;
       }
  }
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize reading ELF objects with many group sections

Nick Clifton
Hi Jens,

  A couple of comments on the patch:

  * It would be helpful if you could include a changelog
    entry along with the patch itself.

  * Please can you tell us how you tested the patch, and if
    you noticed any regressions.  (I do not expect that there
    were any, but you never know).  In particular a patch like
    this that affects generic code needs to be tested with a
    wide range of targets to make sure that it does not break
    anything.

  * Also...

> +++ b/bfd/elf-bfd.h
> @@ -1908,6 +1908,7 @@ struct elf_obj_tdata
>  
>    Elf_Internal_Shdr **group_sect_ptr;
>    int num_group;
> +  unsigned int group_search_offset;

  I think that it would be helpful to include a comment by
  this field explaining what it is for.


> +  /* Begin search from previous found group. */
> +  unsigned i = j + search_offset;
> +
> +  if (i >= num_group)
> +    i -= num_group;

  I found this to be a little bit confusing when I was reading
  the patch for the first time.  Maybe this would be clearer ?

          unsigned int i = (j + search_offset) % num_group;

  Of course since the intent is to optimise this loop for speed
  you may object to the presence of the modulo operator, so I
  leave the choice up to you.

Cheers
  Nick


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize reading ELF objects with many group sections

Jens Widell
On Mon, Jan 8, 2018 at 11:23 AM, Nick Clifton <[hidden email]> wrote:
> Hi Jens,
>
>   A couple of comments on the patch:
>
>   * It would be helpful if you could include a changelog
>     entry along with the patch itself.

Will do that in the next revision.

>   * Please can you tell us how you tested the patch, and if
>     you noticed any regressions.  (I do not expect that there
>     were any, but you never know).  In particular a patch like
>     this that affects generic code needs to be tested with a
>     wide range of targets to make sure that it does not break
>     anything.

As I was investigating a performance issue, my manual testing has been
somewhat focused on performance. I've run `make check` (on a 64-bit
Linux) with and without my patch, and the output was identical.

Some pointers to other useful testing I should do would be helpful.
First-time contributor here, so I'm not exactly familiar with the
process.

>   * Also...
>
>> +++ b/bfd/elf-bfd.h
>> @@ -1908,6 +1908,7 @@ struct elf_obj_tdata
>>
>>    Elf_Internal_Shdr **group_sect_ptr;
>>    int num_group;
>> +  unsigned int group_search_offset;
>
>   I think that it would be helpful to include a comment by
>   this field explaining what it is for.

Will add.

>> +       /* Begin search from previous found group. */
>> +       unsigned i = j + search_offset;
>> +
>> +       if (i >= num_group)
>> +         i -= num_group;
>
>   I found this to be a little bit confusing when I was reading
>   the patch for the first time.  Maybe this would be clearer ?
>
>           unsigned int i = (j + search_offset) % num_group;
>
>   Of course since the intent is to optimise this loop for speed
>   you may object to the presence of the modulo operator, so I
>   leave the choice up to you.

That happens to be exactly what I had in the first iteration of my patch. :-)

IIRC my next profiling run showed that division instruction as
"significant", and thought someone might comment on it in the context
of an optimization patch, so changed to the current code. But overall,
I like the code with a modulo operator better, and I doubt the
difference in performance, if there even was one, matters in practice.
I'll change back.

--
Jens
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Optimize reading ELF objects with many group sections

Nick Clifton
Hi Jens,

> Some pointers to other useful testing I should do would be helpful.
> First-time contributor here, so I'm not exactly familiar with the
> process.

No worries - thank you for taking the time to pursue contributing this patch.

As for testing I would recommend building and testing a 32-bit target, in
order to make sure that there are no problems relating to bit sizes.  Eg:

  % <path-to-binutils-sources>/configure --quiet CFLAGS="-m32"
  % make all-gas all-ld all-binutils
  % make check

Another useful test is with a toolchain configured for all targets as this
can often throw up issues with other architectures that do not show up with
your primary architecture:

  % <path-to-binutils-source>/configure --enable-64-bit-bfd --enable-targets=all
  % make all-gas all-ld all-binutils
  % make check

Note - I am not actually expecting you to find any issues with the patch,
it looks perfectly fine to me.  But it is good practice to always test,
even when you think that something looks innocuous.


>>           unsigned int i = (j + search_offset) % num_group;
>>
>>   Of course since the intent is to optimise this loop for speed
>>   you may object to the presence of the modulo operator, so I
>>   leave the choice up to you.
>
> That happens to be exactly what I had in the first iteration of my patch. :-)

Great minds ... :-)

> IIRC my next profiling run showed that division instruction as
> "significant", and thought someone might comment on it in the context
> of an optimization patch, so changed to the current code. But overall,
> I like the code with a modulo operator better, and I doubt the
> difference in performance, if there even was one, matters in practice.
> I'll change back.

Fair enough.  I would hope that a sufficiently clever compiler would
be able to replace the modulo operator with a subtraction anyway, so
performance should not suffer.

Cheers
  Nick
Reply | Threaded
Open this post in threaded view
|

[PATCH v2] Optimize reading ELF objects with many group sections

Jens Widell
When processing a section that is a member of a group, the group
that contains it is looked up using a linear search. The resulting
O(n^2) complexity causes significant performance issues when
dealing with object files with very many groups.

By remembering the index of the last found group and restarting
the next search from that index, the search instead becomes O(n)
in common cases.

Signed-off-by: Jens Widell <[hidden email]>
---
Changes from previous version:
 - added changelog entry
 - added comment in elf-bfd.h
 - use modulo operator for index calculation

Tests run with default build on 64-bit Linux, with CFLAGS=-m32 and
with "./configure --enable-64-bit-bfd --enable-targets=all".

 bfd/ChangeLog |  5 +++++
 bfd/elf-bfd.h |  4 ++++
 bfd/elf.c     | 12 +++++++++---
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/bfd/ChangeLog b/bfd/ChangeLog
index 7803ef8..44a22c1 100644
--- a/bfd/ChangeLog
+++ b/bfd/ChangeLog
@@ -1,3 +1,8 @@
+2018-01-08  Jens Widell  <[hidden email]>
+
+ * elf.c (setup_group): Optimize search for group by remembering
+ last found group and restarting search at that index.
+
 2018-01-03  John Baldwin  <[hidden email]>
 
  * elf.c (elfcore_grok_freebsd_note): Handle
diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
index 9c9b0c7..3136b22 100644
--- a/bfd/elf-bfd.h
+++ b/bfd/elf-bfd.h
@@ -1909,6 +1909,10 @@ struct elf_obj_tdata
   Elf_Internal_Shdr **group_sect_ptr;
   int num_group;
 
+  /* Index into group_sect_ptr, updated by setup_group when finding a
+     section's group.  Used to optimize subsequent group searches. */
+  unsigned int group_search_offset;
+
   unsigned int symtab_section, dynsymtab_section;
   unsigned int dynversym_section, dynverdef_section, dynverref_section;
 
diff --git a/bfd/elf.c b/bfd/elf.c
index 9f44ff9..2088e4b 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -737,10 +737,14 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
 
   if (num_group != (unsigned) -1)
     {
-      unsigned int i;
+      unsigned int search_offset = elf_tdata (abfd)->group_search_offset;
+      unsigned int j;
 
-      for (i = 0; i < num_group; i++)
+      for (j = 0; j < num_group; j++)
  {
+  /* Begin search from previous found group. */
+  unsigned i = (j + search_offset) % num_group;
+
   Elf_Internal_Shdr *shdr = elf_tdata (abfd)->group_sect_ptr[i];
   Elf_Internal_Group *idx;
   bfd_size_type n_elt;
@@ -803,7 +807,9 @@ setup_group (bfd *abfd, Elf_Internal_Shdr *hdr, asection *newsect)
  if (shdr->bfd_section != NULL)
   elf_next_in_group (shdr->bfd_section) = newsect;
 
- i = num_group - 1;
+ elf_tdata (abfd)->group_search_offset = i;
+
+ j = num_group - 1;
  break;
       }
  }
--
2.7.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH v2] Optimize reading ELF objects with many group sections

Nick Clifton
Hi Jens,

  Thanks for updating the patch.  I have now applied it, with one small change:

> +2018-01-08  Jens Widell  <[hidden email]>
> +
> + * elf.c (setup_group): Optimize search for group by remembering
> + last found group and restarting search at that index.

I added a line about the patch to elf-bfd.h to the ChangeLog entry.

Cheers
  Nick