[PATCH 2/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 2/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Chung-Lin Tang
This part is the actual code changes. While the past attempts appeared to
be either (1) more sophisticated graph algorithms, with attempts to add
Tarjan SCC, or (2) modifying of heuristics to augment the old algorithm to
behave more reasonably, here I have basically adhered to the KISS principle.

The main algorithm here is simply depth-first search (DFS) to obtain the
Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1
bitfield is added to struct link_map to more elegantly facilitate such a
search.

I also have experimented with doing an iterative version, but it is obviously
harder to understand, and actually slower when measured by hp-timing.h
facilities. I have chosen to use simple recursive DFS, for clarity and
performance (Some measures were however taken to curb recursion depth)

The DFS algorithm is applied to the input maps[nmap-1] backwards towards
maps[0]. This has the effect of a more "shallow" recursion depth in general
since the input is in BFS. Also, when combined with the natural order of
processing l_initfini[] at each node, this creates a resulting output sorting
closer to the intuitive "left-to-right" order in most cases.

Per-the discussion in #15311 about relocation dependencies overriding link
dependencies, similar to comments #7,#9 there, a second pass of
link-dependency-only sorting has been added in that case. Detection of
existence of reldeps is done during the first DFS traversal pass, to cull
away unneeded cases of this 2nd sorting pass. This also allows discarding of
the simple limited cycle detection (i.e. X has reldep on Y, Y links to X)
in the current algorithm. A testcase expressing the #15311 case has also been
added.

On the further general issue of circular dependencies causing SCCs across
shared objects, the ELF specification explicitly states that behavior in this
case is undefined, although I have found at least one reference describing
Solaris' behavior here as basically retaining the received original ordering
of those objects [1]. While quite well defined, I'm a little unsure this is
the reasonable behavior, as this seems to imply a single circular dependency
link will nullify correct topological dependency behavior for the majority of
nodes within that SCC.

[1] https://docs.oracle.com/cd/E19957-01/806-0641/6j9vuquip/index.html
     (section "Initialization and Termination Routines")

The Tarjan SCC algorthm has been mentioned multiple times in these related
BZ issues. It could be said that the Tarjan algorithm is a generalization of
post-order DFS traversal; some might say that's an understatement, but the
phases of the node visiting and processing really look analogous. It would be
possible to extend and implement it mostly within the confines of the code of
my patch, but considering the undefined status in the spec, arguably some
ambiguities of proper reasonable behavior, and the much more bookkeeping in
a piece of code that will be repeatedly executed an incredible number of times
across all systems, of which only applies to quite rare cases, I have refrained
from adding that kind of processing in this patch, though such issues may be
revisited later.

Other two notable implementation adjustments related to this _dl_sort_maps()
change are:

(1) The additional pass of sorting in dl_open_worker() right before relocation
has been removed. _dl_map_object_deps() already does a pass of sorting, and
simply collecting objects by that order is adequate. Sorting again in that place
before relocation appears to be redundant.

(2) The use of two char arrays 'used' and 'done' in _dl_close_worker to
represent two per-map attributes has been changed to simply use the two bits in
the 'l_reserved' field in struct link_map to implement.  This also allows
discarding the clunky 'used' array sorting that _dl_sort_maps had to (sometimes)
do along the way.

This patch has been tested on x86_64-linux, powerpc64le-linux, aarch64-linux
without regressions (includes the new tests I've added).
Requesting for comments and discussion, and of course later, approval to apply
to master at appropriate stage.

Thanks,
Chung-Lin

2019-07-20  Chung-Lin Tang  <[hidden email]>

         [BZ #17645]
         [BZ #15311]
         [BZ #15310]
         * elf/dl-close.c (MAP_DONE,MAP_USED,SET_MAP_DONE,SET_MAP_USED):
         New helper macros.
         (_dl_close_worker): Remove used[], done[] char arrays, use above
         macros, add l_reserved init/fini code, adjust call to _dl_sort_maps.
         * elf/dl-deps.c (_dl_map_object_deps): Adjust call to _dl_sort_maps.
         * elf/dl-fini.c (_dl_fini): Likewise.
         * elf/dl-open.c (dl_open_worker): Remove call to _dl_sort_maps,
         adjust surrounding code and comments.
         * elf/dl-sort_maps.c (dfs_traversal): New static function.
         (_dl_sort_maps): New Reverse-Postorder (RPO) based implementation.
         * include/link.h (struct link_map): Add l_visited:1 bitfield.
         * sysdeps/generic/ldsodefs.h (_dl_sort_maps): Adjust declaration.

dlsort-02-changes.patch (21K) Download Attachment