[PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Chung-Lin Tang
Hi, this patch is our attempt at resolving the slow shared object sorting
situation in #17645, #15310, and some effort at #15311.  I realize this is
pretty unsuitable timing to be submitting a patch of such nature now (probably
way too late to be included into 2.30), but still sending now anyways as this
will probably need quite some discussion before being approved.

Prior attempts at solving this slow sorting behavior appeared to have failed
due to inadequate proposed testing, therefore cannot convince reviewers to
touch what seems to be perceived as a sensitive and easy to break part of ld.so.

Therefore the first part of this patch is not a change to the dynamic loader
code proper, but a testing framework for constructing DSO sorting tests.
It consists of a new Python script 'dso-ordering-test.py' that serves to
generate both testcase source files and the needed Makefile fragments from
a short description string, for example:

     a->b->c->d          // four objects linked one after another

     a->[bc]->d;b->c     // a depends on b and c, which both depend on d,
                         // b depends on c (b,c linked to object a in fixed order)

     a->b->c;{+a;%a;-a}  // a, b, c serially dependent, main program uses
                         // dlopen/dlsym/dlclose on object a

     a->b->c;{}!->[abc]  // a, b, c serially dependent; multiple tests generated
                         // to test all permutations of a, b, c ordering linked
                         // to main program

    (Above is just a short description of what the script can do, more
     documentation is in the script comments.)

and, a patch to glibc/elf/Makefile which uses this script to add a few
DSO sorting testcases.  The description string notation and output form of the
generated testcases is short enough that both the test descriptions
and expected outcomes can all directly be specified in the Makefile.

In terms of the tests I added using this script, I am not completely sure they are
(together with existing tests) adequate to prove algorithmic integrity in face
of any ld.so code changes, but the script should provide a solid tool to further
improve on coverage.  Also welcome suggestions if the current features are still
lacking in expressing some case of shared object relations, or if the documentation
still feels unclear.

Thanks,
Chung-Lin

2019-07-20  Chung-Lin Tang  <[hidden email]>

         [BZ #17645]
         [BZ #15311]
         [BZ #15310]
         * elf/Makefile (test_dso_ordering): New make function.
         (tst-dso-ordering[123456789]): Define new DSO sorting tests.
         (tst-bz15311): Testcase from #15311.
         * scripts/dso-ordering-test.py: New script.

dlsort-01-tests.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Florian Weimer-5
* Chung-Lin Tang:

> Hi, this patch is our attempt at resolving the slow shared object sorting
> situation in #17645, #15310, and some effort at #15311.  I realize this is
> pretty unsuitable timing to be submitting a patch of such nature now (probably
> way too late to be included into 2.30), but still sending now anyways as this
> will probably need quite some discussion before being approved.
>
> Prior attempts at solving this slow sorting behavior appeared to have failed
> due to inadequate proposed testing, therefore cannot convince reviewers to
> touch what seems to be perceived as a sensitive and easy to break part of ld.so.

Thanks for working on this!

>         * elf/Makefile (test_dso_ordering): New make function.
>         (tst-dso-ordering[123456789]): Define new DSO sorting tests.
>         (tst-bz15311): Testcase from #15311.
>         * scripts/dso-ordering-test.py: New script.

I have not looked at the script in detail yet, sorry.  But I have some
early questions.

Is => intended to cover the case of run-time dependencies added late due
to lazy binding?

Currently, those late dependencies have two effects, I think: They keep
around the referenced libraries longer than before (so that dlclose
would not remove an object which is still in used solely due to lazy
binding).  And the ELF destructors are reordered to reflect these added
run-time dependencies.

Can your test framework test both cases?  What's your position on the
second effect?  I think it sometimes results in destructors running not
in the opposite order of constructors, due to the new topological sort.
(This also happens with the current implementation.)

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Chung-Lin Tang
On 2019/7/23 9:21 PM, Florian Weimer wrote:
> Is => intended to cover the case of run-time dependencies added late due
> to lazy binding?
>
> Currently, those late dependencies have two effects, I think: They keep
> around the referenced libraries longer than before (so that dlclose
> would not remove an object which is still in used solely due to lazy
> binding).  And the ELF destructors are reordered to reflect these added
> run-time dependencies.

Yes, you can test that. The effect of => is to create a caller/callee relation
between objects: 'x=>y' creates fn_x() and fn_y() in those two DSOs,
and fn_x() has a call to fn_y().

Though that's the only immediate effect that => has. To construct a test of
run-time added dependencies related to dlopen/etc. you also need to add those
operations inside the '{}' construct.

All the created DSOs have a constructor/destructor that outputs their single
character name. The generated main() program prints '[]' brackets after
dlopen/dlclose calls to separate out the following constructor/destructor output.
So taken whole, the entire output string should capture all constructor/destructor
activity and ordering behavior.

> Can your test framework test both cases?  What's your position on the
> second effect?  I think it sometimes results in destructors running not
> in the opposite order of constructors, due to the new topological sort.
> (This also happens with the current implementation.)

What I did in the ld.so code patch was add a second pass of sorting that ignores
runtime deps, prioritizing link dependencies; this appears to also be what prior
discussion pointed towards, see more details in that 2nd email with the actual
code patch.

I have attached an updated patch here; fixed some bugs in the script related to
the '@' operator for the main program construct.

Thanks,
Chung-Lin

dlsort-01-tests-v2.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Florian Weimer-5
* Chung-Lin Tang:

> On 2019/7/23 9:21 PM, Florian Weimer wrote:
>> Is => intended to cover the case of run-time dependencies added late due
>> to lazy binding?
>>
>> Currently, those late dependencies have two effects, I think: They keep
>> around the referenced libraries longer than before (so that dlclose
>> would not remove an object which is still in used solely due to lazy
>> binding).  And the ELF destructors are reordered to reflect these added
>> run-time dependencies.
>
> Yes, you can test that. The effect of => is to create a caller/callee
> relation between objects: 'x=>y' creates fn_x() and fn_y() in those
> two DSOs, and fn_x() has a call to fn_y().
>
> Though that's the only immediate effect that => has. To construct a
> test of run-time added dependencies related to dlopen/etc. you also
> need to add those operations inside the '{}' construct.
>
> All the created DSOs have a constructor/destructor that outputs their
> single character name. The generated main() program prints '[]'
> brackets after dlopen/dlclose calls to separate out the following
> constructor/destructor output.  So taken whole, the entire output
> string should capture all constructor/destructor activity and ordering
> behavior.

I see, thanks.

>> Can your test framework test both cases?  What's your position on the
>> second effect?  I think it sometimes results in destructors running not
>> in the opposite order of constructors, due to the new topological sort.
>> (This also happens with the current implementation.)
>
> What I did in the ld.so code patch was add a second pass of sorting
> that ignores runtime deps, prioritizing link dependencies; this
> appears to also be what prior discussion pointed towards, see more
> details in that 2nd email with the actual code patch.

I wonder if it makes sense to disentangle this (desirable) functional
change from the rest (which sould be purely an optimization).

Is it even necessary to re-sort on dlclose?  Is the original dependency
order available somewhere?  Then we could make it explicit that the
destructor order is the reverse of the constructor order (for the
objects unloaded).  Or is there a corner case which causes an expected
divergence?

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Chung-Lin Tang
On 2019/7/29 5:48 PM, Florian Weimer wrote:
> * Chung-Lin Tang:

>>> Can your test framework test both cases?  What's your position on the
>>> second effect?  I think it sometimes results in destructors running not
>>> in the opposite order of constructors, due to the new topological sort.
>>> (This also happens with the current implementation.)
>>
>> What I did in the ld.so code patch was add a second pass of sorting
>> that ignores runtime deps, prioritizing link dependencies; this
>> appears to also be what prior discussion pointed towards, see more
>> details in that 2nd email with the actual code patch.
>
> I wonder if it makes sense to disentangle this (desirable) functional
> change from the rest (which sould be purely an optimization).

By "functional change" here, are you referring to the testing framework,
or the described ld.so destructor behavior I described above?

> Is it even necessary to re-sort on dlclose?  Is the original dependency
> order available somewhere?  Then we could make it explicit that the
> destructor order is the reverse of the constructor order (for the
> objects unloaded).  Or is there a corner case which causes an expected
> divergence?

Dynamic loaded objects could add more relocation dependencies, and thus augment
the dependency relations (by adding more constraints), so a final sort should
still be required.

Thanks,
Chung-Lin
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Florian Weimer-5
* Chung-Lin Tang:

> On 2019/7/29 5:48 PM, Florian Weimer wrote:
>> * Chung-Lin Tang:
>
>>>> Can your test framework test both cases?  What's your position on the
>>>> second effect?  I think it sometimes results in destructors running not
>>>> in the opposite order of constructors, due to the new topological sort.
>>>> (This also happens with the current implementation.)
>>>
>>> What I did in the ld.so code patch was add a second pass of sorting
>>> that ignores runtime deps, prioritizing link dependencies; this
>>> appears to also be what prior discussion pointed towards, see more
>>> details in that 2nd email with the actual code patch.
>>
>> I wonder if it makes sense to disentangle this (desirable) functional
>> change from the rest (which sould be purely an optimization).
>
> By "functional change" here, are you referring to the testing framework,
> or the described ld.so destructor behavior I described above?

The destructor behavior.

>> Is it even necessary to re-sort on dlclose?  Is the original dependency
>> order available somewhere?  Then we could make it explicit that the
>> destructor order is the reverse of the constructor order (for the
>> objects unloaded).  Or is there a corner case which causes an expected
>> divergence?
>
> Dynamic loaded objects could add more relocation dependencies, and
> thus augment the dependency relations (by adding more constraints), so
> a final sort should still be required.

Yes, these dynamically added relocation dependencies could mean that
fewer objects than had been loaded by the dlopen can be freed with
dlclose.  But if we disregard those relocation dependencies for
destructor order sorting, wouldn't be the sorted result equivalent to
the constructor order?

Thanks,
Florian
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 1/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader

Chung-Lin Tang
On 2019/8/5 6:45 PM, Florian Weimer wrote:
>>>> What I did in the ld.so code patch was add a second pass of sorting
>>>> that ignores runtime deps, prioritizing link dependencies; this
>>>> appears to also be what prior discussion pointed towards, see more
>>>> details in that 2nd email with the actual code patch.
>>> I wonder if it makes sense to disentangle this (desirable) functional
>>> change from the rest (which sould be purely an optimization).
>> By "functional change" here, are you referring to the testing framework,
>> or the described ld.so destructor behavior I described above?
> The destructor behavior.

Well, I'm definitely not suggesting adding the two-pass sorting described
above with the current sorting algorithm (even if it should be relatively
straightforward to do so)

The entire #17645 issue is due to the current algorithm becoming prohibitively
slow in certain pathological cases. Trying to fix the destructor behavior that
way without replacing the current sorting algorithm will greatly exacerbate
the performance problem.

>>> Is it even necessary to re-sort on dlclose?  Is the original dependency
>>> order available somewhere?  Then we could make it explicit that the
>>> destructor order is the reverse of the constructor order (for the
>>> objects unloaded).  Or is there a corner case which causes an expected
>>> divergence?
>> Dynamic loaded objects could add more relocation dependencies, and
>> thus augment the dependency relations (by adding more constraints), so
>> a final sort should still be required.
> Yes, these dynamically added relocation dependencies could mean that
> fewer objects than had been loaded by the dlopen can be freed with
> dlclose.  But if we disregard those relocation dependencies for
> destructor order sorting, wouldn't be the sorted result equivalent to
> the constructor order?

Relocation dependencies are not completely disregarded during destruction,
just that they're prioritized lower than static link dependencies (when
dependence cycles cause ambiguity in determining a single ordering), hence
the two passes of sorting. Besides that, dlopen'ed but not dlclose'd objects
also need to be processed along, so any existing already-computed ordering
is probably not enough in the general case.

Chung-Lin