RFC: TLS improvements for IA32 and AMD64/EM64T


RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
Over the past few months, I've been working on porting to IA32 and
AMD64/EM64T the interesting bits of the TLS design I came up with for
FR-V, achieving some impressive speedups along with slight code size
reductions in the most common cases.

Although the design is not set in stone yet, it's fully implemented
and functional with patches I'm about to post for binutils, gcc and
glibc mainline, as follow-ups to this message, except that the GCC
patch will go to gcc-patches, as expected.

The specs RFC is attached.  Comments are welcome.


        Thread Local Storage Descriptors for IA32 and AMD64/EM64T

                      Version 0.9.2 - 2005-09-15

     Alexandre Oliva <[hidden email], [hidden email]>


Introduction
============

While porting NPTL to the FR-V architecture, an idea occurred to me
that would enable significant improvements to the General Dynamic and
Local Dynamic access models for thread-local variables.  These models
are known to be inefficient because every access requires calling a
function to obtain the address of the variable, and that call is
often quite costly.

The reason for calling such a function is that, when code is compiled
for a dynamic library (the cases in which these access models are
used), it is not generally possible to know whether a thread-local
variable is going to be allocated in the Static Thread Local Storage
Block or not.  Dynamic libraries that are loaded at program start up
can have their thread local storage sections assigned to this static
block, since their TLS requirements are all known before the program
is initiated.  Libraries loaded at a later time, however, may need
dynamically-allocated storage for their TLS blocks.

In the former case, the offset from the Thread Pointer, usually held
in a register, to the thread-local variable is going to be the same
for all threads, whereas in the latter case, such offset may vary, and
it may even be necessary to allocate storage for the variable at the
time it is accessed.

Existing implementations of GD and LD access models did not take
advantage of this run-time constant to speed up the case of libraries
loaded before program start up, a case that is certainly the most
common.

Even though libraries can choose more efficient access models, they
can only do so by giving up the ability for the modules that define
such variables to be loaded after program start up, since different
code sequences have to be generated in this case.

The method proposed here doesn't make a compile-time trade off; it
rather decides, at run time, how each variable can be accessed most
efficiently, without any penalty on code or data sizes in case the
variable is defined in an initially-loaded module, and with some
additional data, allocated dynamically, for the case of late-loaded
modules.  In both cases, performance is improved over the traditional
access models.

Another advantage of this novel design for such access models is that
it enables relocations to thread-local variables to be resolved
lazily.


Background
==========

Thread-local storage is organized as follows: for every thread, two
blocks of memory are allocated: a Static TLS block and a Dynamic
Thread Vector.  A thread pointer, normally a reserved register, points
to some fixed location in the Static TLS block, that contains a
pointer to the dynamic thread vector at some fixed location as well.

TLS for the main executable, if it has a TLS section, is also at a
fixed offset from the thread pointer.  Other modules loaded before the
program starts will also have their TLS sections assigned to the
Static TLS block.

The dynamic thread vector starts with a generation count, followed by
pointers to TLS blocks holding thread-specific copies of the TLS
sections of each module.

If modules are loaded at run time, the dynamic thread vector may need
to grow, and the corresponding TLS blocks may have to be allocated
dynamically.  The generation count is used to control whether the DTV
is up-to-date with regard to newly-loaded or unloaded modules,
enabling the reuse of entries without confusion as to whether a TLS
block has been created and initialized for a newly-loaded module or
whether the block belonged to an already-unloaded module and is still
waiting for deallocation.


Programs can access thread-local variables by using code that follows
4 different models: Local Exec, Initial Exec, General Dynamic and
Local Dynamic.

Local Exec is only applicable when code in the main executable
accesses a thread-local variable also defined in the main executable.
In such cases, the offset from the Thread Pointer to the variable can
be computed by the linker, so the access model consists in simply
adding this offset to the Thread Pointer to compute the address of the
variable.

Initial Exec is applicable when code in the main executable accesses
thread-local variables that are defined in other loadable modules that
are dependencies of this executable.  Since the dynamic loader will
make sure they are loaded before the program can start running, it can
lay out the Static TLS Block taking their TLS sections' sizes and
alignment requirements into account.  Since the Thread Pointer points
to the same location within each thread's Static TLS Block, the
offset from the Thread Pointer to the variable one wants to access is
constant for all threads.  However, unlike the Local Exec case, it is
not known at link time, but rather at run time.  So, the compiler
arranges for the offset to be loaded from a Global Offset Table entry,
that the linker creates to hold the TP offset of the variable, and
adds that to the TP to access the variable.  The linker also emits a
dynamic relocation for the dynamic loader to fill in the GOT entry
with the correct value.

Although the Initial Exec access model is intended for use in code
generated for executables, some dynamic libraries may use it as well,
when they know the referenced variable is defined in the main
executable or in another initially-loaded dynamic library.  In fact,
if a dynamic library references a thread-local variable defined in
itself, using the Initial Exec access model, it may be giving up the
ability to be loaded at run time, with mechanisms such as the dlopen()
library call.


General Dynamic is the most general access model, in that it poses no
requirement whatsoever on the variable-accessing or the
variable-defining modules.  This access model has traditionally
involved calling a function, typically called __tls_get_addr(), to
obtain the address of a thread-local variable.  This function takes
two pieces of information to compute the address: a module id, which
is an index into the dynamic thread vector identifying the module in
which the variable is defined, and the offset of the variable from
the beginning of the corresponding TLS block.

These two pieces of information are computed by the dynamic loader, in
GOT locations determined by the linker.  If they're adjacent, it's
possible to pass both pieces of information to __tls_get_addr() by
means of a single pointer; otherwise, they're usually passed as two
separate arguments.  In either case, the code sets up arguments for
__tls_get_addr() and then calls it explicitly, using the returned
value as the address of the variable.

The Local Dynamic access model can only be used to access variables
that are local to the module in which they're used, and it only makes
sense when accessing multiple such variables.  The idea is to use a
single call to __tls_get_addr() to compute the base address of the TLS
block for the module.  Then, to access variables, it suffices to add
to this base address the offset corresponding to each variable.  The
offset can be used as a literal in the code, since it's determined by
the linker when it lays out the TLS section of the module.


Overall Design
==============

The optimization proposed herein intends to improve the performance
of dynamic libraries that access thread-local variables using the
dynamic access models when they're loaded before program start up,
without slowing down (and, where possible, speeding up) the same
libraries when they are loaded at run time.

The idea is to use TLS descriptors containing a pointer to a function
and an argument.  This uses two words in the GOT, just like the
arguments traditionally passed to __tls_get_addr(), but uses this
space in a very different way.

A single relocation is to be used to compute the value of the two
words of the TLS descriptor.  The dynamic loader, when resolving the
relocation, can determine whether the TLS block used for the module
that defines the TLS symbol is in the static TLS block or not.

If it is in the static TLS block, the offset from the TP to the symbol
is constant for all threads, so it sets the argument in the TLS
descriptor to this constant offset, and sets the function pointer to a
piece of code that simply returns the argument (plus the TP, in an
alternate design).

If the defining module is in dynamically-allocated TLS, it sets the
argument to a dynamically-allocated extended descriptor containing the
arguments passed to __tls_get_addr() and the module's generation
count, and the function pointer to a piece of code that checks whether
the generation count is sufficiently up to date and that the DTV entry
is set.  If so, this piece of code loads the DTV entry, adds the
symbol offset to it, and subtracts the TP value (unless using the
alternate design), returning the result.  Otherwise, it calls
__tls_get_addr(), and subtracts from its return value the TP (unless
using the alternate design).

The functions defined above use custom calling conventions that
require them to preserve any registers they modify.  This penalizes
the case that requires dynamic TLS, since it must preserve all
call-clobbered registers before calling __tls_get_addr(), but it is
optimized for the most common case of static TLS, and also for the
case in which the code generated by the compiler can be relaxed by the
linker to a more efficient access model: being able to assume no
registers are clobbered by the call tends to improve register
allocation.  Also, the function that handles the dynamic TLS case will
most often be able to avoid calling __tls_get_addr(), thus potentially
avoiding the need for preserving registers.


Lazy relocation is accomplished by setting the argument portion of the
TLS descriptor to point to the relocation, and the function pointer to
point to a lazy relocation function.  Some effort is needed to ensure
that the TLS descriptor is modified and accessed atomically, and that
the lazy relocation function can quickly identify the module that
contains the relocation.


This optimized access model thus consists in setting up a pointer to
the TLS descriptor (or, in another alternate design, loading its
contents atomically), then calling the function at the address given
by the pointer to function in the TLS descriptor, passing to it a
pointer to the descriptor itself (or, in an alternate design, the
argument portion of the TLS descriptor).  The value returned by this
call can then be used as an offset from the thread pointer to access
the variable (or as the address of the variable, in the first
alternate design).

The actual code sequences and implementation details for IA32 and
AMD64/EM64T are depicted in the following subsections.


IA32
----

The general dynamic access model used to be:

        leal    variable@TLSGD(,%ebx,1), %eax
        call    ___tls_get_addr@PLT
        # use %eax as the address, or (%eax) to access the value

After the call instruction, %eax holds the address of variable.
variable@TLSGD is resolved by the linker to the GOT offset holding the
data structure passed by reference to ___tls_get_addr(), containing
the TLS module id and the TLS offset, computed by two separate dynamic
relocations.  The odd addressing mode is needed to make the
instruction longer, such that, in case the linker finds out it can
relax the code sequence, the necessary code fits.


The optimized method proposed here uses:

        leal    tval@TLSDESC(%ebx), %eax
        [...] # any other instructions that preserve %eax
        call    *tval@TLSCALL(%eax)
        # add %gs:0 to %eax to compute the address,
        # or use %gs:(%eax) to access the value

Note that this call instruction is actually emitted without an offset,
so it's 4 bytes shorter than it appears to be.  It's equivalent to
`call *(%eax)', a two-byte instruction, but it's annotated with a
relocation that enables the leal and the call to be moved apart from
each other, for better scheduling.  The following new relocation
types are used by the code above:

#define R_386_TLS_GOTDESC  38 /* GOT offset for TLS descriptor.  */
#define R_386_TLS_DESC     39 /* TLS descriptor containing
                                           pointer to code and to
                                           argument, returning the TLS
                                           offset for the symbol.  */
#define R_386_TLS_DESC_CALL 40 /* Marker of call through TLS
                                           descriptor for
                                           relaxation.  */

TLS_DESC is the dynamic relocation that the linker emits in response
to the two other relocations.  All of these relocations accept
addends; being REL relocations, they store their addends in place.
In the TLS_DESC case, which applies to a pair of words, the addend is
stored in the second word, to simplify some lazy relocation cases.
In order to enable lazy relocation, %ebx MUST point to the GOT at the
point of the call instruction.

Instead of adding more relocations for the Local Dynamic case, we
propose the use of a symbol, _TLS_MODULE_BASE_, that the linker
implicitly defines, as a hidden symbol, at the base address of the
TLS section of a module.


When relaxing the sequence above to the Initial Exec model, we'd get
sequences such as:

        movl variable@GOTNTPOFF(%ebx), %eax
        [...]
        movl %eax, %eax   # or any other two-byte nop

or:

        leal variable@GOTNTPOFF(%ebx), %eax
        [...]
        movl (%eax), %eax

or, in case there's a GOT entry holding the positive TPOFF:

        movl variable@GOTTPOFF(%ebx), %eax
        [...]
        negl %eax

I'm not sure which of these would be better to use, but in my current
implementation I've preferred the first alternative, unless a GOT
entry for the third is needed for other reasons and one for the first
isn't.


The Local Exec model uses code such as:

        .byte 0x65       # %gs, or any harmless 1-byte prefix
        movl $variable@NTPOFF, %eax
        [...]
        movl %eax, %eax # or any other two-byte nop

or:

        leal variable@NTPOFF, %eax # avoiding the need for the prefix
        [...]
        movl %eax, %eax # or any other two-byte nop


The function for the static TLS case can be as simple as:

_dl_tlsdesc_return:
        movl 4(%eax), %eax
        ret

whereas the other takes a bit more effort, requiring the following
data structure definitions:

struct tlsdesc
{
  ptrdiff_t __attribute__((regparm(1))) (*entry)(struct tlsdesc *);
  void *arg;
};

typedef struct dl_tls_index
{
  unsigned long int ti_module;
  unsigned long int ti_offset;
} tls_index;

struct tlsdesc_dynamic_arg
{
  tls_index tlsinfo;
  size_t gen_count;
};

The definition of the resolution function follows the logic depicted
below, except for the need to preserve all registers except %eax.

ptrdiff_t
__attribute__ ((__regparm__ (1)))
_dl_tlsdesc_dynamic (struct tlsdesc *tdp)
{
  struct tlsdesc_dynamic_arg *td = tdp->arg;
  dtv_t *dtv = *(dtv_t **)((char *)__thread_pointer + DTV_OFFSET);
  if (__builtin_expect (td->gen_count <= dtv[0].counter
                        && (dtv[td->tlsinfo.ti_module].pointer.val
                            != TLS_DTV_UNALLOCATED),
                        1))
    return dtv[td->tlsinfo.ti_module].pointer.val + td->tlsinfo.ti_offset
      - __thread_pointer;

  /* Preserve any call-clobbered registers not preserved because of
     the above across the call below.  */
  return ___tls_get_addr (&td->tlsinfo) - __thread_pointer;
}

The tlsdesc_dynamic_arg objects are allocated by the dynamic loader
when resolving a relocation, and stored in a hash table created for
the module in which the symbol is defined.  Note that this dynamic
allocation has no implications for prelinking, since prelinking is
only applicable to modules loaded before program start up, and such
modules always use the static TLS case, which does not need dynamic
allocation.


An alternate design in which the function called through the TLS
descriptor returns not the TP offset, but rather the address of the
variable of interest, could refrain from adding %gs:0 to the value
returned by the call to compute the address of a symbol, and from
using the %gs: prefix when accessing the variable, but it would
require the use of a longer call instruction to enable proper
relaxation.  The call instruction would have to be 7, instead of 2
bytes long, such that the linker could relax it to `addl %gs:0, %eax'.
This would make code that accesses the variable 4 bytes longer on
average (5 bytes minus one used by the %gs prefix), whereas code that
computes its address would be shorter by only two bytes.  It's not
clear such a change would be profitable.


Lazy relocation is not profitable in all cases.  For a REL relocation
with a nonzero addend, the argument in the TLS descriptor would have
to hold both the addend and the address of the relocation, and it
probably wouldn't make sense to depend on dynamic memory allocation
just to perform a relocation.  So, in case we actually need both
pieces of information, we perform the relocation immediately.

In other cases, such as that of RELA relocations and REL relocations
with zero addends, we use the argument to hold a pointer to the
relocation.  Another special case we handle is that of REL relocations
with addends that reference the *ABS* section, i.e., that reference
the local TLS section.

All cases handled by lazy relocation start by grabbing a global
dynamic loader lock and checking that the pointer in the TLS
descriptor hasn't changed.  If it has, we return immediately into the
new pointer, after releasing the lock.  Otherwise, we set the function
pointer to a hold function that will get all other threads that
attempt to use this variable to wait until relocation is complete.

After diverting any other threads to the hold function, we can perform
the relocation, determining whether the referenced symbol is in Static
TLS or not, and deciding which of the two functions to use from that
point on and computing the argument to pass to it.  The argument is
stored first in the TLS descriptor, and the new entry point is stored
last.  The relocation functions finally release the lock and return
into the newly-computed function.

In the current implementation, the hold function attempts to grab the
lock, checks that the pointer hasn't changed and releases the lock,
such that, if any other such TLS descriptor lazy relocation is in
progress, it will wait until the lock is released.  When it obtains
the lock, it releases it immediately and returns into the function
newly-stored in the TLS descriptor.

An alternate implementation is envisioned that relies on condition
variables that hold functions would wait on, such that the relocation
functions wouldn't have to hold the lock throughout their execution,
rather waking up all hold functions upon completion of the relocation.


AMD64/EM64T
-----------

The design is very similar to that of IA32.  The main difference stems
from the fact that AMD64's IP-relative addressing modes have enabled
it to do away with the need for a register holding the GOT pointer,
which in turn required additional measures to enable lazy relocation
of TLS descriptors.

Where the existing ABI uses the following sequence:

        .byte   0x66
        leaq    variable@TLSGD(%rip), %rdi
        .word   0x6666
        rex64
        call    __tls_get_addr@PLT
        # use %rax as the address, or (%rax) to access the value

we propose this instead:

        leaq    tval@TLSDESC(%rip), %rax
        [...] # any other instructions that preserve %eax
        call    *tval@TLSCALL(%rax)
        # add %fs:0 to %eax to compute the address,
        # or use %fs:(%eax) to access the value

Note that, as in the IA32 case, the call instruction is a two-byte
instruction: the offset is completely discarded by the assembler.
The following new relocation types are used by the code above:

#define R_X86_64_GOTPC32_TLSDESC 27 /* GOT offset for TLS descriptor.  */
#define R_X86_64_TLSDESC         28 /* TLS descriptor.  */
#define R_X86_64_TLSDESC_CALL    29 /* Marker for call through TLS
                                           descriptor.  */

As on IA32, _TLS_MODULE_BASE_ is to be used to obtain the base address
for the Local Dynamic access model.


The alternate design proposed for IA32 that gets the TLS descriptor
call to compute not the offset, but the actual address of the
variable, would require a much longer call instruction to accommodate
the 9 bytes needed to add the TP to the address in case of
relaxation, so it is even less likely to be profitable.


When relaxing the sequence above to the Initial Exec model, we'd get
sequences such as:

        movq    variable@GOTTPOFF(%rip), %rax
        [...]
        rex64 nop   # or any other two-byte nop

I'm not sure this last instruction is the most efficient two-byte
do-nothing instruction available on AMD64, but it's used for Local
Exec as well:

        movq    $variable@TPOFF, %rax
        [...]
        rex64 nop


The other data structures and TLS descriptor offset computation
functions are equivalent to those used on IA32, with one point worth
noting: the type of the `entry' member of struct tlsdesc cannot be
represented in C, not even in GNU-extended C, since the function
takes its argument in %rax, rather than in %rdi.


Since AMD64 uses RELA dynamic relocations, all TLS descriptors are
suitable for lazy relocation; there's no need to worry about
preserving the addend, since it is held in the relocation table
itself.  Thus, there's only need for one lazy relocation function.

That said, this function needs some means to be told what module the
relocation at hand refers to.  On IA32, this is done by means of the
reserved register %ebx, that points to the Global Offset Table, but no
register holds this pointer on AMD64.  Although it would be possible
to use the relocation pointer to search the relocation ranges for all
loaded modules, this would be extremely inefficient, so we've
introduced two dynamic table entries to enable the dynamic loader to
communicate with modules that use lazy relocation:

#define DT_TLSDESC_GOT 0x6ffffff7 /* Location of GOT entry used
                                           by TLS descriptor resolver
                                           PLT entry.  */
#define DT_TLSDESC_PLT 0x6ffffff8 /* Location of PLT entry for
                                           TLS descriptor resolver
                                           calls.  */

A module that uses lazy TLSDESC relocations MUST define these two
entries.  The former indicates the address of a GOT entry to be filled
in by the dynamic loader with the address of the internal function to
be used for lazy relocation of TLS descriptors.  The latter must hold
the address of a PLT entry that pushes onto the stack the module's
link map address, located in the GOT portion reserved for the dynamic
loader to use, and then jumps to the lazy relocation function, using
the address stored in the TLSDESC_GOT entry.  The lazy relocation
function is responsible for releasing the stack slot taken up by the
PLT entry.


Future Improvements
===================

The use of TLS descriptors to access thread-local variables would
enable compressing the DTV so that it contains only entries for
non-static modules.  Static ones could be given negative ids, such
that legacy relocations and direct calls to __tls_get_addr() would
still work correctly, but their entries could be omitted from the
DTV, and DTV entries would no longer need the boolean currently used
to denote entries that are in Static TLS.

The DTV could also be modified so as to contain TP offsets instead of
absolute addresses.  Some refactoring of _dl_tlsdesc_dynamic() and
__tls_get_addr() could avoid the need to subtract TP in the former,
using an alternate entry point that refrains from adding the TP to the
offsets in the new DTV.

It might make sense to combine the current design, in which TLSCALL
functions return the TP offsets, with the alternate design in which
they return the actual address, introducing different relocations for
each case, enabling code to use the former when accessing variables
directly using the TP segment register, and using the latter when
computing their addresses.  It's not obvious that this would have any
significant impact on performance, and although code size could
certainly be reduced for code that computes the address of
thread-local variables, it's not obvious that the need for additional
GOT entries and supporting code would not make up for it.


Conclusion
==========

The design described above is currently functional, using CVS
snapshots of GCC, binutils and glibc taken today, plus local changes
about to be contributed.

Even though the performance of the standard NPTL benchmark is improved
by only a negligible margin, that's understandable: glibc and nptl,
aware that the dynamic access models used to be so inefficient, take
advantage of the fact that libc is always loaded initially and
reference almost all thread-local variables using the more efficient
access models.

However, synthetic benchmarks designed to time functions that return
the value or the address of thread-local variables, have shown that
the performance of the proposed method is significantly better than
that of the currently-used method.  When the referenced variable is
found to be in static TLS at run time, the newly-proposed method makes
such functions about twice as fast as when using the method in wide
use today, bringing it close to the performance of the Initial Exec
model.  Even when the variable is in dynamic TLS, the speedup is still
over 20%.

As for code size, the new method tends to be a win for dynamic
libraries that access TLS variables through the segment registers more
often than they compute the address of such variables.  Libraries in
GNU libc such as libpthread, libmemusage, as well as most dynamic
libraries used in its testsuite, have experienced small reductions in
code size on both architectures.  The dynamic loader and libm
experienced size reductions only on AMD64, retaining the same size on
IA32.  Only libc itself grew in terms of code size, and only on AMD64,
by as little as 0.014%.


This is a draft of work in progress.  The ABI changes suggested above
are not to be taken as final.  In particular, the relocation numbers
and dynamic table entries have not been approved by the official
maintainers of the ABIs, the instruction selection is still subject to
change and even the calling conventions may be modified without
notice.

Copyright 2005 Alexandre Oliva.  Permission is granted to distribute
verbatim copies of this document.  Please contact the author at
[hidden email] or [hidden email] to request additional
permissions.


Change Log
==========

0.9.2 - 2005-09-15: Rename main section to Overall Design.  Clarify
    how the dynamic TLS handler can often avoid the overhead of
    preserving registers.  Add rough figures of performance
    improvements.

0.9.1 - 2005-09-13: Fixed typos and thinkos.  Improved readability.
    Added section with Future Improvements.

0.9 - 2005-09-09: initial version.


--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:

> Over the past few months, I've been working on porting to IA32 and
> AMD64/EM64T the interesting bits of the TLS design I came up with for
> FR-V, achieving some impressive speedups along with slight code size
> reductions in the most common cases.

> Although the design is not set in stone yet, it's fully implemented
> and functional with patches I'm about to post for binutils, gcc and
> glibc mainline, as follow-ups to this message, except that the GCC
> patch will go to gcc-patches, as expected.

Here's the patch for binutils.

I'm not entirely happy with two aspects of the patch:

- the way I managed to emit the `call *(%[er]ax)' instruction from
  `call *variable@TLSCALL(%[er]ax)', dropping the offset from the
  instruction but still emitting the relocation, seems fragile to me,
  but there were no additional bits available to do anything
  cleaner.  Any suggestions on a better approach?

- local_tlsdesc_gotent is probably too wasteful, since very few of all
  local symbols are going to require TLS descriptor entries.  I hope
  this is not too much of a problem, but I could introduce another
  data structure if people feel strongly about it.


Also note the several FIXMEs with decisions yet to be made on the
exact instructions to be generated in several cases.  I have yet to
develop some means to better evaluate the performance of each
alternative, and even then, I have limited hardware to test on.  I'd
welcome feedback from people more familiar with the performance
features of various x86-compatible processors.  Anyone?  Thanks in
advance,

Here's the patch.  Built and tested on x86_64-linux-gnu and
i686-pc-linux-gnu.  Ok to install?


Index: include/elf/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * common.h (DT_TLSDESC_GOT, DT_TLSDESC_PLT): New.
        * i386.h (R_386_TLS_GOTDESC, R_386_TLS_DESC, R_386_TLS_DESC_CALL):
        New.
        * x86-64.h (R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC,
        R_X86_64_TLSDESC_CALL): New.

Index: include/elf/common.h
===================================================================
RCS file: /cvs/uberbaum/./include/elf/common.h,v
retrieving revision 1.72
diff -u -p -r1.72 common.h
--- include/elf/common.h 14 Jul 2005 22:52:15 -0000 1.72
+++ include/elf/common.h 15 Sep 2005 22:39:19 -0000
@@ -581,6 +581,8 @@
 #define DT_SYMINFO 0x6ffffeff
 #define DT_ADDRRNGHI 0x6ffffeff
 
+#define DT_TLSDESC_GOT 0x6ffffff7
+#define DT_TLSDESC_PLT 0x6ffffff8
 #define DT_RELACOUNT 0x6ffffff9
 #define DT_RELCOUNT 0x6ffffffa
 #define DT_FLAGS_1 0x6ffffffb
Index: include/elf/i386.h
===================================================================
RCS file: /cvs/uberbaum/./include/elf/i386.h,v
retrieving revision 1.10
diff -u -p -r1.10 i386.h
--- include/elf/i386.h 10 May 2005 10:21:10 -0000 1.10
+++ include/elf/i386.h 15 Sep 2005 22:39:19 -0000
@@ -1,5 +1,5 @@
 /* ix86 ELF support for BFD.
-   Copyright 1998, 1999, 2000, 2002, 2004 Free Software Foundation, Inc.
+   Copyright 1998, 1999, 2000, 2002, 2004, 2005 Free Software Foundation, Inc.
 
    This file is part of BFD, the Binary File Descriptor library.
 
@@ -61,6 +61,9 @@ START_RELOC_NUMBERS (elf_i386_reloc_type
      RELOC_NUMBER (R_386_TLS_DTPMOD32, 35)
      RELOC_NUMBER (R_386_TLS_DTPOFF32, 36)
      RELOC_NUMBER (R_386_TLS_TPOFF32,  37)
+     RELOC_NUMBER (R_386_TLS_GOTDESC,  38)
+     RELOC_NUMBER (R_386_TLS_DESC,     39)
+     RELOC_NUMBER (R_386_TLS_DESC_CALL,40)
 
      /* Used by Intel.  */
      RELOC_NUMBER (R_386_USED_BY_INTEL_200, 200)
Index: include/elf/x86-64.h
===================================================================
RCS file: /cvs/uberbaum/./include/elf/x86-64.h,v
retrieving revision 1.8
diff -u -p -r1.8 x86-64.h
--- include/elf/x86-64.h 25 Jul 2005 15:41:07 -0000 1.8
+++ include/elf/x86-64.h 15 Sep 2005 22:39:19 -0000
@@ -1,5 +1,5 @@
 /* x86_64 ELF support for BFD.
-   Copyright (C) 2000, 2001, 2002, 2004 Free Software Foundation, Inc.
+   Copyright (C) 2000, 2001, 2002, 2004, 2005 Free Software Foundation, Inc.
    Contributed by Jan Hubicka <[hidden email]>
 
    This file is part of BFD, the Binary File Descriptor library.
@@ -53,6 +53,13 @@ START_RELOC_NUMBERS (elf_x86_64_reloc_ty
      RELOC_NUMBER (R_X86_64_GOTOFF64, 25)     /* 64 bit offset to GOT */
      RELOC_NUMBER (R_X86_64_GOTPC32,  26)     /* 32 bit signed pc relative
                                                  offset to GOT */
+     RELOC_NUMBER (R_X86_64_GOTPC32_TLSDESC, 27)
+      /* 32 bit signed pc relative offset to
+         TLS descriptor in the GOT.  */
+     RELOC_NUMBER (R_X86_64_TLSDESC, 28)      /* 2x64-bit TLS descriptor.  */
+     RELOC_NUMBER (R_X86_64_TLSDESC_CALL, 29) /* Relaxable call through TLS
+                                                 descriptor.  */
      RELOC_NUMBER (R_X86_64_GNU_VTINHERIT, 250)       /* GNU C++ hack  */
      RELOC_NUMBER (R_X86_64_GNU_VTENTRY, 251)         /* GNU C++ hack  */
 END_RELOC_NUMBERS (R_X86_64_max)
Index: bfd/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * reloc.c (BFD_RELOC_386_TLS_GOTDESC, BFD_RELOC_386_TLS_DESC,
        BFD_RELOC_386_TLS_DESC_CALL, BFD_RELOC_X86_64_GOTPC32_TLSDESC,
        BFD_RELOC_X86_64_TLSDESC, BFD_RELOC_X86_64_TLSDESC_CALL): New.
        * libbfd.h, bfd-in2.h: Rebuilt.
        * elf32-i386.c (elf_howto_table): New relocations.
        (R_386_tls): Adjust.
        (elf_i386_reloc_type_lookup): Map new relocations.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf_i386_link_hash_entry): Add tlsdesc_got field.
        (struct elf_i386_obj_tdata): Add local_tlsdesc_gotent field.
        (elf_i386_local_tlsdesc_gotent): New macro.
        (struct elf_i386_link_hash_table): Add sgotplt_jump_table_size.
        (elf_i386_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf_i386_link_hash_table_create): Set sgotplt_jump_table_size.
        (elf_i386_tls_transition): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf_i386_gc_sweep_hook): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf_i386_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size.
        Don't zero reloc_count in srelplt.
        (elf_i386_always_size_sections): New.  Set up _TLS_MODULE_BASE_.
        (elf_i386_relocate_section): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf_backend_always_size_sections): Define.
        * elf64-x86-64.c (x86_64_elf_howto): Add R_X86_64_GOTPC32_TLSDESC,
        R_X86_64_TLSDESC, R_X86_64_TLSDESC_CALL.
        (R_X86_64_standard): Adjust.
        (x86_64_reloc_map): Map new relocs.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf64_x86_64_link_hash_entry): Add tlsdesc_got field.
        (struct elf64_x86_64_obj_tdata): Add local_tlsdesc_gotent field.
        (elf64_x86_64_local_tlsdesc_gotent): New macro.
        (struct elf64_x86_64_link_hash_table): Add tlsdesc_plt,
        tlsdesc_got and sgotplt_jump_table_size fields.
        (elf64_x86_64_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf64_x86_64_link_hash_table_create): Initialize new fields.
        (elf64_x86_64_tls_transition): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf64_x86_64_gc_sweep_hook): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf64_x86_64_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size,
        tlsdesc_plt and tlsdesc_got.  Make room for them.  Don't zero
        reloc_count in srelplt.  Add dynamic entries for DT_TLSDESC_PLT
        and DT_TLSDESC_GOT.
        (elf64_x86_64_always_size_sections): New.  Set up
        _TLS_MODULE_BASE_.
        (elf64_x86_64_relocate_section): Handle R_X86_64_GOTPC32_TLSDESC
        and R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf64_x86_64_finish_dynamic_sections): Set DT_TLSDESC_PLT and
        DT_TLSDESC_GOT.  Set up TLS descriptor lazy resolver PLT entry.
        (elf_backend_always_size_sections): Define.

Index: bfd/bfd-in2.h
===================================================================
RCS file: /cvs/uberbaum/./bfd/bfd-in2.h,v
retrieving revision 1.366
diff -u -p -r1.366 bfd-in2.h
--- bfd/bfd-in2.h 8 Sep 2005 12:49:18 -0000 1.366
+++ bfd/bfd-in2.h 15 Sep 2005 22:38:15 -0000
@@ -2647,6 +2647,9 @@ in the instruction.  */
   BFD_RELOC_386_TLS_DTPMOD32,
   BFD_RELOC_386_TLS_DTPOFF32,
   BFD_RELOC_386_TLS_TPOFF32,
+  BFD_RELOC_386_TLS_GOTDESC,
+  BFD_RELOC_386_TLS_DESC,
+  BFD_RELOC_386_TLS_DESC_CALL,
 
 /* x86-64/elf relocations  */
   BFD_RELOC_X86_64_GOT32,
@@ -2667,6 +2670,9 @@ in the instruction.  */
   BFD_RELOC_X86_64_TPOFF32,
   BFD_RELOC_X86_64_GOTOFF64,
   BFD_RELOC_X86_64_GOTPC32,
+  BFD_RELOC_X86_64_GOTPC32_TLSDESC,
+  BFD_RELOC_X86_64_TLSDESC,
+  BFD_RELOC_X86_64_TLSDESC_CALL,
 
 /* ns32k relocations  */
   BFD_RELOC_NS32K_IMM_8,
Index: bfd/elf32-i386.c
===================================================================
RCS file: /cvs/uberbaum/./bfd/elf32-i386.c,v
retrieving revision 1.149
diff -u -p -r1.149 elf32-i386.c
--- bfd/elf32-i386.c 31 Aug 2005 23:45:45 -0000 1.149
+++ bfd/elf32-i386.c 15 Sep 2005 22:38:17 -0000
@@ -126,9 +126,18 @@ static reloc_howto_type elf_howto_table[
   HOWTO(R_386_TLS_TPOFF32, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
  bfd_elf_generic_reloc, "R_386_TLS_TPOFF32",
  TRUE, 0xffffffff, 0xffffffff, FALSE),
+  HOWTO(R_386_TLS_GOTDESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_GOTDESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
+  HOWTO(R_386_TLS_DESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
+  HOWTO(R_386_TLS_DESC_CALL, 0, 0, 0, FALSE, 0, complain_overflow_dont,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC_CALL",
+ FALSE, 0, 0, FALSE),
 
   /* Another gap.  */
-#define R_386_tls (R_386_TLS_TPOFF32 + 1 - R_386_tls_offset)
+#define R_386_tls (R_386_TLS_DESC_CALL + 1 - R_386_tls_offset)
 #define R_386_vt_offset (R_386_GNU_VTINHERIT - R_386_tls)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -292,6 +301,18 @@ elf_i386_reloc_type_lookup (bfd *abfd AT
       TRACE ("BFD_RELOC_386_TLS_TPOFF32");
       return &elf_howto_table[R_386_TLS_TPOFF32 - R_386_tls_offset];
 
+    case BFD_RELOC_386_TLS_GOTDESC:
+      TRACE ("BFD_RELOC_386_TLS_GOTDESC");
+      return &elf_howto_table[R_386_TLS_GOTDESC - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC:
+      TRACE ("BFD_RELOC_386_TLS_DESC");
+      return &elf_howto_table[R_386_TLS_DESC - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC_CALL:
+      TRACE ("BFD_RELOC_386_TLS_DESC_CALL");
+      return &elf_howto_table[R_386_TLS_DESC_CALL - R_386_tls_offset];
+
     case BFD_RELOC_VTABLE_INHERIT:
       TRACE ("BFD_RELOC_VTABLE_INHERIT");
       return &elf_howto_table[R_386_GNU_VTINHERIT - R_386_vt_offset];
@@ -559,7 +580,20 @@ struct elf_i386_link_hash_entry
 #define GOT_TLS_IE_POS 5
 #define GOT_TLS_IE_NEG 6
 #define GOT_TLS_IE_BOTH 7
+#define GOT_TLS_GDESC 8
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf_i386_hash_entry(ent) ((struct elf_i386_link_hash_entry *)(ent))
@@ -570,6 +604,9 @@ struct elf_i386_obj_tdata
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf_i386_tdata(abfd) \
@@ -578,6 +615,9 @@ struct elf_i386_obj_tdata
 #define elf_i386_local_got_tls_type(abfd) \
   (elf_i386_tdata (abfd)->local_got_tls_type)
 
+#define elf_i386_local_tlsdesc_gotent(abfd) \
+  (elf_i386_tdata (abfd)->local_tlsdesc_gotent)
+
 static bfd_boolean
 elf_i386_mkobject (bfd *abfd)
 {
@@ -620,6 +660,10 @@ struct elf_i386_link_hash_table
     bfd_vma offset;
   } tls_ldm_got;
 
+  /* The amount of space used by the reserved portion of the sgotplt
+     section, plus whatever space is used by the jump slots.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -629,6 +673,9 @@ struct elf_i386_link_hash_table
 #define elf_i386_hash_table(p) \
   ((struct elf_i386_link_hash_table *) ((p)->hash))
 
+#define elf_i386_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * 4)
+
 /* Create an entry in an i386 ELF linker hash table.  */
 
 static struct bfd_hash_entry *
@@ -655,6 +702,7 @@ link_hash_newfunc (struct bfd_hash_entry
       eh = (struct elf_i386_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -686,6 +734,7 @@ elf_i386_link_hash_table_create (bfd *ab
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->tls_ldm_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
   ret->sym_sec.abfd = NULL;
   ret->is_vxworks = 0;
   ret->srelplt2 = NULL;
@@ -848,6 +897,8 @@ elf_i386_tls_transition (struct bfd_link
   switch (r_type)
     {
     case R_386_TLS_GD:
+    case R_386_TLS_GOTDESC:
+    case R_386_TLS_DESC_CALL:
     case R_386_TLS_IE_32:
       if (is_local)
  return R_386_TLS_LE_32;
@@ -952,6 +1003,8 @@ elf_i386_check_relocs (bfd *abfd,
 
  case R_386_GOT32:
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
   /* This symbol requires a global offset table entry.  */
   {
     int tls_type, old_tls_type;
@@ -961,6 +1014,9 @@ elf_i386_check_relocs (bfd *abfd,
       default:
       case R_386_GOT32: tls_type = GOT_NORMAL; break;
       case R_386_TLS_GD: tls_type = GOT_TLS_GD; break;
+      case R_386_TLS_GOTDESC:
+      case R_386_TLS_DESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       case R_386_TLS_IE_32:
  if (ELF32_R_TYPE (rel->r_info) == r_type)
   tls_type = GOT_TLS_IE_NEG;
@@ -990,13 +1046,16 @@ elf_i386_check_relocs (bfd *abfd,
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= (sizeof (bfd_signed_vma) + sizeof(char));
+    size *= (sizeof (bfd_signed_vma)
+     + sizeof (bfd_vma) + sizeof(char));
     local_got_refcounts = bfd_zalloc (abfd, size);
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf_i386_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf_i386_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type = elf_i386_local_got_tls_type (abfd) [r_symndx];
@@ -1007,11 +1066,14 @@ elf_i386_check_relocs (bfd *abfd,
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     else if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
-     && (old_tls_type != GOT_TLS_GD
+     && (! GOT_TLS_GD_ANY_P (old_tls_type)
  || (tls_type & GOT_TLS_IE) == 0))
       {
- if ((old_tls_type & GOT_TLS_IE) && tls_type == GOT_TLS_GD)
+ if ((old_tls_type & GOT_TLS_IE) && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1319,6 +1381,8 @@ elf_i386_gc_sweep_hook (bfd *abfd,
   break;
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_IE:
  case R_386_TLS_GOTIE:
@@ -1582,6 +1646,7 @@ allocate_dynrelocs (struct elf_link_hash
 
   /* We also need to make an entry in the .rel.plt section.  */
   htab->srelplt->size += sizeof (Elf32_External_Rel);
+  htab->srelplt->reloc_count++;
 
   if (htab->is_vxworks && !info->shared)
     {
@@ -1615,6 +1680,9 @@ allocate_dynrelocs (struct elf_link_hash
       h->needs_plt = 0;
     }
 
+  eh = (struct elf_i386_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+
   /* If R_386_TLS_{IE_32,IE,GOTIE} symbol is now local to the binary,
      make it a R_386_TLS_LE_32 requiring no TLS entry.  */
   if (h->got.refcount > 0
@@ -1638,11 +1706,22 @@ allocate_dynrelocs (struct elf_link_hash
  }
 
       s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += 4;
-      /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD || tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  h->got.offset = s->size;
+  s->size += 4;
+  /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
+  if (GOT_TLS_GD_P (tls_type) || tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_386_TLS_IE_32 needs one dynamic relocation,
  R_386_TLS_IE resp. R_386_TLS_GOTIE needs one dynamic relocation,
@@ -1651,21 +1730,23 @@ allocate_dynrelocs (struct elf_link_hash
  global.  */
       if (tls_type == GOT_TLS_IE_BOTH)
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      else if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
        || (tls_type & GOT_TLS_IE))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
+      if (GOT_TLS_GDESC_P (tls_type))
+ htab->srelplt->size += sizeof (Elf32_External_Rel);
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf_i386_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1813,6 +1894,7 @@ elf_i386_size_dynamic_sections (bfd *out
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1855,25 +1937,42 @@ elf_i386_size_dynamic_sections (bfd *out
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf_i386_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf_i386_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += 4;
-      if (*local_tls_type == GOT_TLS_GD
-  || *local_tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += 4;
+  if (GOT_TLS_GD_P (*local_tls_type)
+      || *local_tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || (*local_tls_type & GOT_TLS_IE))
  {
   if (*local_tls_type == GOT_TLS_IE_BOTH)
     srel->size += 2 * sizeof (Elf32_External_Rel);
-  else
+  else if (GOT_TLS_GD_P (*local_tls_type)
+   || ! GOT_TLS_GDESC_P (*local_tls_type))
     srel->size += sizeof (Elf32_External_Rel);
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    htab->srelplt->size += sizeof (Elf32_External_Rel);
  }
     }
   else
@@ -1917,6 +2016,14 @@ elf_i386_size_dynamic_sections (bfd *out
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size = htab->srelplt->reloc_count * 4;
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1948,7 +2055,8 @@ elf_i386_size_dynamic_sections (bfd *out
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -2035,6 +2143,41 @@ elf_i386_size_dynamic_sections (bfd *out
   return TRUE;
 }
 
+static bfd_boolean
+elf_i386_always_size_sections (bfd *output_bfd,
+       struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Set the correct type for an x86 ELF section.  We do this by the
    section name, which is a hack, but ought to work.  */
 
@@ -2112,6 +2255,7 @@ elf_i386_relocate_section (bfd *output_b
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -2119,6 +2263,7 @@ elf_i386_relocate_section (bfd *output_b
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf_i386_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -2130,7 +2275,7 @@ elf_i386_relocate_section (bfd *output_b
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2552,6 +2697,8 @@ elf_i386_relocate_section (bfd *output_b
   /* Fall through */
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_GOTIE:
   r_type = elf_i386_tls_transition (info, r_type, h == NULL);
@@ -2566,7 +2713,9 @@ elf_i386_relocate_section (bfd *output_b
     }
   if (tls_type == GOT_TLS_IE)
     tls_type = GOT_TLS_IE_NEG;
-  if (r_type == R_386_TLS_GD)
+  if (r_type == R_386_TLS_GD
+      || r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
     {
       if (tls_type == GOT_TLS_IE_POS)
  r_type = R_386_TLS_GOTIE;
@@ -2640,6 +2789,67 @@ elf_i386_relocate_section (bfd *output_b
   rel++;
   continue;
  }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leal x@tlsdesc(%ebx), %eax
+
+     aoliva FIXME.  Decide whether to change it to:
+     gs: .byte 0x65 ; movl $x@ntpoff, %eax
+     or
+     leal x@ntpoff, %eax
+    
+     Registers other than %eax may be set up here.  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a leal adding ebx to a
+     32-bit offset into any register, although it's
+     probably almost always going to be eax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 2);
+  type = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x83);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x65, contents + roff - 2);
+  /* aoliva FIXME: remove the above and xor the byte
+     below with 0x86.  */
+  bfd_put_8 (output_bfd, 0xb8 | ((val >> 3) & 7),
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, -tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%eax)
+     Turn it into:
+     movl %eax, %eax  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%eax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x89, contents + roff);
+  bfd_put_8 (output_bfd, 0xc0, contents + roff + 1);
+  continue;
+ }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_IE)
  {
   unsigned int val, type;
@@ -2754,13 +2964,17 @@ elf_i386_relocate_section (bfd *output_b
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf_i386_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2770,35 +2984,77 @@ elf_i386_relocate_section (bfd *output_b
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_DESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt + 8
+      <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf32_External_Rel);
+  BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+      <= sreloc->contents + sreloc->size);
+  bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
+  if (indx == 0)
+    {
+      BFD_ASSERT (! unresolved_reloc);
+      bfd_put_32 (output_bfd,
+  relocation - dtpoff_base (info),
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+  else
+    {
+      bfd_put_32 (output_bfd, 0,
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_386_TLS_DTPMOD32;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else if (tls_type == GOT_TLS_IE_POS)
  dr_type = R_386_TLS_TPOFF;
       else
  dr_type = R_386_TLS_TPOFF32;
+
       if (dr_type == R_386_TLS_TPOFF && indx == 0)
  bfd_put_32 (output_bfd, relocation - dtpoff_base (info),
     htab->sgot->contents + off);
       else if (dr_type == R_386_TLS_TPOFF32 && indx == 0)
  bfd_put_32 (output_bfd, dtpoff_base (info) - relocation,
     htab->sgot->contents + off);
-      else
+      else if (dr_type != R_386_TLS_DESC)
  bfd_put_32 (output_bfd, 0,
     htab->sgot->contents + off);
       outrel.r_info = ELF32_R_INFO (indx, dr_type);
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf32_External_Rel);
+
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2814,8 +3070,10 @@ elf_i386_relocate_section (bfd *output_b
       outrel.r_info = ELF32_R_INFO (indx,
     R_386_TLS_DTPOFF32);
       outrel.r_offset += 4;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
     }
  }
@@ -2826,25 +3084,33 @@ elf_i386_relocate_section (bfd *output_b
       htab->sgot->contents + off + 4);
   outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_TPOFF);
   outrel.r_offset += 4;
-  htab->srelgot->reloc_count++;
+  sreloc->reloc_count++;
   loc += sizeof (Elf32_External_Rel);
   bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
-  if (r_type == ELF32_R_TYPE (rel->r_info))
+  if (r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
+    {
+      relocation = htab->sgotplt_jump_table_size + offplt;
+      unresolved_reloc = FALSE;
+    }
+  else if (r_type == ELF32_R_TYPE (rel->r_info))
     {
       bfd_vma g_o_t = htab->sgotplt->output_section->vma
       + htab->sgotplt->output_offset;
       relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off - g_o_t;
+ + htab->sgot->output_offset + off - g_o_t;
       if ((r_type == R_386_TLS_IE || r_type == R_386_TLS_GOTIE)
   && tls_type == GOT_TLS_IE_BOTH)
  relocation += 4;
@@ -2852,7 +3118,7 @@ elf_i386_relocate_section (bfd *output_b
  relocation += g_o_t;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GD)
     {
       unsigned int val, type;
       bfd_vma roff;
@@ -2916,6 +3182,97 @@ elf_i386_relocate_section (bfd *output_b
       rel++;
       continue;
     }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leal x@tlsdesc(%ebx), %eax
+
+ aoliva FIXME: decide whether to change it to:
+ movl x@gotntpoff(%ebx), %eax # before movl %eax,%eax
+ or
+ leal x@gotntpoff(%ebx), %eax # before movl (%eax),%eax
+ but the latter won't work if we need to negate the
+ loaded value.
+    
+ Registers other than %eax may be set up here.  */
+  
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a leal adding ebx to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be eax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 2);
+      type = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x83);
+      BFD_ASSERT (roff + 4 <= input_section->size);
+
+      /* Now modify the instruction as appropriate.  */
+      /* To turn a leal into a movl in the form we use it, it
+ suffices to change the first byte from 0x8d to 0x8b.
+ aoliva FIXME: should we decide to keep the leal, all
+ we have to do is remove the statement below, and
+ adjust the relaxation of R_386_TLS_DESC_CALL.  */
+      bfd_put_8 (output_bfd, 0x8b, contents + roff - 2);
+
+      if (tls_type == GOT_TLS_IE_BOTH)
+ off += 4;
+
+      bfd_put_32 (output_bfd,
+  htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off
+  - htab->sgotplt->output_section->vma
+  - htab->sgotplt->output_offset,
+  contents + roff);
+      continue;
+    }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+    {
+      /* GDesc -> IE transition.
+ It's originally:
+ calll *(%eax)
+
+ aoliva FIXME: decide whether to change it to:
+ movl %eax,%eax # after movl x@gotntpoff(%ebx), %eax
+ or
+ movl (%eax),%eax # after leal x@gotntpoff(%ebx), %eax
+
+         Either one works unless we have to negate the
+         offset.  */
+  
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a call *(%eax).  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff + 2 <= input_section->size);
+      type = bfd_get_8 (input_bfd, contents + roff);
+      BFD_ASSERT (type == 0xff);
+      val = bfd_get_8 (input_bfd, contents + roff + 1);
+      BFD_ASSERT (val == 0x10);
+
+      /* Now modify the instruction as appropriate.  */
+      if (tls_type != GOT_TLS_IE_NEG)
+ {
+  /* movl %eax,%eax */
+  bfd_put_8 (output_bfd, 0x89, contents + roff);
+  bfd_put_8 (output_bfd, 0xc0, contents + roff + 1);
+ }
+      else
+ {
+  /* negl %eax */
+  bfd_put_8 (output_bfd, 0xf7, contents + roff);
+  bfd_put_8 (output_bfd, 0xd8, contents + roff + 1);
+ }
+
+      continue;
+    }
+  else
+    BFD_ASSERT (FALSE);
   break;
 
  case R_386_TLS_LDM:
@@ -3223,7 +3580,7 @@ elf_i386_finish_dynamic_symbol (bfd *out
     }
 
   if (h->got.offset != (bfd_vma) -1
-      && elf_i386_hash_entry(h)->tls_type != GOT_TLS_GD
+      && ! GOT_TLS_GD_ANY_P (elf_i386_hash_entry(h)->tls_type)
       && (elf_i386_hash_entry(h)->tls_type & GOT_TLS_IE) == 0)
     {
       Elf_Internal_Rela rel;
@@ -3558,6 +3915,7 @@ elf_i386_plt_sym_val (bfd_vma i, const a
 #define elf_backend_reloc_type_class      elf_i386_reloc_type_class
 #define elf_backend_relocate_section      elf_i386_relocate_section
 #define elf_backend_size_dynamic_sections     elf_i386_size_dynamic_sections
+#define elf_backend_always_size_sections      elf_i386_always_size_sections
 #define elf_backend_plt_sym_val      elf_i386_plt_sym_val
 
 #include "elf32-target.h"
Index: bfd/elf64-x86-64.c
===================================================================
RCS file: /cvs/uberbaum/./bfd/elf64-x86-64.c,v
retrieving revision 1.107
diff -u -p -r1.107 elf64-x86-64.c
--- bfd/elf64-x86-64.c 31 Aug 2005 23:45:46 -0000 1.107
+++ bfd/elf64-x86-64.c 15 Sep 2005 22:38:19 -0000
@@ -112,12 +112,24 @@ static reloc_howto_type x86_64_elf_howto
   HOWTO(R_X86_64_GOTPC32, 0, 2, 32, TRUE, 0, complain_overflow_signed,
  bfd_elf_generic_reloc, "R_X86_64_GOTPC32",
  FALSE, 0xffffffff, 0xffffffff, TRUE),
+  HOWTO(R_X86_64_GOTPC32_TLSDESC, 0, 2, 32, TRUE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_GOTPC32_TLSDESC",
+ FALSE, 0xffffffff, 0xffffffff, TRUE),
+  HOWTO(R_X86_64_TLSDESC, 0, 4, 64, FALSE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC",
+ FALSE, MINUS_ONE, MINUS_ONE, FALSE),
+  HOWTO(R_X86_64_TLSDESC_CALL, 0, 0, 0, FALSE, 0,
+ complain_overflow_dont, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC_CALL",
+ FALSE, 0, 0, FALSE),
 
   /* We have a gap in the reloc numbers here.
      R_X86_64_standard counts the number up to this point, and
      R_X86_64_vt_offset is the value to subtract from a reloc type of
      R_X86_64_GNU_VT* to form an index into this table.  */
-#define R_X86_64_standard (R_X86_64_GOTPC32 + 1)
+#define R_X86_64_standard (R_X86_64_TLSDESC_CALL + 1)
 #define R_X86_64_vt_offset (R_X86_64_GNU_VTINHERIT - R_X86_64_standard)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -166,6 +178,9 @@ static const struct elf_reloc_map x86_64
   { BFD_RELOC_64_PCREL, R_X86_64_PC64, },
   { BFD_RELOC_X86_64_GOTOFF64, R_X86_64_GOTOFF64, },
   { BFD_RELOC_X86_64_GOTPC32, R_X86_64_GOTPC32, },
+  { BFD_RELOC_X86_64_GOTPC32_TLSDESC, R_X86_64_GOTPC32_TLSDESC, },
+  { BFD_RELOC_X86_64_TLSDESC, R_X86_64_TLSDESC, },
+  { BFD_RELOC_X86_64_TLSDESC_CALL, R_X86_64_TLSDESC_CALL, },
   { BFD_RELOC_VTABLE_INHERIT, R_X86_64_GNU_VTINHERIT, },
   { BFD_RELOC_VTABLE_ENTRY, R_X86_64_GNU_VTENTRY, },
 };
@@ -353,7 +368,20 @@ struct elf64_x86_64_link_hash_entry
 #define GOT_NORMAL 1
 #define GOT_TLS_GD 2
 #define GOT_TLS_IE 3
+#define GOT_TLS_GDESC 4
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf64_x86_64_hash_entry(ent) \
@@ -365,6 +393,9 @@ struct elf64_x86_64_obj_tdata
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf64_x86_64_tdata(abfd) \
@@ -373,6 +404,8 @@ struct elf64_x86_64_obj_tdata
 #define elf64_x86_64_local_got_tls_type(abfd) \
   (elf64_x86_64_tdata (abfd)->local_got_tls_type)
 
+#define elf64_x86_64_local_tlsdesc_gotent(abfd) \
+  (elf64_x86_64_tdata (abfd)->local_tlsdesc_gotent)
 
 /* x86-64 ELF linker hash table.  */
 
@@ -389,11 +422,23 @@ struct elf64_x86_64_link_hash_table
   asection *sdynbss;
   asection *srelbss;
 
+  /* The offset into splt of the PLT entry for the TLS descriptor
+     resolver.  Special values are 0, if not necessary (or not found
+     to be necessary yet), and -1 if needed but not determined
+     yet.  */
+  bfd_vma tlsdesc_plt;
+  /* The offset into sgot of the GOT entry used by the PLT entry
+     above.  */
+  bfd_vma tlsdesc_got;
+
   union {
     bfd_signed_vma refcount;
     bfd_vma offset;
   } tls_ld_got;
 
+  /* The amount of space used by the jump slots in the GOT.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -403,6 +448,9 @@ struct elf64_x86_64_link_hash_table
 #define elf64_x86_64_hash_table(p) \
   ((struct elf64_x86_64_link_hash_table *) ((p)->hash))
 
+#define elf64_x86_64_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * GOT_ENTRY_SIZE)
+
 /* Create an entry in an x86-64 ELF linker hash table. */
 
 static struct bfd_hash_entry *
@@ -428,6 +476,7 @@ link_hash_newfunc (struct bfd_hash_entry
       eh = (struct elf64_x86_64_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -459,7 +508,10 @@ elf64_x86_64_link_hash_table_create (bfd
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->sym_sec.abfd = NULL;
+  ret->tlsdesc_plt = 0;
+  ret->tlsdesc_got = 0;
   ret->tls_ld_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
 
   return &ret->elf.root;
 }
@@ -619,6 +671,8 @@ elf64_x86_64_tls_transition (struct bfd_
   switch (r_type)
     {
     case R_X86_64_TLSGD:
+    case R_X86_64_GOTPC32_TLSDESC:
+    case R_X86_64_TLSDESC_CALL:
     case R_X86_64_GOTTPOFF:
       if (is_local)
  return R_X86_64_TPOFF32;
@@ -709,6 +763,8 @@ elf64_x86_64_check_relocs (bfd *abfd, st
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
   /* This symbol requires a global offset table entry. */
   {
     int tls_type, old_tls_type;
@@ -718,6 +774,9 @@ elf64_x86_64_check_relocs (bfd *abfd, st
       default: tls_type = GOT_NORMAL; break;
       case R_X86_64_TLSGD: tls_type = GOT_TLS_GD; break;
       case R_X86_64_GOTTPOFF: tls_type = GOT_TLS_IE; break;
+      case R_X86_64_GOTPC32_TLSDESC:
+      case R_X86_64_TLSDESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       }
 
     if (h != NULL)
@@ -736,14 +795,17 @@ elf64_x86_64_check_relocs (bfd *abfd, st
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= sizeof (bfd_signed_vma) + sizeof (char);
+    size *= sizeof (bfd_signed_vma)
+      + sizeof (bfd_vma) + sizeof (char);
     local_got_refcounts = ((bfd_signed_vma *)
    bfd_zalloc (abfd, size));
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf64_x86_64_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf64_x86_64_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type
@@ -753,10 +815,14 @@ elf64_x86_64_check_relocs (bfd *abfd, st
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
- && (old_tls_type != GOT_TLS_GD || tls_type != GOT_TLS_IE))
+ && (! GOT_TLS_GD_ANY_P (old_tls_type)
+    || tls_type != GOT_TLS_IE))
       {
- if (old_tls_type == GOT_TLS_IE && tls_type == GOT_TLS_GD)
+ if (old_tls_type == GOT_TLS_IE && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1104,6 +1170,8 @@ elf64_x86_64_gc_sweep_hook (bfd *abfd, s
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
@@ -1371,6 +1439,7 @@ allocate_dynrelocs (struct elf_link_hash
 
   /* We also need to make an entry in the .rela.plt section.  */
   htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->srelplt->reloc_count++;
  }
       else
  {
@@ -1384,6 +1453,9 @@ allocate_dynrelocs (struct elf_link_hash
       h->needs_plt = 0;
     }
 
+  eh = (struct elf64_x86_64_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+  
   /* If R_X86_64_GOTTPOFF symbol is now local to the binary,
      make it a R_X86_64_TPOFF32 requiring no GOT entry.  */
   if (h->got.refcount > 0
@@ -1406,31 +1478,46 @@ allocate_dynrelocs (struct elf_link_hash
     return FALSE;
  }
 
-      s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      /* R_X86_64_TLSGD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  s = htab->sgot;
+  h->got.offset = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_X86_64_TLSGD needs one dynamic relocation if local symbol
  and two if global.
  R_X86_64_GOTTPOFF needs one dynamic relocation.  */
-      if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
   || tls_type == GOT_TLS_IE)
  htab->srelgot->size += sizeof (Elf64_External_Rela);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf64_External_Rela);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf64_External_Rela);
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->tlsdesc_plt = (bfd_vma) -1;
+ }
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf64_x86_64_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1578,6 +1665,7 @@ elf64_x86_64_size_dynamic_sections (bfd
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1621,20 +1709,43 @@ elf64_x86_64_size_dynamic_sections (bfd
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf64_x86_64_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf64_x86_64_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      if (*local_tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (*local_tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || *local_tls_type == GOT_TLS_IE)
- srel->size += sizeof (Elf64_External_Rela);
+ {
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    {
+      htab->srelplt->size += sizeof (Elf64_External_Rela);
+      htab->tlsdesc_plt = (bfd_vma) -1;
+    }
+  if (! GOT_TLS_GDESC_P (*local_tls_type)
+      || GOT_TLS_GD_P (*local_tls_type))
+    srel->size += sizeof (Elf64_External_Rela);
+ }
     }
   else
     *local_got = (bfd_vma) -1;
@@ -1656,6 +1767,27 @@ elf64_x86_64_size_dynamic_sections (bfd
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size
+      = elf64_x86_64_compute_jump_table_size (htab);
+
+  if (htab->tlsdesc_plt)
+    {
+      htab->tlsdesc_got = htab->sgot->size;
+      htab->sgot->size += GOT_ENTRY_SIZE;
+      /* Reserve room for the initial entry.
+ FIXME: we could probably do away with it in this case.  */
+      if (htab->splt->size == 0)
+ htab->splt->size += PLT_ENTRY_SIZE;
+      htab->tlsdesc_plt = htab->splt->size;
+      htab->splt->size += PLT_ENTRY_SIZE;
+    }
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1679,7 +1811,8 @@ elf64_x86_64_size_dynamic_sections (bfd
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -1739,6 +1872,11 @@ elf64_x86_64_size_dynamic_sections (bfd
       || !add_dynamic_entry (DT_PLTREL, DT_RELA)
       || !add_dynamic_entry (DT_JMPREL, 0))
     return FALSE;
+
+  if (htab->tlsdesc_plt
+      && (!add_dynamic_entry (DT_TLSDESC_PLT, 0)
+  || !add_dynamic_entry (DT_TLSDESC_GOT, 0)))
+    return FALSE;
  }
 
       if (relocs)
@@ -1766,6 +1904,41 @@ elf64_x86_64_size_dynamic_sections (bfd
   return TRUE;
 }
 
+static bfd_boolean
+elf64_x86_64_always_size_sections (bfd *output_bfd,
+   struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Return the base VMA address which should be subtracted from real addresses
    when resolving @dtpoff relocation.
    This is PT_TLS segment p_vaddr.  */
@@ -1824,6 +1997,7 @@ elf64_x86_64_relocate_section (bfd *outp
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -1834,6 +2008,7 @@ elf64_x86_64_relocate_section (bfd *outp
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf64_x86_64_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -1845,7 +2020,7 @@ elf64_x86_64_relocate_section (bfd *outp
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2204,6 +2379,8 @@ elf64_x86_64_relocate_section (bfd *outp
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
   r_type = elf64_x86_64_tls_transition (info, r_type, h == NULL);
   tls_type = GOT_UNKNOWN;
@@ -2215,7 +2392,9 @@ elf64_x86_64_relocate_section (bfd *outp
       if (!info->shared && h->dynindx == -1 && tls_type == GOT_TLS_IE)
  r_type = R_X86_64_TPOFF32;
     }
-  if (r_type == R_X86_64_TLSGD)
+  if (r_type == R_X86_64_TLSGD
+      || r_type == R_X86_64_GOTPC32_TLSDESC
+      || r_type == R_X86_64_TLSDESC_CALL)
     {
       if (tls_type == GOT_TLS_IE)
  r_type = R_X86_64_GOTTPOFF;
@@ -2257,6 +2436,67 @@ elf64_x86_64_relocate_section (bfd *outp
   rel++;
   continue;
  }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leaq x@tlsdesc(%rip), %rax
+
+     Change it to:
+     movl $x@tpoff, %rax
+    
+     Registers other than %rax may be set up here.  */
+  
+  unsigned int val, type, type2;
+  bfd_vma roff;
+
+  /* First, make sure it's a leaq adding rip to a
+     32-bit offset into any register, although it's
+     probably almost always going to be rax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 3);
+  type = bfd_get_8 (input_bfd, contents + roff - 3);
+  BFD_ASSERT ((type & 0xfb) == 0x48);
+  type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type2 == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x05);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x48 | ((type >> 2) & 1),
+     contents + roff - 3);
+  bfd_put_8 (output_bfd, 0xc7, contents + roff - 2);
+  bfd_put_8 (output_bfd, 0xc0 | ((val >> 3) & 7),
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%rax)
+     Turn it into:
+     rex64 nop.  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%rax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x48, contents + roff);
+  bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+  continue;
+ }
       else
  {
   unsigned int val, type, reg;
@@ -2322,13 +2562,17 @@ elf64_x86_64_relocate_section (bfd *outp
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf64_x86_64_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2338,30 +2582,61 @@ elf64_x86_64_relocate_section (bfd *outp
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF64_R_INFO (indx, R_X86_64_TLSDESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt
+      + 2 * GOT_ENTRY_SIZE <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf64_External_Rela);
+  BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+      <= sreloc->contents + sreloc->size);
+  if (indx == 0)
+    outrel.r_addend = relocation - dtpoff_base (info);
+  else
+    outrel.r_addend = 0;
+  bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_X86_64_DTPMOD64;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else
  dr_type = R_X86_64_TPOFF64;
 
       bfd_put_64 (output_bfd, 0, htab->sgot->contents + off);
       outrel.r_addend = 0;
-      if (dr_type == R_X86_64_TPOFF64 && indx == 0)
+      if ((dr_type == R_X86_64_TPOFF64
+   || dr_type == R_X86_64_TLSDESC) && indx == 0)
  outrel.r_addend = relocation - dtpoff_base (info);
       outrel.r_info = ELF64_R_INFO (indx, dr_type);
 
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf64_External_Rela);
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2377,27 +2652,37 @@ elf64_x86_64_relocate_section (bfd *outp
       outrel.r_info = ELF64_R_INFO (indx,
     R_X86_64_DTPOFF64);
       outrel.r_offset += GOT_ENTRY_SIZE;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
     }
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
   if (r_type == ELF64_R_TYPE (rel->r_info))
     {
-      relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off;
+      if (r_type == R_X86_64_GOTPC32_TLSDESC
+  || r_type == R_X86_64_TLSDESC_CALL)
+ relocation = htab->sgotplt->output_section->vma
+  + htab->sgotplt->output_offset
+  + offplt + htab->sgotplt_jump_table_size;
+      else
+ relocation = htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSGD)
     {
       unsigned int i;
       static unsigned char tlsgd[8]
@@ -2437,6 +2722,79 @@ elf64_x86_64_relocate_section (bfd *outp
       rel++;
       continue;
     }
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leaq x@tlsdesc(%rip), %rax
+
+ Change it to:
+ movq x@gottpoff(%rip), %rax # before rex64 nop
+    
+ Registers other than %rax may be set up here.  */
+  
+      unsigned int val, type, type2;
+      bfd_vma roff;
+
+      /* First, make sure it's a leaq adding rip to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be rax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 3);
+      type = bfd_get_8 (input_bfd, contents + roff - 3);
+      BFD_ASSERT ((type & 0xfb) == 0x48);
+      type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type2 == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x05);
+      BFD_ASSERT (roff + 4 <= input_section->size);
+
+      /* Now modify the instruction as appropriate.  */
+      /* To turn a leaq into a movq in the form we use it, it
+ suffices to change the second byte from 0x8d to 0x8b.
+ aoliva FIXME: should we decide to keep the leaq, all
+ we have to do is remove the statement below, and
+ adjust the relaxation of R_X86_64_TLSDESC_CALL.  */
+      bfd_put_8 (output_bfd, 0x8b, contents + roff - 2);
+
+      bfd_put_32 (output_bfd,
+  htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off
+  - rel->r_offset
+  - input_section->output_section->vma
+  - input_section->output_offset
+  - 4,
+  contents + roff);
+      continue;
+    }
+  else if (ELF64_R_TYPE (rel->r_info) ==

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Andreas Jaeger
Alexandre Oliva <[hidden email]> writes:

> Index: include/elf/x86-64.h
> ===================================================================
> RCS file: /cvs/uberbaum/./include/elf/x86-64.h,v
> retrieving revision 1.8
> diff -u -p -r1.8 x86-64.h
> --- include/elf/x86-64.h 25 Jul 2005 15:41:07 -0000 1.8
> +++ include/elf/x86-64.h 15 Sep 2005 22:39:19 -0000
> @@ -1,5 +1,5 @@
>  /* x86_64 ELF support for BFD.
> -   Copyright (C) 2000, 2001, 2002, 2004 Free Software Foundation, Inc.
> +   Copyright (C) 2000, 2001, 2002, 2004, 2005 Free Software Foundation, Inc.
>     Contributed by Jan Hubicka <[hidden email]>
>  
>     This file is part of BFD, the Binary File Descriptor library.
> @@ -53,6 +53,13 @@ START_RELOC_NUMBERS (elf_x86_64_reloc_ty
>       RELOC_NUMBER (R_X86_64_GOTOFF64, 25)     /* 64 bit offset to GOT */
>       RELOC_NUMBER (R_X86_64_GOTPC32,  26)     /* 32 bit signed pc relative
>                                                   offset to GOT */
> +     RELOC_NUMBER (R_X86_64_GOTPC32_TLSDESC, 27)
> +      /* 32 bit signed pc relative
> + offset to TLS descriptor
> + in the GOT.  */
> +     RELOC_NUMBER (R_X86_64_TLSDESC, 28)      /* 2x64-bit TLS descriptor.  */
> +     RELOC_NUMBER (R_X86_64_TLSDESC_CALL, 29) /* Relaxable call through TLS
> + descriptor.  */
Please check the current x86-64 ABI: the relocation numbers up to 31
are all taken or reserved, so you cannot use the above ones.

Andreas
--
 Andreas Jaeger, [hidden email], http://www.suse.de/~aj
 SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
   GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126

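The i386 hunk near the top of the thread relaxes the GDesc call site by patching two bytes in place: `call *(%eax)` (ff 10) becomes `movl %eax,%eax` (89 c0), or `negl %eax` (f7 d8) when the offset must be negated. As a rough standalone illustration (this is not BFD code; the helper name and harness are invented, the opcode bytes are taken from the hunk):

```c
#include <stdint.h>

/* Hypothetical standalone version of the i386 GDesc -> IE call-site
   rewrite.  INSN points at the two bytes of "call *(%eax)".  Returns
   1 on success, 0 if the bytes are not the expected instruction.  */
static int
rewrite_tlsdesc_call_ia32 (uint8_t *insn, int negate)
{
  if (insn[0] != 0xff || insn[1] != 0x10)  /* not call *(%eax) */
    return 0;
  if (negate)
    {
      insn[0] = 0xf7;  /* negl %eax */
      insn[1] = 0xd8;
    }
  else
    {
      insn[0] = 0x89;  /* movl %eax,%eax */
      insn[1] = 0xc0;
    }
  return 1;
}
```

Both replacement instructions are exactly two bytes, so the patch never has to move surrounding code.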

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Jan Beulich
>Please check the current x86-64 ABI, the relocations until 31 are all
>taken and reserved, you cannot use the above ones,

It's actually 33 by now.  And the same proposal that adds 32 and 33
to x86-64 also suggests using 38 on i386 for a different purpose.  See
the thread starting at
http://www.x86-64.org/mailing_lists/list?listname=discuss&listnum=0

Jan

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
On Sep 16, 2005, "Jan Beulich" <[hidden email]> wrote:

>> Please check the current x86-64 ABI, the relocations until 31 are all
>> taken and reserved, you cannot use the above ones,

> It's actually 33 meanwhile. And the same proposal that adds 32 and 33
> to x86-64 also suggests using 38 on i386 for a different purpose. See
> the thread starting at
> http://www.x86-64.org/mailing_lists/list?listname=discuss&listnum=0

Thanks, I've renumbered the relocations to avoid conflicts and, while
at that, the dynamic table entries to a more appropriate range.

Here's a patch for the x86-64 ABI document that adds the new
relocations and references the new dynamic table numbers, referring to
the latest version of my specs on the web for details.

I'll post updated patches for binutils and glibc as soon as I get
confirmation that the 3 relocation numbers are at least reserved for
this purpose.

Thanks,


Index: ChangeLog
from  Alexandre Oliva  <[hidden email]>

        * object-files.tex (Relocation Types): Add
        R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL and
        R_X86_64_TLSDESC.  Add pointer to description.
        * dl.tex (Procedure Linkage Table): Mention lazy relocation of TLS
        descriptors.

Index: dl.tex
===================================================================
RCS file: /cvs/Repository/x86-64-ABI/dl.tex,v
retrieving revision 1.31
diff -u -p -r1.31 dl.tex
--- dl.tex 24 Aug 2005 15:11:42 -0000 1.31
+++ dl.tex 16 Sep 2005 18:10:44 -0000
@@ -265,6 +265,14 @@ evaluates procedure linkage table entrie
 resolution and relocation until the first execution of a table entry.
 \index{procedure linkage table|)}
 
+Relocation entries of type \codeindex{R_X86_64_TLSDESC} may also be
+subject to lazy relocation, using a single entry in the procedure
+linkage table and in the global offset table, at locations given by
+\texttt{DT_TLSDESC_PLT} and \texttt{DT_TLSDESC_GOT}, respectively, as
+described in ``Thread-Local Storage Descriptors for IA32 and
+AMD64/EM64T''\footnote{This document is currently available via
+  \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
+
 \subsubsection{Large Models}
 
 In the small and medium code models the size of both the PLT and the GOT
Index: object-files.tex
===================================================================
RCS file: /cvs/Repository/x86-64-ABI/object-files.tex,v
retrieving revision 1.36
diff -u -p -r1.36 object-files.tex
--- object-files.tex 14 Sep 2005 22:36:07 -0000 1.36
+++ object-files.tex 16 Sep 2005 18:10:46 -0000
@@ -448,6 +448,9 @@ the relocation addend.
       \texttt{R_X86_64_GOTPC32} & 26 & \textit{word32} & \texttt{GOT + A - P} \\
       \texttt{R_X86_64_SIZE32} & 32 & \textit{word32} & \texttt{Z + A} \\
       \texttt{R_X86_64_SIZE64} & 33 & \textit{word64} & \texttt{Z + A} \\
+      \texttt{R_X86_64_GOTPC32_TLSDESC} & 34 & \textit{word32} &  \\
+      \texttt{R_X86_64_TLSDESC_CALL} & 35 & none &  \\
+      \texttt{R_X86_64_TLSDESC} & 36 & \textit{word64}$\times 2$ & \\
 %      \texttt{R_X86_64_GOT64} & 16 & \textit{word64} & \texttt{G + A} \\
 %      \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
     \end{tabular}
@@ -501,7 +504,14 @@ The relocations \texttt{R_X86_64_DPTMOD6
 of the Thread-Local Storage ABI extensions and are documented in the
 document called ``ELF Handling for Thread-Local
 Storage''\footnote{This document is currently available via
-  \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local Storage}.
+  \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local
+  Storage}.  The relocations \texttt{R_X86_64_GOTPC32_TLSDESC},
+\texttt{R_X86_64_TLSDESC_CALL} and \texttt{R_X86_64_TLSDESC} are also
+used for Thread-Local Storage, but are not documented there as of this
+writing.  A description can be found in the document ``Thread-Local
+Storage Descriptors for IA32 and AMD64/EM64T''\footnote{This document
+  is currently available via
+  \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
 
 \end{sloppypar}
 


--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}
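For quick reference, the relocation values from the object-files.tex hunk above can be collected into a C enum (the enum itself is my sketch; the values are those in the patch, with SIZE32/SIZE64 occupying 32 and 33 as noted earlier in the thread):

```c
/* x86-64 relocation numbers as renumbered in the ABI patch above.
   Sketch only; real headers define many more entries.  */
enum x86_64_tlsdesc_relocs
{
  R_X86_64_SIZE32          = 32,  /* Z + A, word32 */
  R_X86_64_SIZE64          = 33,  /* Z + A, word64 */
  R_X86_64_GOTPC32_TLSDESC = 34,  /* pc-relative offset to descriptor */
  R_X86_64_TLSDESC_CALL    = 35,  /* marks the relaxable call, no field */
  R_X86_64_TLSDESC         = 36   /* the 2 x 64-bit descriptor itself */
};
```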

RE: RFC: TLS improvements for IA32 and AMD64/EM64T

Menezes, Evandro
In reply to this post by Alexandre Oliva-2
Alexandre,

> Here's a patch for the x86-64 ABI document that adds the new
> relocations and references the new dynamic table numbers,
> referring to the latest version of my specs on the web for details.

Please add the respective calculations and coding examples.

Thanks,

_______________________________________________________
Evandro Menezes                          GNU Tools Team
512-602-9940                                        AMD
[hidden email]                      Austin, TX


Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
On Sep 16, 2005, "Menezes, Evandro" <[hidden email]> wrote:

> Alexandre,
>> Here's a patch for the x86-64 ABI document that adds the new
>> relocations and references the new dynamic table numbers,
>> referring to the latest version of my specs on the web for details.

> Please add the respective calculations and coding examples.

Please read the document referenced in the patch, for starters.  In it
you'll see the exact spelling of the coding samples is not final yet,
and it doesn't make sense to maintain yet another copy of this until
it settles down.  Also, you'll find that the calculations are not
quite possible to express in the way other relocations are expressed;
suggestions are welcome.  Finally, what's wrong with following the
existing practice of referring to TLS specs elsewhere?

The point of this posting was more to reserve the relocation numbers
for these purposes (the purpose of the relocations is quite solid
already, even though the numbers have changed as recently as
yesterday), but I have yet to run some more performance tests with some
minor variations of the code sequences to choose the best one.  I
don't want to have to maintain all this information in sync between
multiple specs documents and the several different packages that
implement them; having a single specs document is much better for now.

Thanks for your, ehrm, valuable feedback :-/

--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}
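As a concrete illustration of what the binutils patch does with these relocations, the i386 GDesc-to-LE relaxation rewrites the descriptor sequence in place when the symbol turns out to be bound in the executable.  A sketch follows; the final choice of replacement is still marked FIXME in the patch, so take the exact bytes as provisional:

```asm
# Before relaxation (General Dynamic via descriptor), 6 + 2 bytes:
        leal    x@tlsdesc(%ebx), %eax   # R_386_TLS_GOTDESC
        call    *x@tlscall(%eax)        # R_386_TLS_DESC_CALL

# After GDesc -> LE relaxation, same instruction lengths:
        .byte   0x65                    # stray %gs prefix, pads to 6 bytes
        movl    $x@ntpoff, %eax         # TP offset known at link time
        movl    %eax, %eax              # the 2-byte call becomes a no-op
```

Keeping the lengths identical is what allows the linker to patch the bytes without moving any other code.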
Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
In reply to this post by Alexandre Oliva-2
On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:

> On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:
>> Over the past few months, I've been working on porting to IA32 and
>> AMD64/EM64T the interesting bits of the TLS design I came up with for
>> FR-V, achieving some impressive speedups along with slight code size
>> reductions in the most common cases.

>> Although the design is not set in stone yet, it's fully implemented
>> and functional with patches I'm about to post for binutils, gcc and
>> glibc mainline, as follow-ups to this message, except that the GCC
>> patch will go to gcc-patches, as expected.

> Here's the patch for binutils.

> I'm not entirely happy with two aspects of the patch:

> - the way I managed to emit the `call *(%[er]ax)' instruction from
>   `call *variable@TLSCALL(%[er]ax)', dropping the offset from the
>   instruction but still emitting the relocation, seems fragile to me,
>   but there were not additional bits available to do something
>   cleaner.  Any suggestions on a better approach?

> - local_tlsdesc_gotent is probably too wasteful, since very few of all
>   local symbols are going to require TLS descriptor entries.  I hope
>   this is not too much of a problem, but I could introduce another
>   data structure if people feel strongly about it.


> Also note the several FIXMEs with decisions yet to be made on exact
> instructions to be generated in several cases.  I have yet to develop
> some means to better evaluate the performance of each alternative, but
> even then, I have limited hardware to test on.  I'd welcome feedback
> from people more familiar with performance features of various
> x86-compatible processors.  Anyone?  Thanks in advance,

> Here's the patch.  Built and tested on x86_64-linux-gnu and
> i686-pc-linux-gnu.  Ok to install?

Updated patch, using different relocation numbers, and different
dynamic table numbers as well.  Same tests run and passed.  Ok to
install?



--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

Attachment: binutils-20050917.patch.bz2 (32K)
RE: RFC: TLS improvements for IA32 and AMD64/EM64T

Menezes, Evandro
In reply to this post by Alexandre Oliva-2
Alexandre,

> Please read the document referenced in the patch, for
> starters.  In it you'll see the exact spelling of the coding
> samples is not final yet, and it doesn't make sense to
> maintain yet another copy of this until it settles down.  

When it does, it'll be added to the ABI, not before.  For now, it's OK to reserve the relocation numbers on this mailing list.

> Also, you'll find that the calculations are not quite
> possible to express in the way other relocations are
> expressed; suggestions are welcome.  

Then state so, perhaps in a note expanding on what they mean.

> Finally, what's wrong
> with following the existing practice of referring to TLS
> specs elsewhere?

The intent is that the x86-64 ABI remains a stand-alone document as much as possible.  It's not quite there yet, but adding yet another external reference sets it back even further.

BTW, the TLS reference is slated to be incorporated into the x86-64 ABI.

> The point of this posting was more to reserve the relocation
> numbers for these purposes (the purpose of the relocations is
> quite solid already, even though the numbers have changed as
> recently as yesterday), but I'm yet to do some more
> performance tests with some minor variations of the code
> sequences to choose the best one.  I don't want to have to
> maintain all this information in sync between multiple specs
> documents and the several different packages that implement
> them; having a single specs document is much better for now.

That's fine.  When it reaches a mature state, patches against the ABI will be more than welcome.

Thanks,

_______________________________________________________
Evandro Menezes                          GNU Tools Team
512-602-9940                                        AMD
[hidden email]                      Austin, TX

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
In reply to this post by Alexandre Oliva-2
On Sep 17, 2005, Alexandre Oliva <[hidden email]> wrote:

> On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:
>> On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:
>>> Over the past few months, I've been working on porting to IA32 and
>>> AMD64/EM64T the interesting bits of the TLS design I came up with for
>>> FR-V, achieving some impressive speedups along with slight code size
>>> reductions in the most common cases.
>> Here's the patch.  Built and tested on x86_64-linux-gnu and
>> i686-pc-linux-gnu.  Ok to install?

> Updated patch, using different relocation numbers, and different
> dynamic table numbers as well.  Same tests run and passed.  Ok to
> install?

Updated again.  The only significant change is that we no longer emit
the new dynamic table entries on x86-64 when linking with -z now: in
that case we know the relocations won't be resolved lazily, so the
entries would not be used at all.
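(For context: each TLS descriptor is a pair of GOT words, a function pointer that the compiled code calls indirectly plus one argument word.  A rough sketch of the states involved; the resolver split is described in the RFC, and the exact stub names are used here only for illustration:)

```
# TLS descriptor = two GOT words: { fn, arg }
#
# Lazy binding (default):
#   fn  -> trampoline at DT_TLSDESC_PLT, which enters the dynamic
#          linker; DT_TLSDESC_GOT helps the trampoline locate its GOT.
# After resolution, or immediately under -z now:
#   static TLS:   fn -> return-arg stub, arg = offset from thread pointer
#   dynamic TLS:  fn -> dynamic resolver, arg -> (module, offset) pair
#
# With -z now every descriptor is resolved at load time, so the lazy
# trampoline is never reached and DT_TLSDESC_PLT/DT_TLSDESC_GOT would
# go unused.
```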

Index: bfd/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * reloc.c (BFD_RELOC_386_TLS_GOTDESC, BFD_RELOC_386_TLS_DESC,
        BFD_RELOC_386_TLS_DESC_CALL, BFD_RELOC_X86_64_GOTPC32_TLSDESC,
        BFD_RELOC_X86_64_TLSDESC, BFD_RELOC_X86_64_TLSDESC_CALL): New.
        * libbfd.h, bfd-in2.h: Rebuilt.
        * elf32-i386.c (elf_howto_table): New relocations.
        (R_386_tls): Adjust.
        (elf_i386_reloc_type_lookup): Map new relocations.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf_i386_link_hash_entry): Add tlsdesc_got field.
        (struct elf_i386_obj_tdata): Add local_tlsdesc_gotent field.
        (elf_i386_local_tlsdesc_gotent): New macro.
        (struct elf_i386_link_hash_table): Add sgotplt_jump_table_size.
        (elf_i386_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf_i386_link_hash_table_create): Set sgotplt_jump_table_size.
        (elf_i386_tls_transition): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf_i386_gc_sweep_hook): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf_i386_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size.
        Don't zero reloc_count in srelplt.
        (elf_i386_always_size_sections): New.  Set up _TLS_MODULE_BASE_.
        (elf_i386_relocate_section): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf_backend_always_size_sections): Define.
        * elf64-x86-64.c (x86_64_elf_howto): Add R_X86_64_GOTPC32_TLSDESC,
        R_X86_64_TLSDESC, R_X86_64_TLSDESC_CALL.
        (R_X86_64_standard): Adjust.
        (x86_64_reloc_map): Map new relocs.
        (elf64_x86_64_rtype_to_howto): New, split out of...
        (elf64_x86_64_info_to_howto): ... this function, and...
        (elf64_x86_64_reloc_type_lookup): ... use it to map elf_reloc_val.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf64_x86_64_link_hash_entry): Add tlsdesc_got field.
        (struct elf64_x86_64_obj_tdata): Add local_tlsdesc_gotent field.
        (elf64_x86_64_local_tlsdesc_gotent): New macro.
        (struct elf64_x86_64_link_hash_table): Add tlsdesc_plt,
        tlsdesc_got and sgotplt_jump_table_size fields.
        (elf64_x86_64_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf64_x86_64_link_hash_table_create): Initialize new fields.
        (elf64_x86_64_tls_transition): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf64_x86_64_gc_sweep_hook): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf64_x86_64_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size,
        tlsdesc_plt and tlsdesc_got.  Make room for them.  Don't zero
        reloc_count in srelplt.  Add dynamic entries for DT_TLSDESC_PLT
        and DT_TLSDESC_GOT.
        (elf64_x86_64_always_size_sections): New.  Set up
        _TLS_MODULE_BASE_.
        (elf64_x86_64_relocate_section): Handle R_X86_64_GOTPC32_TLSDESC
        and R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf64_x86_64_finish_dynamic_sections): Set DT_TLSDESC_PLT and
        DT_TLSDESC_GOT.  Set up TLS descriptor lazy resolver PLT entry.
        (elf_backend_always_size_sections): Define.

Index: binutils/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * readelf.c (get_dynamic_type): Handle DT_TLSDESC_GOT and
        DT_TLSDESC_PLT.

Index: gas/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * config/tc-i386.c (tc_i386_fix_adjustable): Handle
        BFD_RELOC_386_TLS_GOTDESC, BFD_RELOC_386_TLS_DESC_CALL,
        BFD_RELOC_X86_64_GOTPC32_TLSDESC, BFD_RELOC_X86_64_TLSDESC_CALL.
        (optimize_disp): Emit fix up for BFD_RELOC_386_TLS_DESC_CALL and
        BFD_RELOC_X86_64_TLSDESC_CALL immediately, and clear the
        displacement bits.
        (build_modrm_byte): Set up zero modrm for TLS desc calls.
        (lex_got): Handle @tlsdesc and @tlscall.
        (md_apply_fix, tc_gen_reloc): Handle the new relocations.

Index: include/elf/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * common.h (DT_TLSDESC_GOT, DT_TLSDESC_PLT): New.
        * i386.h (R_386_TLS_GOTDESC, R_386_TLS_DESC_CALL, R_386_TLS_DESC):
        New.
        * x86-64.h (R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL,
        R_X86_64_TLSDESC): New.

Index: ld/testsuite/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * ld-i386/i386.exp: Run on x86_64-*-linux* and amd64-*-linux*.
        Add new tests.
        * ld-i386/pcrel16.d: Add -melf_i386.
        * ld-i386/pcrel8.d: Likewise.
        * ld-i386/tlsbindesc.dd: New.
        * ld-i386/tlsbindesc.rd: New.
        * ld-i386/tlsbindesc.s: New.
        * ld-i386/tlsbindesc.sd: New.
        * ld-i386/tlsbindesc.td: New.
        * ld-i386/tlsdesc.dd: New.
        * ld-i386/tlsdesc.rd: New.
        * ld-i386/tlsdesc.s: New.
        * ld-i386/tlsdesc.sd: New.
        * ld-i386/tlsdesc.td: New.
        * ld-i386/tlsgdesc.dd: New.
        * ld-i386/tlsgdesc.rd: New.
        * ld-i386/tlsgdesc.s: New.
        * ld-x86-64/x86-64.exp: Run new tests.
        * ld-x86-64/tlsbindesc.dd: New.
        * ld-x86-64/tlsbindesc.rd: New.
        * ld-x86-64/tlsbindesc.s: New.
        * ld-x86-64/tlsbindesc.sd: New.
        * ld-x86-64/tlsbindesc.td: New.
        * ld-x86-64/tlsdesc.dd: New.
        * ld-x86-64/tlsdesc.pd: New.
        * ld-x86-64/tlsdesc.rd: New.
        * ld-x86-64/tlsdesc.s: New.
        * ld-x86-64/tlsdesc.sd: New.
        * ld-x86-64/tlsdesc.td: New.
        * ld-x86-64/tlsgdesc.dd: New.
        * ld-x86-64/tlsgdesc.rd: New.
        * ld-x86-64/tlsgdesc.s: New.

Index: bfd/bfd-in2.h
===================================================================
RCS file: /cvs/uberbaum/./bfd/bfd-in2.h,v
retrieving revision 1.366
diff -u -p -r1.366 bfd-in2.h
--- bfd/bfd-in2.h 8 Sep 2005 12:49:18 -0000 1.366
+++ bfd/bfd-in2.h 21 Sep 2005 06:12:08 -0000
@@ -2647,6 +2647,9 @@ in the instruction.  */
   BFD_RELOC_386_TLS_DTPMOD32,
   BFD_RELOC_386_TLS_DTPOFF32,
   BFD_RELOC_386_TLS_TPOFF32,
+  BFD_RELOC_386_TLS_GOTDESC,
+  BFD_RELOC_386_TLS_DESC_CALL,
+  BFD_RELOC_386_TLS_DESC,
 
 /* x86-64/elf relocations  */
   BFD_RELOC_X86_64_GOT32,
@@ -2667,6 +2670,9 @@ in the instruction.  */
   BFD_RELOC_X86_64_TPOFF32,
   BFD_RELOC_X86_64_GOTOFF64,
   BFD_RELOC_X86_64_GOTPC32,
+  BFD_RELOC_X86_64_GOTPC32_TLSDESC,
+  BFD_RELOC_X86_64_TLSDESC_CALL,
+  BFD_RELOC_X86_64_TLSDESC,
 
 /* ns32k relocations  */
   BFD_RELOC_NS32K_IMM_8,
Index: bfd/elf32-i386.c
===================================================================
RCS file: /cvs/uberbaum/./bfd/elf32-i386.c,v
retrieving revision 1.149
diff -u -p -r1.149 elf32-i386.c
--- bfd/elf32-i386.c 31 Aug 2005 23:45:45 -0000 1.149
+++ bfd/elf32-i386.c 21 Sep 2005 06:12:10 -0000
@@ -126,9 +126,19 @@ static reloc_howto_type elf_howto_table[
   HOWTO(R_386_TLS_TPOFF32, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
  bfd_elf_generic_reloc, "R_386_TLS_TPOFF32",
  TRUE, 0xffffffff, 0xffffffff, FALSE),
+  EMPTY_HOWTO (38),
+  HOWTO(R_386_TLS_GOTDESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_GOTDESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
+  HOWTO(R_386_TLS_DESC_CALL, 0, 0, 0, FALSE, 0, complain_overflow_dont,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC_CALL",
+ FALSE, 0, 0, FALSE),
+  HOWTO(R_386_TLS_DESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
 
   /* Another gap.  */
-#define R_386_tls (R_386_TLS_TPOFF32 + 1 - R_386_tls_offset)
+#define R_386_tls (R_386_TLS_DESC + 1 - R_386_tls_offset)
 #define R_386_vt_offset (R_386_GNU_VTINHERIT - R_386_tls)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -292,6 +302,18 @@ elf_i386_reloc_type_lookup (bfd *abfd AT
       TRACE ("BFD_RELOC_386_TLS_TPOFF32");
       return &elf_howto_table[R_386_TLS_TPOFF32 - R_386_tls_offset];
 
+    case BFD_RELOC_386_TLS_GOTDESC:
+      TRACE ("BFD_RELOC_386_TLS_GOTDESC");
+      return &elf_howto_table[R_386_TLS_GOTDESC - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC_CALL:
+      TRACE ("BFD_RELOC_386_TLS_DESC_CALL");
+      return &elf_howto_table[R_386_TLS_DESC_CALL - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC:
+      TRACE ("BFD_RELOC_386_TLS_DESC");
+      return &elf_howto_table[R_386_TLS_DESC - R_386_tls_offset];
+
     case BFD_RELOC_VTABLE_INHERIT:
       TRACE ("BFD_RELOC_VTABLE_INHERIT");
       return &elf_howto_table[R_386_GNU_VTINHERIT - R_386_vt_offset];
@@ -559,7 +581,20 @@ struct elf_i386_link_hash_entry
 #define GOT_TLS_IE_POS 5
 #define GOT_TLS_IE_NEG 6
 #define GOT_TLS_IE_BOTH 7
+#define GOT_TLS_GDESC 8
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf_i386_hash_entry(ent) ((struct elf_i386_link_hash_entry *)(ent))
@@ -570,6 +605,9 @@ struct elf_i386_obj_tdata
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf_i386_tdata(abfd) \
@@ -578,6 +616,9 @@ struct elf_i386_obj_tdata
 #define elf_i386_local_got_tls_type(abfd) \
   (elf_i386_tdata (abfd)->local_got_tls_type)
 
+#define elf_i386_local_tlsdesc_gotent(abfd) \
+  (elf_i386_tdata (abfd)->local_tlsdesc_gotent)
+
 static bfd_boolean
 elf_i386_mkobject (bfd *abfd)
 {
@@ -620,6 +661,10 @@ struct elf_i386_link_hash_table
     bfd_vma offset;
   } tls_ldm_got;
 
+  /* The amount of space used by the reserved portion of the sgotplt
+     section, plus whatever space is used by the jump slots.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -629,6 +674,9 @@ struct elf_i386_link_hash_table
 #define elf_i386_hash_table(p) \
   ((struct elf_i386_link_hash_table *) ((p)->hash))
 
+#define elf_i386_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * 4)
+
 /* Create an entry in an i386 ELF linker hash table.  */
 
 static struct bfd_hash_entry *
@@ -655,6 +703,7 @@ link_hash_newfunc (struct bfd_hash_entry
       eh = (struct elf_i386_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -686,6 +735,7 @@ elf_i386_link_hash_table_create (bfd *ab
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->tls_ldm_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
   ret->sym_sec.abfd = NULL;
   ret->is_vxworks = 0;
   ret->srelplt2 = NULL;
@@ -848,6 +898,8 @@ elf_i386_tls_transition (struct bfd_link
   switch (r_type)
     {
     case R_386_TLS_GD:
+    case R_386_TLS_GOTDESC:
+    case R_386_TLS_DESC_CALL:
     case R_386_TLS_IE_32:
       if (is_local)
  return R_386_TLS_LE_32;
@@ -952,6 +1004,8 @@ elf_i386_check_relocs (bfd *abfd,
 
  case R_386_GOT32:
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
   /* This symbol requires a global offset table entry.  */
   {
     int tls_type, old_tls_type;
@@ -961,6 +1015,9 @@ elf_i386_check_relocs (bfd *abfd,
       default:
       case R_386_GOT32: tls_type = GOT_NORMAL; break;
       case R_386_TLS_GD: tls_type = GOT_TLS_GD; break;
+      case R_386_TLS_GOTDESC:
+      case R_386_TLS_DESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       case R_386_TLS_IE_32:
  if (ELF32_R_TYPE (rel->r_info) == r_type)
   tls_type = GOT_TLS_IE_NEG;
@@ -990,13 +1047,16 @@ elf_i386_check_relocs (bfd *abfd,
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= (sizeof (bfd_signed_vma) + sizeof(char));
+    size *= (sizeof (bfd_signed_vma)
+     + sizeof (bfd_vma) + sizeof(char));
     local_got_refcounts = bfd_zalloc (abfd, size);
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf_i386_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf_i386_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type = elf_i386_local_got_tls_type (abfd) [r_symndx];
@@ -1007,11 +1067,14 @@ elf_i386_check_relocs (bfd *abfd,
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     else if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
-     && (old_tls_type != GOT_TLS_GD
+     && (! GOT_TLS_GD_ANY_P (old_tls_type)
  || (tls_type & GOT_TLS_IE) == 0))
       {
- if ((old_tls_type & GOT_TLS_IE) && tls_type == GOT_TLS_GD)
+ if ((old_tls_type & GOT_TLS_IE) && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1319,6 +1382,8 @@ elf_i386_gc_sweep_hook (bfd *abfd,
   break;
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_IE:
  case R_386_TLS_GOTIE:
@@ -1582,6 +1647,7 @@ allocate_dynrelocs (struct elf_link_hash
 
   /* We also need to make an entry in the .rel.plt section.  */
   htab->srelplt->size += sizeof (Elf32_External_Rel);
+  htab->srelplt->reloc_count++;
 
   if (htab->is_vxworks && !info->shared)
     {
@@ -1615,6 +1681,9 @@ allocate_dynrelocs (struct elf_link_hash
       h->needs_plt = 0;
     }
 
+  eh = (struct elf_i386_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+
   /* If R_386_TLS_{IE_32,IE,GOTIE} symbol is now local to the binary,
      make it a R_386_TLS_LE_32 requiring no TLS entry.  */
   if (h->got.refcount > 0
@@ -1638,11 +1707,22 @@ allocate_dynrelocs (struct elf_link_hash
  }
 
       s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += 4;
-      /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD || tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  h->got.offset = s->size;
+  s->size += 4;
+  /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
+  if (GOT_TLS_GD_P (tls_type) || tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_386_TLS_IE_32 needs one dynamic relocation,
  R_386_TLS_IE resp. R_386_TLS_GOTIE needs one dynamic relocation,
@@ -1651,21 +1731,23 @@ allocate_dynrelocs (struct elf_link_hash
  global.  */
       if (tls_type == GOT_TLS_IE_BOTH)
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      else if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
        || (tls_type & GOT_TLS_IE))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
+      if (GOT_TLS_GDESC_P (tls_type))
+ htab->srelplt->size += sizeof (Elf32_External_Rel);
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf_i386_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1813,6 +1895,7 @@ elf_i386_size_dynamic_sections (bfd *out
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1855,25 +1938,42 @@ elf_i386_size_dynamic_sections (bfd *out
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf_i386_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf_i386_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += 4;
-      if (*local_tls_type == GOT_TLS_GD
-  || *local_tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += 4;
+  if (GOT_TLS_GD_P (*local_tls_type)
+      || *local_tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || (*local_tls_type & GOT_TLS_IE))
  {
   if (*local_tls_type == GOT_TLS_IE_BOTH)
     srel->size += 2 * sizeof (Elf32_External_Rel);
-  else
+  else if (GOT_TLS_GD_P (*local_tls_type)
+   || ! GOT_TLS_GDESC_P (*local_tls_type))
     srel->size += sizeof (Elf32_External_Rel);
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    htab->srelplt->size += sizeof (Elf32_External_Rel);
  }
     }
   else
@@ -1917,6 +2017,14 @@ elf_i386_size_dynamic_sections (bfd *out
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size = htab->srelplt->reloc_count * 4;
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1948,7 +2056,8 @@ elf_i386_size_dynamic_sections (bfd *out
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -2035,6 +2144,41 @@ elf_i386_size_dynamic_sections (bfd *out
   return TRUE;
 }
 
+static bfd_boolean
+elf_i386_always_size_sections (bfd *output_bfd,
+       struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Set the correct type for an x86 ELF section.  We do this by the
    section name, which is a hack, but ought to work.  */
 
@@ -2112,6 +2256,7 @@ elf_i386_relocate_section (bfd *output_b
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -2119,6 +2264,7 @@ elf_i386_relocate_section (bfd *output_b
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf_i386_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -2130,7 +2276,7 @@ elf_i386_relocate_section (bfd *output_b
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2552,6 +2698,8 @@ elf_i386_relocate_section (bfd *output_b
   /* Fall through */
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_GOTIE:
   r_type = elf_i386_tls_transition (info, r_type, h == NULL);
@@ -2566,7 +2714,9 @@ elf_i386_relocate_section (bfd *output_b
     }
   if (tls_type == GOT_TLS_IE)
     tls_type = GOT_TLS_IE_NEG;
-  if (r_type == R_386_TLS_GD)
+  if (r_type == R_386_TLS_GD
+      || r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
     {
       if (tls_type == GOT_TLS_IE_POS)
  r_type = R_386_TLS_GOTIE;
@@ -2640,6 +2790,67 @@ elf_i386_relocate_section (bfd *output_b
   rel++;
   continue;
  }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leal x@tlsdesc(%ebx), %eax
+
+     aoliva FIXME.  Decide whether to change it to:
+     gs: .byte 0x65 ; movl $x@ntpoff, %eax
+     or
+     leal x@ntpoff, %eax
+    
+     Registers other than %eax may be set up here.  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a leal adding ebx to a
+     32-bit offset into any register, although it's
+     probably almost always going to be eax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 2);
+  type = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x83);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x65, contents + roff - 2);
+  /* aoliva FIXME: remove the above and xor the byte
+     below with 0x86.  */
+  bfd_put_8 (output_bfd, 0xb8 | ((val >> 3) & 7),
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, -tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%eax)
+     Turn it into:
+     movl %eax, %eax  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%eax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x89, contents + roff);
+  bfd_put_8 (output_bfd, 0xc0, contents + roff + 1);
+  continue;
+ }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_IE)
  {
   unsigned int val, type;
@@ -2754,13 +2965,17 @@ elf_i386_relocate_section (bfd *output_b
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf_i386_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2770,35 +2985,77 @@ elf_i386_relocate_section (bfd *output_b
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_DESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt + 8
+      <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf32_External_Rel);
+  BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+      <= sreloc->contents + sreloc->size);
+  bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
+  if (indx == 0)
+    {
+      BFD_ASSERT (! unresolved_reloc);
+      bfd_put_32 (output_bfd,
+  relocation - dtpoff_base (info),
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+  else
+    {
+      bfd_put_32 (output_bfd, 0,
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_386_TLS_DTPMOD32;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else if (tls_type == GOT_TLS_IE_POS)
  dr_type = R_386_TLS_TPOFF;
       else
  dr_type = R_386_TLS_TPOFF32;
+
       if (dr_type == R_386_TLS_TPOFF && indx == 0)
  bfd_put_32 (output_bfd, relocation - dtpoff_base (info),
     htab->sgot->contents + off);
       else if (dr_type == R_386_TLS_TPOFF32 && indx == 0)
  bfd_put_32 (output_bfd, dtpoff_base (info) - relocation,
     htab->sgot->contents + off);
-      else
+      else if (dr_type != R_386_TLS_DESC)
  bfd_put_32 (output_bfd, 0,
     htab->sgot->contents + off);
       outrel.r_info = ELF32_R_INFO (indx, dr_type);
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf32_External_Rel);
+
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2814,8 +3071,10 @@ elf_i386_relocate_section (bfd *output_b
       outrel.r_info = ELF32_R_INFO (indx,
     R_386_TLS_DTPOFF32);
       outrel.r_offset += 4;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
     }
  }
@@ -2826,25 +3085,33 @@ elf_i386_relocate_section (bfd *output_b
       htab->sgot->contents + off + 4);
   outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_TPOFF);
   outrel.r_offset += 4;
-  htab->srelgot->reloc_count++;
+  sreloc->reloc_count++;
   loc += sizeof (Elf32_External_Rel);
   bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
-  if (r_type == ELF32_R_TYPE (rel->r_info))
+  if (r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
+    {
+      relocation = htab->sgotplt_jump_table_size + offplt;
+      unresolved_reloc = FALSE;
+    }
+  else if (r_type == ELF32_R_TYPE (rel->r_info))
     {
       bfd_vma g_o_t = htab->sgotplt->output_section->vma
       + htab->sgotplt->output_offset;
       relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off - g_o_t;
+ + htab->sgot->output_offset + off - g_o_t;
       if ((r_type == R_386_TLS_IE || r_type == R_386_TLS_GOTIE)
   && tls_type == GOT_TLS_IE_BOTH)
  relocation += 4;
@@ -2852,7 +3119,7 @@ elf_i386_relocate_section (bfd *output_b
  relocation += g_o_t;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GD)
     {
       unsigned int val, type;
       bfd_vma roff;
@@ -2916,6 +3183,97 @@ elf_i386_relocate_section (bfd *output_b
       rel++;
       continue;
     }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leal x@tlsdesc(%ebx), %eax
+
+ aoliva FIXME: decide whether to change it to:
+ movl x@gotntpoff(%ebx), %eax # before movl %eax,%eax
+ or
+ leal x@gotntpoff(%ebx), %eax # before movl (%eax),%eax
+ but the latter won't work if we need to negate the
+ loaded value.
+    
+ Registers other than %eax may be set up here.  */
+  
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a leal adding ebx to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be eax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 2);
+      type = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x83);
+      BFD_ASSERT (roff + 4 <= input_section->size);
+
+      /* Now modify the instruction as appropriate.  */
+      /* To turn a leal into a movl in the form we use it, it
+ suffices to change the first byte from 0x8d to 0x8b.
+ aoliva FIXME: should we decide to keep the leal, all
+ we have to do is remove the statement below, and
+ adjust the relaxation of R_386_TLS_DESC_CALL.  */
+      bfd_put_8 (output_bfd, 0x8b, contents + roff - 2);
+
+      if (tls_type == GOT_TLS_IE_BOTH)
+ off += 4;
+
+      bfd_put_32 (output_bfd,
+  htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off
+  - htab->sgotplt->output_section->vma
+  - htab->sgotplt->output_offset,
+  contents + roff);
+      continue;
+    }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+    {
+      /* GDesc -> IE transition.
+ It's originally:
+ calll *(%eax)
+
+ aoliva FIXME: decide whether to change it to:
+ movl %eax,%eax # after movl x@gotntpoff(%ebx), %eax
+ or
+ movl (%eax),%eax # after leal x@gotntpoff(%ebx), %eax
+
+         Either one works unless we have to negate the
+         offset.  */
+  
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a call *(%eax).  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff + 2 <= input_section->size);
+      type = bfd_get_8 (input_bfd, contents + roff);
+      BFD_ASSERT (type == 0xff);
+      val = bfd_get_8 (input_bfd, contents + roff + 1);
+      BFD_ASSERT (val == 0x10);
+
+      /* Now modify the instruction as appropriate.  */
+      if (tls_type != GOT_TLS_IE_NEG)
+ {
+  /* movl %eax,%eax */
+  bfd_put_8 (output_bfd, 0x89, contents + roff);
+  bfd_put_8 (output_bfd, 0xc0, contents + roff + 1);
+ }
+      else
+ {
+  /* negl %eax */
+  bfd_put_8 (output_bfd, 0xf7, contents + roff);
+  bfd_put_8 (output_bfd, 0xd8, contents + roff + 1);
+ }
+
+      continue;
+    }
+  else
+    BFD_ASSERT (FALSE);
   break;
 
  case R_386_TLS_LDM:
@@ -3223,7 +3581,7 @@ elf_i386_finish_dynamic_symbol (bfd *out
     }
 
   if (h->got.offset != (bfd_vma) -1
-      && elf_i386_hash_entry(h)->tls_type != GOT_TLS_GD
+      && ! GOT_TLS_GD_ANY_P (elf_i386_hash_entry(h)->tls_type)
       && (elf_i386_hash_entry(h)->tls_type & GOT_TLS_IE) == 0)
     {
       Elf_Internal_Rela rel;
@@ -3558,6 +3916,7 @@ elf_i386_plt_sym_val (bfd_vma i, const a
 #define elf_backend_reloc_type_class      elf_i386_reloc_type_class
 #define elf_backend_relocate_section      elf_i386_relocate_section
 #define elf_backend_size_dynamic_sections     elf_i386_size_dynamic_sections
+#define elf_backend_always_size_sections      elf_i386_always_size_sections
 #define elf_backend_plt_sym_val      elf_i386_plt_sym_val
 
 #include "elf32-target.h"
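
Before moving on to the x86-64 side: the i386 GDesc -> LE relaxation above rewrites `leal x@tlsdesc(%ebx), %reg` (opcode 0x8d, ModRM with mod=10/rm=ebx) in place into a gs-prefixed `movl $x@ntpoff, %reg` (0x65, then 0xb8+reg, then a 32-bit immediate), preserving the 6-byte length. Here is a minimal standalone Python sketch of that byte surgery; the function name and test framing are mine, not part of the patch, and `roff` plays the role of `rel->r_offset` (pointing at the displacement):

```python
def relax_gdesc_to_le_i386(insn: bytearray, roff: int, ntpoff: int) -> None:
    """Mirror the R_386_TLS_GOTDESC -> LE rewrite from the patch:
    'leal x@tlsdesc(%ebx), %reg' -> 'gs: movl $x@ntpoff, %reg'."""
    # Same sanity checks as the BFD_ASSERTs in the hunk.
    assert roff >= 2 and roff + 4 <= len(insn)
    assert insn[roff - 2] == 0x8D            # leal opcode
    modrm = insn[roff - 1]
    assert (modrm & 0xC7) == 0x83            # mod=10, base register = %ebx
    reg = (modrm >> 3) & 7                   # destination register field
    insn[roff - 2] = 0x65                    # gs: prefix keeps the length at 6
    insn[roff - 1] = 0xB8 | reg              # movl $imm32, %reg
    insn[roff:roff + 4] = ntpoff.to_bytes(4, "little", signed=True)
```

Note that, matching the hunk, the caller passes the negated TP offset (`-tpoff (info, relocation)` in the patch) as `ntpoff`.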
Index: bfd/elf64-x86-64.c
===================================================================
RCS file: /cvs/uberbaum/./bfd/elf64-x86-64.c,v
retrieving revision 1.107
diff -u -p -r1.107 elf64-x86-64.c
--- bfd/elf64-x86-64.c 31 Aug 2005 23:45:46 -0000 1.107
+++ bfd/elf64-x86-64.c 21 Sep 2005 06:12:12 -0000
@@ -112,12 +112,31 @@ static reloc_howto_type x86_64_elf_howto
   HOWTO(R_X86_64_GOTPC32, 0, 2, 32, TRUE, 0, complain_overflow_signed,
  bfd_elf_generic_reloc, "R_X86_64_GOTPC32",
  FALSE, 0xffffffff, 0xffffffff, TRUE),
+  EMPTY_HOWTO (27),
+  EMPTY_HOWTO (28),
+  EMPTY_HOWTO (29),
+  EMPTY_HOWTO (30),
+  EMPTY_HOWTO (31),
+  EMPTY_HOWTO (32),
+  EMPTY_HOWTO (33),
+  HOWTO(R_X86_64_GOTPC32_TLSDESC, 0, 2, 32, TRUE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_GOTPC32_TLSDESC",
+ FALSE, 0xffffffff, 0xffffffff, TRUE),
+  HOWTO(R_X86_64_TLSDESC_CALL, 0, 0, 0, FALSE, 0,
+ complain_overflow_dont, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC_CALL",
+ FALSE, 0, 0, FALSE),
+  HOWTO(R_X86_64_TLSDESC, 0, 4, 64, FALSE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC",
+ FALSE, MINUS_ONE, MINUS_ONE, FALSE),
 
   /* We have a gap in the reloc numbers here.
      R_X86_64_standard counts the number up to this point, and
      R_X86_64_vt_offset is the value to subtract from a reloc type of
      R_X86_64_GNU_VT* to form an index into this table.  */
-#define R_X86_64_standard (R_X86_64_GOTPC32 + 1)
+#define R_X86_64_standard (R_X86_64_TLSDESC + 1)
 #define R_X86_64_vt_offset (R_X86_64_GNU_VTINHERIT - R_X86_64_standard)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -166,14 +185,38 @@ static const struct elf_reloc_map x86_64
   { BFD_RELOC_64_PCREL, R_X86_64_PC64, },
   { BFD_RELOC_X86_64_GOTOFF64, R_X86_64_GOTOFF64, },
   { BFD_RELOC_X86_64_GOTPC32, R_X86_64_GOTPC32, },
+  { BFD_RELOC_X86_64_GOTPC32_TLSDESC, R_X86_64_GOTPC32_TLSDESC, },
+  { BFD_RELOC_X86_64_TLSDESC_CALL, R_X86_64_TLSDESC_CALL, },
+  { BFD_RELOC_X86_64_TLSDESC, R_X86_64_TLSDESC, },
   { BFD_RELOC_VTABLE_INHERIT, R_X86_64_GNU_VTINHERIT, },
   { BFD_RELOC_VTABLE_ENTRY, R_X86_64_GNU_VTENTRY, },
 };
 
+static reloc_howto_type *
+elf64_x86_64_rtype_to_howto (bfd *abfd, unsigned r_type)
+{
+  unsigned i;
+
+  if (r_type < (unsigned int) R_X86_64_GNU_VTINHERIT
+      || r_type >= (unsigned int) R_X86_64_max)
+    {
+      if (r_type >= (unsigned int) R_X86_64_standard)
+ {
+  (*_bfd_error_handler) (_("%B: invalid relocation type %d"),
+ abfd, (int) r_type);
+  r_type = R_X86_64_NONE;
+ }
+      i = r_type;
+    }
+  else
+    i = r_type - (unsigned int) R_X86_64_vt_offset;
+  BFD_ASSERT (x86_64_elf_howto_table[i].type == r_type);
+  return &x86_64_elf_howto_table[i];
+}
 
 /* Given a BFD reloc type, return a HOWTO structure.  */
 static reloc_howto_type *
-elf64_x86_64_reloc_type_lookup (bfd *abfd ATTRIBUTE_UNUSED,
+elf64_x86_64_reloc_type_lookup (bfd *abfd,
  bfd_reloc_code_real_type code)
 {
   unsigned int i;
@@ -182,7 +225,8 @@ elf64_x86_64_reloc_type_lookup (bfd *abf
        i++)
     {
       if (x86_64_reloc_map[i].bfd_reloc_val == code)
- return &x86_64_elf_howto_table[i];
+ return elf64_x86_64_rtype_to_howto (abfd,
+    x86_64_reloc_map[i].elf_reloc_val);
     }
   return 0;
 }
@@ -193,23 +237,10 @@ static void
 elf64_x86_64_info_to_howto (bfd *abfd ATTRIBUTE_UNUSED, arelent *cache_ptr,
     Elf_Internal_Rela *dst)
 {
-  unsigned r_type, i;
+  unsigned r_type;
 
   r_type = ELF64_R_TYPE (dst->r_info);
-  if (r_type < (unsigned int) R_X86_64_GNU_VTINHERIT
-      || r_type >= (unsigned int) R_X86_64_max)
-    {
-      if (r_type >= (unsigned int) R_X86_64_standard)
- {
-  (*_bfd_error_handler) (_("%B: invalid relocation type %d"),
- abfd, (int) r_type);
-  r_type = R_X86_64_NONE;
- }
-      i = r_type;
-    }
-  else
-    i = r_type - (unsigned int) R_X86_64_vt_offset;
-  cache_ptr->howto = &x86_64_elf_howto_table[i];
+  cache_ptr->howto = elf64_x86_64_rtype_to_howto (abfd, r_type);
   BFD_ASSERT (r_type == cache_ptr->howto->type);
 }
 
@@ -353,7 +384,20 @@ struct elf64_x86_64_link_hash_entry
 #define GOT_NORMAL 1
 #define GOT_TLS_GD 2
 #define GOT_TLS_IE 3
+#define GOT_TLS_GDESC 4
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf64_x86_64_hash_entry(ent) \
@@ -365,6 +409,9 @@ struct elf64_x86_64_obj_tdata
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf64_x86_64_tdata(abfd) \
@@ -373,6 +420,8 @@ struct elf64_x86_64_obj_tdata
 #define elf64_x86_64_local_got_tls_type(abfd) \
   (elf64_x86_64_tdata (abfd)->local_got_tls_type)
 
+#define elf64_x86_64_local_tlsdesc_gotent(abfd) \
+  (elf64_x86_64_tdata (abfd)->local_tlsdesc_gotent)
 
 /* x86-64 ELF linker hash table.  */
 
@@ -389,11 +438,23 @@ struct elf64_x86_64_link_hash_table
   asection *sdynbss;
   asection *srelbss;
 
+  /* The offset into splt of the PLT entry for the TLS descriptor
+     resolver.  Special values are 0, if not necessary (or not found
+     to be necessary yet), and -1 if needed but not determined
+     yet.  */
+  bfd_vma tlsdesc_plt;
+  /* The offset into sgot of the GOT entry used by the PLT entry
+     above.  */
+  bfd_vma tlsdesc_got;
+
   union {
     bfd_signed_vma refcount;
     bfd_vma offset;
   } tls_ld_got;
 
+  /* The amount of space used by the jump slots in the GOT.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -403,6 +464,9 @@ struct elf64_x86_64_link_hash_table
 #define elf64_x86_64_hash_table(p) \
   ((struct elf64_x86_64_link_hash_table *) ((p)->hash))
 
+#define elf64_x86_64_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * GOT_ENTRY_SIZE)
+
 /* Create an entry in an x86-64 ELF linker hash table. */
 
 static struct bfd_hash_entry *
@@ -428,6 +492,7 @@ link_hash_newfunc (struct bfd_hash_entry
       eh = (struct elf64_x86_64_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -459,7 +524,10 @@ elf64_x86_64_link_hash_table_create (bfd
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->sym_sec.abfd = NULL;
+  ret->tlsdesc_plt = 0;
+  ret->tlsdesc_got = 0;
   ret->tls_ld_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
 
   return &ret->elf.root;
 }
@@ -619,6 +687,8 @@ elf64_x86_64_tls_transition (struct bfd_
   switch (r_type)
     {
     case R_X86_64_TLSGD:
+    case R_X86_64_GOTPC32_TLSDESC:
+    case R_X86_64_TLSDESC_CALL:
     case R_X86_64_GOTTPOFF:
       if (is_local)
  return R_X86_64_TPOFF32;
@@ -709,6 +779,8 @@ elf64_x86_64_check_relocs (bfd *abfd, st
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
   /* This symbol requires a global offset table entry. */
   {
     int tls_type, old_tls_type;
@@ -718,6 +790,9 @@ elf64_x86_64_check_relocs (bfd *abfd, st
       default: tls_type = GOT_NORMAL; break;
       case R_X86_64_TLSGD: tls_type = GOT_TLS_GD; break;
       case R_X86_64_GOTTPOFF: tls_type = GOT_TLS_IE; break;
+      case R_X86_64_GOTPC32_TLSDESC:
+      case R_X86_64_TLSDESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       }
 
     if (h != NULL)
@@ -736,14 +811,17 @@ elf64_x86_64_check_relocs (bfd *abfd, st
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= sizeof (bfd_signed_vma) + sizeof (char);
+    size *= sizeof (bfd_signed_vma)
+      + sizeof (bfd_vma) + sizeof (char);
     local_got_refcounts = ((bfd_signed_vma *)
    bfd_zalloc (abfd, size));
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf64_x86_64_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf64_x86_64_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type
@@ -753,10 +831,14 @@ elf64_x86_64_check_relocs (bfd *abfd, st
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
- && (old_tls_type != GOT_TLS_GD || tls_type != GOT_TLS_IE))
+ && (! GOT_TLS_GD_ANY_P (old_tls_type)
+    || tls_type != GOT_TLS_IE))
       {
- if (old_tls_type == GOT_TLS_IE && tls_type == GOT_TLS_GD)
+ if (old_tls_type == GOT_TLS_IE && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1104,6 +1186,8 @@ elf64_x86_64_gc_sweep_hook (bfd *abfd, s
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
@@ -1371,6 +1455,7 @@ allocate_dynrelocs (struct elf_link_hash
 
   /* We also need to make an entry in the .rela.plt section.  */
   htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->srelplt->reloc_count++;
  }
       else
  {
@@ -1384,6 +1469,9 @@ allocate_dynrelocs (struct elf_link_hash
       h->needs_plt = 0;
     }
 
+  eh = (struct elf64_x86_64_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+  
   /* If R_X86_64_GOTTPOFF symbol is now local to the binary,
      make it a R_X86_64_TPOFF32 requiring no GOT entry.  */
   if (h->got.refcount > 0
@@ -1406,31 +1494,46 @@ allocate_dynrelocs (struct elf_link_hash
     return FALSE;
  }
 
-      s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      /* R_X86_64_TLSGD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  s = htab->sgot;
+  h->got.offset = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_X86_64_TLSGD needs one dynamic relocation if local symbol
  and two if global.
  R_X86_64_GOTTPOFF needs one dynamic relocation.  */
-      if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
   || tls_type == GOT_TLS_IE)
  htab->srelgot->size += sizeof (Elf64_External_Rela);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf64_External_Rela);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf64_External_Rela);
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->tlsdesc_plt = (bfd_vma) -1;
+ }
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf64_x86_64_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1578,6 +1681,7 @@ elf64_x86_64_size_dynamic_sections (bfd
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1621,20 +1725,43 @@ elf64_x86_64_size_dynamic_sections (bfd
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf64_x86_64_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf64_x86_64_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      if (*local_tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (*local_tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || *local_tls_type == GOT_TLS_IE)
- srel->size += sizeof (Elf64_External_Rela);
+ {
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    {
+      htab->srelplt->size += sizeof (Elf64_External_Rela);
+      htab->tlsdesc_plt = (bfd_vma) -1;
+    }
+  if (! GOT_TLS_GDESC_P (*local_tls_type)
+      || GOT_TLS_GD_P (*local_tls_type))
+    srel->size += sizeof (Elf64_External_Rela);
+ }
     }
   else
     *local_got = (bfd_vma) -1;
@@ -1656,6 +1783,34 @@ elf64_x86_64_size_dynamic_sections (bfd
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size
+      = elf64_x86_64_compute_jump_table_size (htab);
+
+  if (htab->tlsdesc_plt)
+    {
+      /* If we're not using lazy TLS relocations, don't generate the
+ PLT and GOT entries they require.  */
+      if ((info->flags & DF_BIND_NOW))
+ htab->tlsdesc_plt = 0;
+      else
+ {
+  htab->tlsdesc_got = htab->sgot->size;
+  htab->sgot->size += GOT_ENTRY_SIZE;
+  /* Reserve room for the initial entry.
+     FIXME: we could probably do away with it in this case.  */
+  if (htab->splt->size == 0)
+    htab->splt->size += PLT_ENTRY_SIZE;
+  htab->tlsdesc_plt = htab->splt->size;
+  htab->splt->size += PLT_ENTRY_SIZE;
+ }
+    }
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1679,7 +1834,8 @@ elf64_x86_64_size_dynamic_sections (bfd
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -1739,6 +1895,11 @@ elf64_x86_64_size_dynamic_sections (bfd
       || !add_dynamic_entry (DT_PLTREL, DT_RELA)
       || !add_dynamic_entry (DT_JMPREL, 0))
     return FALSE;
+
+  if (htab->tlsdesc_plt
+      && (!add_dynamic_entry (DT_TLSDESC_PLT, 0)
+  || !add_dynamic_entry (DT_TLSDESC_GOT, 0)))
+    return FALSE;
  }
 
       if (relocs)
@@ -1766,6 +1927,41 @@ elf64_x86_64_size_dynamic_sections (bfd
   return TRUE;
 }
 
+static bfd_boolean
+elf64_x86_64_always_size_sections (bfd *output_bfd,
+   struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Return the base VMA address which should be subtracted from real addresses
    when resolving @dtpoff relocation.
    This is PT_TLS segment p_vaddr.  */
@@ -1824,6 +2020,7 @@ elf64_x86_64_relocate_section (bfd *outp
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -1834,6 +2031,7 @@ elf64_x86_64_relocate_section (bfd *outp
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf64_x86_64_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -1845,7 +2043,7 @@ elf64_x86_64_relocate_section (bfd *outp
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2204,6 +2402,8 @@ elf64_x86_64_relocate_section (bfd *outp
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
   r_type = elf64_x86_64_tls_transition (info, r_type, h == NULL);
   tls_type = GOT_UNKNOWN;
@@ -2215,7 +2415,9 @@ elf64_x86_64_relocate_section (bfd *outp
       if (!info->shared && h->dynindx == -1 && tls_type == GOT_TLS_IE)
  r_type = R_X86_64_TPOFF32;
     }
-  if (r_type == R_X86_64_TLSGD)
+  if (r_type == R_X86_64_TLSGD
+      || r_type == R_X86_64_GOTPC32_TLSDESC
+      || r_type == R_X86_64_TLSDESC_CALL)
     {
       if (tls_type == GOT_TLS_IE)
  r_type = R_X86_64_GOTTPOFF;
@@ -2257,6 +2459,67 @@ elf64_x86_64_relocate_section (bfd *outp
   rel++;
   continue;
  }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leaq x@tlsdesc(%rip), %rax
+
+     Change it to:
+     movq $x@tpoff, %rax
+    
+     Registers other than %rax may be set up here.  */
+  
+  unsigned int val, type, type2;
+  bfd_vma roff;
+
+  /* First, make sure it's a leaq adding rip to a
+     32-bit offset into any register, although it's
+     probably almost always going to be rax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 3);
+  type = bfd_get_8 (input_bfd, contents + roff - 3);
+  BFD_ASSERT ((type & 0xfb) == 0x48);
+  type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type2 == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x05);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x48 | ((type >> 2) & 1),
+     contents + roff - 3);
+  bfd_put_8 (output_bfd, 0xc7, contents + roff - 2);
+  bfd_put_8 (output_bfd, 0xc0 | ((val >> 3) & 7),
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%rax)
+     Turn it into:
+     rex64 nop.  */
+  
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%rax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x48, contents + roff);
+  bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+  continue;
+ }
       else
  {
   unsigned int val, type, reg;
@@ -2322,13 +2585,17 @@ elf64_x86_64_relocate_section (bfd *outp
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf64_x86_64_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2338,30 +2605,61 @@ elf64_x86_64_relocate_section (bfd *outp
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF64_R_INFO (indx, R_X86_64_TLSDESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt
+      + 2 * GOT_ENTRY_SIZE <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf64_External_Rela);
+  BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+      <= sreloc->contents + sreloc->size);
+  if (indx == 0)
+    outrel.r_addend = relocation - dtpoff_base (info);
+  else
+    outrel.r_addend = 0;
+  bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_X86_64_DTPMOD64;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else
  dr_type = R_X86_64_TPOFF64;
 
       bfd_put_64 (output_bfd, 0, htab->sgot->contents + off);
       outrel.r_addend = 0;
-      if (dr_type == R_X86_64_TPOFF64 && indx == 0)
+      if ((dr_type == R_X86_64_TPOFF64
+   || dr_type == R_X86_64_TLSDESC) && indx == 0)
  outrel.r_addend = relocation - dtpoff_base (info);
       outrel.r_info = ELF64_R_INFO (indx, dr_type);
 
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf64_External_Rela);
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2377,27 +2675,37 @@ elf64_x86_64_relocate_section (bfd *outp
       outrel.r_info = ELF64_R_INFO (indx,
     R_X86_64_DTPOFF64);
       outrel.r_offset += GOT_ENTRY_SIZE;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
     }
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
   if (r_type == ELF64_R_TYPE (rel->r_info))
     {
-      relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off;
+      if (r_type == R_X86_64_GOTPC32_TLSDESC
+  || r_type == R_X86_64_TLSDESC_CALL)
+ relocation = htab->sgotplt->output_section->vma
+  + htab->sgotplt->output_offset
+  + offplt + htab->sgotplt_jump_table_size;
+      else
+ relocation = htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSGD)
     {
       unsigned int i;
       static unsigned char tlsgd[8]
@@ -2437,6 +2745,79 @@ elf64_x86_64_relocate_section (bfd *outp
       rel++;
       continue;
     }
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leaq x@tlsdesc(%rip), %rax
+
+ Change it to:
+ movq x@gottpoff(%rip), %rax # before rex64 nop
+    
+ Registers other than %rax may be set up here.  */
+  
+      unsigned int val, type, type2;
+      bfd_vma roff;
+
+      /* First, make sure it's a leaq adding rip to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be rax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 3);
+      type = bfd_get_8 (input_bfd, contents + roff - 3);
+      BFD_ASSERT ((type & 0xfb) == 0x48);
+      type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type2 == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x05);
+      BFD_ASSERT (roff + 4 <= input_section->size);

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
On Sep 22, 2005, Alexandre Oliva <[hidden email]> wrote:

> On Sep 17, 2005, Alexandre Oliva <[hidden email]> wrote:
>> On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:
>>> On Sep 16, 2005, Alexandre Oliva <[hidden email]> wrote:
>>>> Over the past few months, I've been working on porting to IA32 and
>>>> AMD64/EM64T the interesting bits of the TLS design I came up with for
>>>> FR-V, achieving some impressive speedups along with slight code size
>>>> reductions in the most common cases.
>>> Here's the patch.  Built and tested on x86_64-linux-gnu and
>>> i686-pc-linux-gnu.  Ok to install?

>> Updated patch, using different relocation numbers, and different
>> dynamic table numbers as well.  Same tests run and passed.  Ok to
>> install?

> Updated again.  Only significant change is that we no longer emit the
> new dynamic table entries on x86-64 if we're linking with -z now,
> since then we know we won't be resolving the relocations lazily, and
> so the entries are not used at all.

One more update.  This time I've slightly modified the code generated
to pad relaxations, so as to get the best performance according to my
benchmarking (not a large difference, but measurable), and adjusted
the testsuite to match.

Ok to install?


Index: bfd/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * reloc.c (BFD_RELOC_386_TLS_GOTDESC, BFD_RELOC_386_TLS_DESC,
        BFD_RELOC_386_TLS_DESC_CALL, BFD_RELOC_X86_64_GOTPC32_TLSDESC,
        BFD_RELOC_X86_64_TLSDESC, BFD_RELOC_X86_64_TLSDESC_CALL): New.
        * libbfd.h, bfd-in2.h: Rebuilt.
        * elf32-i386.c (elf_howto_table): New relocations.
        (R_386_tls): Adjust.
        (elf_i386_reloc_type_lookup): Map new relocations.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf_i386_link_hash_entry): Add tlsdesc_got field.
        (struct elf_i386_obj_tdata): Add local_tlsdesc_gotent field.
        (elf_i386_local_tlsdesc_gotent): New macro.
        (struct elf_i386_link_hash_table): Add sgotplt_jump_table_size.
        (elf_i386_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf_i386_link_hash_table_create): Set sgotplt_jump_table_size.
        (elf_i386_tls_transition): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf_i386_gc_sweep_hook): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf_i386_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size.
        Don't zero reloc_count in srelplt.
        (elf_i386_always_size_sections): New.  Set up _TLS_MODULE_BASE_.
        (elf_i386_relocate_section): Handle R_386_TLS_GOTDESC and
        R_386_TLS_DESC_CALL.
        (elf_i386_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf_backend_always_size_sections): Define.
        * elf64-x86-64.c (x86_64_elf_howto): Add R_X86_64_GOTPC32_TLSDESC,
        R_X86_64_TLSDESC, R_X86_64_TLSDESC_CALL.
        (R_X86_64_standard): Adjust.
        (x86_64_reloc_map): Map new relocs.
        (elf64_x86_64_rtype_to_howto): New, split out of...
        (elf64_x86_64_info_to_howto): ... this function, and...
        (elf64_x86_64_reloc_type_lookup): ... use it to map elf_reloc_val.
        (GOT_TLS_GDESC, GOT_TLS_GD_BOTH_P): New macros.
        (GOT_TLS_GD_P, GOT_TLS_GDESC_P, GOT_TLS_GD_ANY_P): New macros.
        (struct elf64_x86_64_link_hash_entry): Add tlsdesc_got field.
        (struct elf64_x86_64_obj_tdata): Add local_tlsdesc_gotent field.
        (elf64_x86_64_local_tlsdesc_gotent): New macro.
        (struct elf64_x86_64_link_hash_table): Add tlsdesc_plt,
        tlsdesc_got and sgotplt_jump_table_size fields.
        (elf64_x86_64_compute_jump_table_size): New macro.
        (link_hash_newfunc): Initialize tlsdesc_got.
        (elf64_x86_64_link_hash_table_create): Initialize new fields.
        (elf64_x86_64_tls_transition): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_check_relocs): Likewise.  Allocate space for
        local_tlsdesc_gotent.
        (elf64_x86_64_gc_sweep_hook): Handle R_X86_64_GOTPC32_TLSDESC and
        R_X86_64_TLSDESC_CALL.
        (allocate_dynrelocs): Count function PLT relocations.  Reserve
        space for TLS descriptors and relocations.
        (elf64_x86_64_size_dynamic_sections): Reserve space for TLS
        descriptors and relocations.  Set up sgotplt_jump_table_size,
        tlsdesc_plt and tlsdesc_got.  Make room for them.  Don't zero
        reloc_count in srelplt.  Add dynamic entries for DT_TLSDESC_PLT
        and DT_TLSDESC_GOT.
        (elf64_x86_64_always_size_sections): New.  Set up
        _TLS_MODULE_BASE_.
        (elf64_x86_64_relocate_section): Handle R_X86_64_GOTPC32_TLSDESC
        and R_X86_64_TLSDESC_CALL.
        (elf64_x86_64_finish_dynamic_symbol): Use GOT_TLS_GD_ANY_P.
        (elf64_x86_64_finish_dynamic_sections): Set DT_TLSDESC_PLT and
        DT_TLSDESC_GOT.  Set up TLS descriptor lazy resolver PLT entry.
        (elf_backend_always_size_sections): Define.

Index: binutils/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * readelf.c (get_dynamic_type): Handle DT_TLSDESC_GOT and
        DT_TLSDESC_PLT.

Index: gas/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * config/tc-i386.c (tc_i386_fix_adjustable): Handle
        BFD_RELOC_386_TLS_GOTDESC, BFD_RELOC_386_TLS_DESC_CALL,
        BFD_RELOC_X86_64_GOTPC32_TLSDESC, BFD_RELOC_X86_64_TLSDESC_CALL.
        (optimize_disp): Emit fix up for BFD_RELOC_386_TLS_DESC_CALL and
        BFD_RELOC_X86_64_TLSDESC_CALL immediately, and clear the
        displacement bits.
        (build_modrm_byte): Set up zero modrm for TLS desc calls.
        (lex_got): Handle @tlsdesc and @tlscall.
        (md_apply_fix, tc_gen_reloc): Handle the new relocations.

Index: include/elf/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * common.h (DT_TLSDESC_GOT, DT_TLSDESC_PLT): New.
        * i386.h (R_386_TLS_GOTDESC, R_386_TLS_DESC_CALL, R_386_TLS_DESC):
        New.
        * x86-64.h (R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL,
        R_X86_64_TLSDESC): New.

Index: ld/testsuite/ChangeLog
from  Alexandre Oliva  <[hidden email]>

        Introduce TLS descriptors for i386 and x86_64.
        * ld-i386/i386.exp: Run on x86_64-*-linux* and amd64-*-linux*.
        Add new tests.
        * ld-i386/pcrel16.d: Add -melf_i386.
        * ld-i386/pcrel8.d: Likewise.
        * ld-i386/tlsbindesc.dd: New.
        * ld-i386/tlsbindesc.rd: New.
        * ld-i386/tlsbindesc.s: New.
        * ld-i386/tlsbindesc.sd: New.
        * ld-i386/tlsbindesc.td: New.
        * ld-i386/tlsdesc.dd: New.
        * ld-i386/tlsdesc.rd: New.
        * ld-i386/tlsdesc.s: New.
        * ld-i386/tlsdesc.sd: New.
        * ld-i386/tlsdesc.td: New.
        * ld-i386/tlsgdesc.dd: New.
        * ld-i386/tlsgdesc.rd: New.
        * ld-i386/tlsgdesc.s: New.
        * ld-x86-64/x86-64.exp: Run new tests.
        * ld-x86-64/tlsbindesc.dd: New.
        * ld-x86-64/tlsbindesc.rd: New.
        * ld-x86-64/tlsbindesc.s: New.
        * ld-x86-64/tlsbindesc.sd: New.
        * ld-x86-64/tlsbindesc.td: New.
        * ld-x86-64/tlsdesc.dd: New.
        * ld-x86-64/tlsdesc.pd: New.
        * ld-x86-64/tlsdesc.rd: New.
        * ld-x86-64/tlsdesc.s: New.
        * ld-x86-64/tlsdesc.sd: New.
        * ld-x86-64/tlsdesc.td: New.
        * ld-x86-64/tlsgdesc.dd: New.
        * ld-x86-64/tlsgdesc.rd: New.
        * ld-x86-64/tlsgdesc.s: New.

Index: bfd/bfd-in2.h
===================================================================
--- bfd/bfd-in2.h.orig 2006-01-13 18:13:26.000000000 -0500
+++ bfd/bfd-in2.h 2006-01-13 18:14:55.000000000 -0500
@@ -2661,6 +2661,9 @@
   BFD_RELOC_386_TLS_DTPMOD32,
   BFD_RELOC_386_TLS_DTPOFF32,
   BFD_RELOC_386_TLS_TPOFF32,
+  BFD_RELOC_386_TLS_GOTDESC,
+  BFD_RELOC_386_TLS_DESC_CALL,
+  BFD_RELOC_386_TLS_DESC,
 
 /* x86-64/elf relocations  */
   BFD_RELOC_X86_64_GOT32,
@@ -2681,6 +2684,9 @@
   BFD_RELOC_X86_64_TPOFF32,
   BFD_RELOC_X86_64_GOTOFF64,
   BFD_RELOC_X86_64_GOTPC32,
+  BFD_RELOC_X86_64_GOTPC32_TLSDESC,
+  BFD_RELOC_X86_64_TLSDESC_CALL,
+  BFD_RELOC_X86_64_TLSDESC,
 
 /* ns32k relocations  */
   BFD_RELOC_NS32K_IMM_8,
Index: bfd/elf32-i386.c
===================================================================
--- bfd/elf32-i386.c.orig 2006-01-13 18:13:26.000000000 -0500
+++ bfd/elf32-i386.c 2006-01-13 18:14:55.000000000 -0500
@@ -126,9 +126,19 @@
   HOWTO(R_386_TLS_TPOFF32, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
  bfd_elf_generic_reloc, "R_386_TLS_TPOFF32",
  TRUE, 0xffffffff, 0xffffffff, FALSE),
+  EMPTY_HOWTO (38),
+  HOWTO(R_386_TLS_GOTDESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_GOTDESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
+  HOWTO(R_386_TLS_DESC_CALL, 0, 0, 0, FALSE, 0, complain_overflow_dont,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC_CALL",
+ FALSE, 0, 0, FALSE),
+  HOWTO(R_386_TLS_DESC, 0, 2, 32, FALSE, 0, complain_overflow_bitfield,
+ bfd_elf_generic_reloc, "R_386_TLS_DESC",
+ TRUE, 0xffffffff, 0xffffffff, FALSE),
 
   /* Another gap.  */
-#define R_386_tls (R_386_TLS_TPOFF32 + 1 - R_386_tls_offset)
+#define R_386_tls (R_386_TLS_DESC + 1 - R_386_tls_offset)
 #define R_386_vt_offset (R_386_GNU_VTINHERIT - R_386_tls)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -292,6 +302,18 @@
       TRACE ("BFD_RELOC_386_TLS_TPOFF32");
       return &elf_howto_table[R_386_TLS_TPOFF32 - R_386_tls_offset];
 
+    case BFD_RELOC_386_TLS_GOTDESC:
+      TRACE ("BFD_RELOC_386_TLS_GOTDESC");
+      return &elf_howto_table[R_386_TLS_GOTDESC - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC_CALL:
+      TRACE ("BFD_RELOC_386_TLS_DESC_CALL");
+      return &elf_howto_table[R_386_TLS_DESC_CALL - R_386_tls_offset];
+
+    case BFD_RELOC_386_TLS_DESC:
+      TRACE ("BFD_RELOC_386_TLS_DESC");
+      return &elf_howto_table[R_386_TLS_DESC - R_386_tls_offset];
+
     case BFD_RELOC_VTABLE_INHERIT:
       TRACE ("BFD_RELOC_VTABLE_INHERIT");
       return &elf_howto_table[R_386_GNU_VTINHERIT - R_386_vt_offset];
@@ -559,7 +581,20 @@
 #define GOT_TLS_IE_POS 5
 #define GOT_TLS_IE_NEG 6
 #define GOT_TLS_IE_BOTH 7
+#define GOT_TLS_GDESC 8
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf_i386_hash_entry(ent) ((struct elf_i386_link_hash_entry *)(ent))
@@ -570,6 +605,9 @@
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf_i386_tdata(abfd) \
@@ -578,6 +616,9 @@
 #define elf_i386_local_got_tls_type(abfd) \
   (elf_i386_tdata (abfd)->local_got_tls_type)
 
+#define elf_i386_local_tlsdesc_gotent(abfd) \
+  (elf_i386_tdata (abfd)->local_tlsdesc_gotent)
+
 static bfd_boolean
 elf_i386_mkobject (bfd *abfd)
 {
@@ -620,6 +661,10 @@
     bfd_vma offset;
   } tls_ldm_got;
 
+  /* The amount of space used by the reserved portion of the sgotplt
+     section, plus whatever space is used by the jump slots.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -629,6 +674,9 @@
 #define elf_i386_hash_table(p) \
   ((struct elf_i386_link_hash_table *) ((p)->hash))
 
+#define elf_i386_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * 4)
+
 /* Create an entry in an i386 ELF linker hash table.  */
 
 static struct bfd_hash_entry *
@@ -655,6 +703,7 @@
       eh = (struct elf_i386_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -686,6 +735,7 @@
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->tls_ldm_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
   ret->sym_sec.abfd = NULL;
   ret->is_vxworks = 0;
   ret->srelplt2 = NULL;
@@ -845,6 +895,8 @@
   switch (r_type)
     {
     case R_386_TLS_GD:
+    case R_386_TLS_GOTDESC:
+    case R_386_TLS_DESC_CALL:
     case R_386_TLS_IE_32:
       if (is_local)
  return R_386_TLS_LE_32;
@@ -949,6 +1001,8 @@
 
  case R_386_GOT32:
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
   /* This symbol requires a global offset table entry.  */
   {
     int tls_type, old_tls_type;
@@ -958,6 +1012,9 @@
       default:
       case R_386_GOT32: tls_type = GOT_NORMAL; break;
       case R_386_TLS_GD: tls_type = GOT_TLS_GD; break;
+      case R_386_TLS_GOTDESC:
+      case R_386_TLS_DESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       case R_386_TLS_IE_32:
  if (ELF32_R_TYPE (rel->r_info) == r_type)
   tls_type = GOT_TLS_IE_NEG;
@@ -987,13 +1044,16 @@
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= (sizeof (bfd_signed_vma) + sizeof(char));
+    size *= (sizeof (bfd_signed_vma)
+     + sizeof (bfd_vma) + sizeof(char));
     local_got_refcounts = bfd_zalloc (abfd, size);
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf_i386_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf_i386_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type = elf_i386_local_got_tls_type (abfd) [r_symndx];
@@ -1004,11 +1064,14 @@
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     else if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
-     && (old_tls_type != GOT_TLS_GD
+     && (! GOT_TLS_GD_ANY_P (old_tls_type)
  || (tls_type & GOT_TLS_IE) == 0))
       {
- if ((old_tls_type & GOT_TLS_IE) && tls_type == GOT_TLS_GD)
+ if ((old_tls_type & GOT_TLS_IE) && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1316,6 +1379,8 @@
   break;
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_IE:
  case R_386_TLS_GOTIE:
@@ -1579,6 +1644,7 @@
 
   /* We also need to make an entry in the .rel.plt section.  */
   htab->srelplt->size += sizeof (Elf32_External_Rel);
+  htab->srelplt->reloc_count++;
 
   if (htab->is_vxworks && !info->shared)
     {
@@ -1612,6 +1678,9 @@
       h->needs_plt = 0;
     }
 
+  eh = (struct elf_i386_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+
   /* If R_386_TLS_{IE_32,IE,GOTIE} symbol is now local to the binary,
      make it a R_386_TLS_LE_32 requiring no TLS entry.  */
   if (h->got.refcount > 0
@@ -1635,11 +1704,22 @@
  }
 
       s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += 4;
-      /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD || tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  h->got.offset = s->size;
+  s->size += 4;
+  /* R_386_TLS_GD needs 2 consecutive GOT slots.  */
+  if (GOT_TLS_GD_P (tls_type) || tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_386_TLS_IE_32 needs one dynamic relocation,
  R_386_TLS_IE resp. R_386_TLS_GOTIE needs one dynamic relocation,
@@ -1648,21 +1728,23 @@
  global.  */
       if (tls_type == GOT_TLS_IE_BOTH)
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      else if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
        || (tls_type & GOT_TLS_IE))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf32_External_Rel);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf32_External_Rel);
+      if (GOT_TLS_GDESC_P (tls_type))
+ htab->srelplt->size += sizeof (Elf32_External_Rel);
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf_i386_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1810,6 +1892,7 @@
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1852,25 +1935,42 @@
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf_i386_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf_i386_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += 4;
-      if (*local_tls_type == GOT_TLS_GD
-  || *local_tls_type == GOT_TLS_IE_BOTH)
- s->size += 4;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf_i386_compute_jump_table_size (htab);
+  htab->sgotplt->size += 8;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += 4;
+  if (GOT_TLS_GD_P (*local_tls_type)
+      || *local_tls_type == GOT_TLS_IE_BOTH)
+    s->size += 4;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || (*local_tls_type & GOT_TLS_IE))
  {
   if (*local_tls_type == GOT_TLS_IE_BOTH)
     srel->size += 2 * sizeof (Elf32_External_Rel);
-  else
+  else if (GOT_TLS_GD_P (*local_tls_type)
+   || ! GOT_TLS_GDESC_P (*local_tls_type))
     srel->size += sizeof (Elf32_External_Rel);
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    htab->srelplt->size += sizeof (Elf32_External_Rel);
  }
     }
   else
@@ -1914,6 +2014,14 @@
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size = htab->srelplt->reloc_count * 4;
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1945,7 +2053,8 @@
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -2032,6 +2141,41 @@
   return TRUE;
 }
 
+static bfd_boolean
+elf_i386_always_size_sections (bfd *output_bfd,
+       struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Set the correct type for an x86 ELF section.  We do this by the
    section name, which is a hack, but ought to work.  */
 
@@ -2109,6 +2253,7 @@
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -2116,6 +2261,7 @@
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf_i386_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -2127,7 +2273,7 @@
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2549,6 +2695,8 @@
   /* Fall through */
 
  case R_386_TLS_GD:
+ case R_386_TLS_GOTDESC:
+ case R_386_TLS_DESC_CALL:
  case R_386_TLS_IE_32:
  case R_386_TLS_GOTIE:
   r_type = elf_i386_tls_transition (info, r_type, h == NULL);
@@ -2563,7 +2711,9 @@
     }
   if (tls_type == GOT_TLS_IE)
     tls_type = GOT_TLS_IE_NEG;
-  if (r_type == R_386_TLS_GD)
+  if (r_type == R_386_TLS_GD
+      || r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
     {
       if (tls_type == GOT_TLS_IE_POS)
  r_type = R_386_TLS_GOTIE;
@@ -2637,6 +2787,63 @@
   rel++;
   continue;
  }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leal x@tlsdesc(%ebx), %eax
+
+     leal x@ntpoff, %eax
+
+     Registers other than %eax may be set up here.  */
+
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a leal adding ebx to a
+     32-bit offset into any register, although it's
+     probably almost always going to be eax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 2);
+  type = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x83);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  /* aoliva FIXME: remove the above and xor the byte
+     below with 0x86.  */
+  bfd_put_8 (output_bfd, val ^ 0x86,
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, -tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%eax)
+     Turn it into:
+     nop; nop  */
+
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%eax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x90, contents + roff);
+  bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+  continue;
+ }
       else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_IE)
  {
   unsigned int val, type;
@@ -2751,13 +2958,17 @@
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf_i386_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2767,35 +2978,77 @@
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_DESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt + 8
+      <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf32_External_Rel);
+  BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+      <= sreloc->contents + sreloc->size);
+  bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
+  if (indx == 0)
+    {
+      BFD_ASSERT (! unresolved_reloc);
+      bfd_put_32 (output_bfd,
+  relocation - dtpoff_base (info),
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+  else
+    {
+      bfd_put_32 (output_bfd, 0,
+  htab->sgotplt->contents + offplt
+  + htab->sgotplt_jump_table_size + 4);
+    }
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_386_TLS_DTPMOD32;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else if (tls_type == GOT_TLS_IE_POS)
  dr_type = R_386_TLS_TPOFF;
       else
  dr_type = R_386_TLS_TPOFF32;
+
       if (dr_type == R_386_TLS_TPOFF && indx == 0)
  bfd_put_32 (output_bfd, relocation - dtpoff_base (info),
     htab->sgot->contents + off);
       else if (dr_type == R_386_TLS_TPOFF32 && indx == 0)
  bfd_put_32 (output_bfd, dtpoff_base (info) - relocation,
     htab->sgot->contents + off);
-      else
+      else if (dr_type != R_386_TLS_DESC)
  bfd_put_32 (output_bfd, 0,
     htab->sgot->contents + off);
       outrel.r_info = ELF32_R_INFO (indx, dr_type);
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf32_External_Rel);
+
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_386_TLS_GD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2811,8 +3064,10 @@
       outrel.r_info = ELF32_R_INFO (indx,
     R_386_TLS_DTPOFF32);
       outrel.r_offset += 4;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf32_External_Rel);
+      BFD_ASSERT (loc + sizeof (Elf32_External_Rel)
+  <= sreloc->contents + sreloc->size);
       bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
     }
  }
@@ -2823,25 +3078,33 @@
       htab->sgot->contents + off + 4);
   outrel.r_info = ELF32_R_INFO (indx, R_386_TLS_TPOFF);
   outrel.r_offset += 4;
-  htab->srelgot->reloc_count++;
+  sreloc->reloc_count++;
   loc += sizeof (Elf32_External_Rel);
   bfd_elf32_swap_reloc_out (output_bfd, &outrel, loc);
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
-  if (r_type == ELF32_R_TYPE (rel->r_info))
+  if (r_type == R_386_TLS_GOTDESC
+      || r_type == R_386_TLS_DESC_CALL)
+    {
+      relocation = htab->sgotplt_jump_table_size + offplt;
+      unresolved_reloc = FALSE;
+    }
+  else if (r_type == ELF32_R_TYPE (rel->r_info))
     {
       bfd_vma g_o_t = htab->sgotplt->output_section->vma
       + htab->sgotplt->output_offset;
       relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off - g_o_t;
+ + htab->sgot->output_offset + off - g_o_t;
       if ((r_type == R_386_TLS_IE || r_type == R_386_TLS_GOTIE)
   && tls_type == GOT_TLS_IE_BOTH)
  relocation += 4;
@@ -2849,7 +3112,7 @@
  relocation += g_o_t;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GD)
     {
       unsigned int val, type;
       bfd_vma roff;
@@ -2913,6 +3176,94 @@
       rel++;
       continue;
     }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_GOTDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leal x@tlsdesc(%ebx), %eax
+
+ Change it to:
+ movl x@gotntpoff(%ebx), %eax # before nop; nop
+ or:
+ movl x@gottpoff(%ebx), %eax # before negl %eax
+
+ Registers other than %eax may be set up here.  */
+
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a leal adding ebx to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be eax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 2);
+      type = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x83);
+      BFD_ASSERT (roff + 4 <= input_section->size);
+
+      /* Now modify the instruction as appropriate.  */
+      /* To turn a leal into a movl in the form we use it, it
+ suffices to change the first byte from 0x8d to 0x8b.
+ aoliva FIXME: should we decide to keep the leal, all
+ we have to do is remove the statement below, and
+ adjust the relaxation of R_386_TLS_DESC_CALL.  */
+      bfd_put_8 (output_bfd, 0x8b, contents + roff - 2);
+
+      if (tls_type == GOT_TLS_IE_BOTH)
+ off += 4;
+
+      bfd_put_32 (output_bfd,
+  htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off
+  - htab->sgotplt->output_section->vma
+  - htab->sgotplt->output_offset,
+  contents + roff);
+      continue;
+    }
+  else if (ELF32_R_TYPE (rel->r_info) == R_386_TLS_DESC_CALL)
+    {
+      /* GDesc -> IE transition.
+ It's originally:
+ call *(%eax)
+
+ Change it to:
+ nop; nop
+ or
+ negl %eax
+ depending on how we transformed the TLS_GOTDESC above.
+      */
+
+      unsigned int val, type;
+      bfd_vma roff;
+
+      /* First, make sure it's a call *(%eax).  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff + 2 <= input_section->size);
+      type = bfd_get_8 (input_bfd, contents + roff);
+      BFD_ASSERT (type == 0xff);
+      val = bfd_get_8 (input_bfd, contents + roff + 1);
+      BFD_ASSERT (val == 0x10);
+
+      /* Now modify the instruction as appropriate.  */
+      if (tls_type != GOT_TLS_IE_NEG)
+ {
+  /* nop; nop */
+  bfd_put_8 (output_bfd, 0x90, contents + roff);
+  bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+ }
+      else
+ {
+  /* negl %eax */
+  bfd_put_8 (output_bfd, 0xf7, contents + roff);
+  bfd_put_8 (output_bfd, 0xd8, contents + roff + 1);
+ }
+
+      continue;
+    }
+  else
+    BFD_ASSERT (FALSE);
   break;
 
  case R_386_TLS_LDM:
@@ -3220,7 +3571,7 @@
     }
 
   if (h->got.offset != (bfd_vma) -1
-      && elf_i386_hash_entry(h)->tls_type != GOT_TLS_GD
+      && ! GOT_TLS_GD_ANY_P (elf_i386_hash_entry(h)->tls_type)
       && (elf_i386_hash_entry(h)->tls_type & GOT_TLS_IE) == 0)
     {
       Elf_Internal_Rela rel;
@@ -3555,6 +3906,7 @@
 #define elf_backend_reloc_type_class      elf_i386_reloc_type_class
 #define elf_backend_relocate_section      elf_i386_relocate_section
 #define elf_backend_size_dynamic_sections     elf_i386_size_dynamic_sections
+#define elf_backend_always_size_sections      elf_i386_always_size_sections
 #define elf_backend_plt_sym_val      elf_i386_plt_sym_val
 
 #include "elf32-target.h"
Index: bfd/elf64-x86-64.c
===================================================================
--- bfd/elf64-x86-64.c.orig 2006-01-13 18:13:26.000000000 -0500
+++ bfd/elf64-x86-64.c 2006-01-13 18:14:55.000000000 -0500
@@ -112,12 +112,31 @@
   HOWTO(R_X86_64_GOTPC32, 0, 2, 32, TRUE, 0, complain_overflow_signed,
  bfd_elf_generic_reloc, "R_X86_64_GOTPC32",
  FALSE, 0xffffffff, 0xffffffff, TRUE),
+  EMPTY_HOWTO (27),
+  EMPTY_HOWTO (28),
+  EMPTY_HOWTO (29),
+  EMPTY_HOWTO (30),
+  EMPTY_HOWTO (31),
+  EMPTY_HOWTO (32),
+  EMPTY_HOWTO (33),
+  HOWTO(R_X86_64_GOTPC32_TLSDESC, 0, 2, 32, TRUE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_GOTPC32_TLSDESC",
+ FALSE, 0xffffffff, 0xffffffff, TRUE),
+  HOWTO(R_X86_64_TLSDESC_CALL, 0, 0, 0, FALSE, 0,
+ complain_overflow_dont, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC_CALL",
+ FALSE, 0, 0, FALSE),
+  HOWTO(R_X86_64_TLSDESC, 0, 4, 64, FALSE, 0,
+ complain_overflow_bitfield, bfd_elf_generic_reloc,
+ "R_X86_64_TLSDESC",
+ FALSE, MINUS_ONE, MINUS_ONE, FALSE),
 
   /* We have a gap in the reloc numbers here.
      R_X86_64_standard counts the number up to this point, and
      R_X86_64_vt_offset is the value to subtract from a reloc type of
      R_X86_64_GNU_VT* to form an index into this table.  */
-#define R_X86_64_standard (R_X86_64_GOTPC32 + 1)
+#define R_X86_64_standard (R_X86_64_TLSDESC + 1)
 #define R_X86_64_vt_offset (R_X86_64_GNU_VTINHERIT - R_X86_64_standard)
 
 /* GNU extension to record C++ vtable hierarchy.  */
@@ -166,14 +185,38 @@
   { BFD_RELOC_64_PCREL, R_X86_64_PC64, },
   { BFD_RELOC_X86_64_GOTOFF64, R_X86_64_GOTOFF64, },
   { BFD_RELOC_X86_64_GOTPC32, R_X86_64_GOTPC32, },
+  { BFD_RELOC_X86_64_GOTPC32_TLSDESC, R_X86_64_GOTPC32_TLSDESC, },
+  { BFD_RELOC_X86_64_TLSDESC_CALL, R_X86_64_TLSDESC_CALL, },
+  { BFD_RELOC_X86_64_TLSDESC, R_X86_64_TLSDESC, },
   { BFD_RELOC_VTABLE_INHERIT, R_X86_64_GNU_VTINHERIT, },
   { BFD_RELOC_VTABLE_ENTRY, R_X86_64_GNU_VTENTRY, },
 };
 
+static reloc_howto_type *
+elf64_x86_64_rtype_to_howto (bfd *abfd, unsigned r_type)
+{
+  unsigned i;
+
+  if (r_type < (unsigned int) R_X86_64_GNU_VTINHERIT
+      || r_type >= (unsigned int) R_X86_64_max)
+    {
+      if (r_type >= (unsigned int) R_X86_64_standard)
+ {
+  (*_bfd_error_handler) (_("%B: invalid relocation type %d"),
+ abfd, (int) r_type);
+  r_type = R_X86_64_NONE;
+ }
+      i = r_type;
+    }
+  else
+    i = r_type - (unsigned int) R_X86_64_vt_offset;
+  BFD_ASSERT (x86_64_elf_howto_table[i].type == r_type);
+  return &x86_64_elf_howto_table[i];
+}
 
 /* Given a BFD reloc type, return a HOWTO structure.  */
 static reloc_howto_type *
-elf64_x86_64_reloc_type_lookup (bfd *abfd ATTRIBUTE_UNUSED,
+elf64_x86_64_reloc_type_lookup (bfd *abfd,
  bfd_reloc_code_real_type code)
 {
   unsigned int i;
@@ -182,7 +225,8 @@
        i++)
     {
       if (x86_64_reloc_map[i].bfd_reloc_val == code)
- return &x86_64_elf_howto_table[i];
+ return elf64_x86_64_rtype_to_howto (abfd,
+    x86_64_reloc_map[i].elf_reloc_val);
     }
   return 0;
 }
@@ -193,23 +237,10 @@
 elf64_x86_64_info_to_howto (bfd *abfd ATTRIBUTE_UNUSED, arelent *cache_ptr,
     Elf_Internal_Rela *dst)
 {
-  unsigned r_type, i;
+  unsigned r_type;
 
   r_type = ELF64_R_TYPE (dst->r_info);
-  if (r_type < (unsigned int) R_X86_64_GNU_VTINHERIT
-      || r_type >= (unsigned int) R_X86_64_max)
-    {
-      if (r_type >= (unsigned int) R_X86_64_standard)
- {
-  (*_bfd_error_handler) (_("%B: invalid relocation type %d"),
- abfd, (int) r_type);
-  r_type = R_X86_64_NONE;
- }
-      i = r_type;
-    }
-  else
-    i = r_type - (unsigned int) R_X86_64_vt_offset;
-  cache_ptr->howto = &x86_64_elf_howto_table[i];
+  cache_ptr->howto = elf64_x86_64_rtype_to_howto (abfd, r_type);
   BFD_ASSERT (r_type == cache_ptr->howto->type);
 }
 
@@ -353,7 +384,20 @@
 #define GOT_NORMAL 1
 #define GOT_TLS_GD 2
 #define GOT_TLS_IE 3
+#define GOT_TLS_GDESC 4
+#define GOT_TLS_GD_BOTH_P(type) \
+  ((type) == (GOT_TLS_GD | GOT_TLS_GDESC))
+#define GOT_TLS_GD_P(type) \
+  ((type) == GOT_TLS_GD || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GDESC_P(type) \
+  ((type) == GOT_TLS_GDESC || GOT_TLS_GD_BOTH_P (type))
+#define GOT_TLS_GD_ANY_P(type) \
+  (GOT_TLS_GD_P (type) || GOT_TLS_GDESC_P (type))
   unsigned char tls_type;
+
+  /* Offset of the GOTPLT entry reserved for the TLS descriptor,
+     starting at the end of the jump table.  */
+  bfd_vma tlsdesc_got;
 };
 
 #define elf64_x86_64_hash_entry(ent) \
@@ -365,6 +409,9 @@
 
   /* tls_type for each local got entry.  */
   char *local_got_tls_type;
+
+  /* GOTPLT entries for TLS descriptors.  */
+  bfd_vma *local_tlsdesc_gotent;
 };
 
 #define elf64_x86_64_tdata(abfd) \
@@ -373,6 +420,8 @@
 #define elf64_x86_64_local_got_tls_type(abfd) \
   (elf64_x86_64_tdata (abfd)->local_got_tls_type)
 
+#define elf64_x86_64_local_tlsdesc_gotent(abfd) \
+  (elf64_x86_64_tdata (abfd)->local_tlsdesc_gotent)
 
 /* x86-64 ELF linker hash table.  */
 
@@ -389,11 +438,23 @@
   asection *sdynbss;
   asection *srelbss;
 
+  /* The offset into splt of the PLT entry for the TLS descriptor
+     resolver.  Special values are 0, if not necessary (or not found
+     to be necessary yet), and -1 if needed but not determined
+     yet.  */
+  bfd_vma tlsdesc_plt;
+  /* The offset into sgot of the GOT entry used by the PLT entry
+     above.  */
+  bfd_vma tlsdesc_got;
+
   union {
     bfd_signed_vma refcount;
     bfd_vma offset;
   } tls_ld_got;
 
+  /* The amount of space used by the jump slots in the GOT.  */
+  bfd_vma sgotplt_jump_table_size;
+
   /* Small local sym to section mapping cache.  */
   struct sym_sec_cache sym_sec;
 };
@@ -403,6 +464,9 @@
 #define elf64_x86_64_hash_table(p) \
   ((struct elf64_x86_64_link_hash_table *) ((p)->hash))
 
+#define elf64_x86_64_compute_jump_table_size(htab) \
+  ((htab)->srelplt->reloc_count * GOT_ENTRY_SIZE)
+
 /* Create an entry in an x86-64 ELF linker hash table. */
 
 static struct bfd_hash_entry *
@@ -428,6 +492,7 @@
       eh = (struct elf64_x86_64_link_hash_entry *) entry;
       eh->dyn_relocs = NULL;
       eh->tls_type = GOT_UNKNOWN;
+      eh->tlsdesc_got = (bfd_vma) -1;
     }
 
   return entry;
@@ -459,7 +524,10 @@
   ret->sdynbss = NULL;
   ret->srelbss = NULL;
   ret->sym_sec.abfd = NULL;
+  ret->tlsdesc_plt = 0;
+  ret->tlsdesc_got = 0;
   ret->tls_ld_got.refcount = 0;
+  ret->sgotplt_jump_table_size = 0;
 
   return &ret->elf.root;
 }
@@ -616,6 +684,8 @@
   switch (r_type)
     {
     case R_X86_64_TLSGD:
+    case R_X86_64_GOTPC32_TLSDESC:
+    case R_X86_64_TLSDESC_CALL:
     case R_X86_64_GOTTPOFF:
       if (is_local)
  return R_X86_64_TPOFF32;
@@ -706,6 +776,8 @@
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
   /* This symbol requires a global offset table entry. */
   {
     int tls_type, old_tls_type;
@@ -715,6 +787,9 @@
       default: tls_type = GOT_NORMAL; break;
       case R_X86_64_TLSGD: tls_type = GOT_TLS_GD; break;
       case R_X86_64_GOTTPOFF: tls_type = GOT_TLS_IE; break;
+      case R_X86_64_GOTPC32_TLSDESC:
+      case R_X86_64_TLSDESC_CALL:
+ tls_type = GOT_TLS_GDESC; break;
       }
 
     if (h != NULL)
@@ -733,14 +808,17 @@
     bfd_size_type size;
 
     size = symtab_hdr->sh_info;
-    size *= sizeof (bfd_signed_vma) + sizeof (char);
+    size *= sizeof (bfd_signed_vma)
+      + sizeof (bfd_vma) + sizeof (char);
     local_got_refcounts = ((bfd_signed_vma *)
    bfd_zalloc (abfd, size));
     if (local_got_refcounts == NULL)
       return FALSE;
     elf_local_got_refcounts (abfd) = local_got_refcounts;
+    elf64_x86_64_local_tlsdesc_gotent (abfd)
+      = (bfd_vma *) (local_got_refcounts + symtab_hdr->sh_info);
     elf64_x86_64_local_got_tls_type (abfd)
-      = (char *) (local_got_refcounts + symtab_hdr->sh_info);
+      = (char *) (local_got_refcounts + 2 * symtab_hdr->sh_info);
   }
  local_got_refcounts[r_symndx] += 1;
  old_tls_type
@@ -750,10 +828,14 @@
     /* If a TLS symbol is accessed using IE at least once,
        there is no point to use dynamic model for it.  */
     if (old_tls_type != tls_type && old_tls_type != GOT_UNKNOWN
- && (old_tls_type != GOT_TLS_GD || tls_type != GOT_TLS_IE))
+ && (! GOT_TLS_GD_ANY_P (old_tls_type)
+    || tls_type != GOT_TLS_IE))
       {
- if (old_tls_type == GOT_TLS_IE && tls_type == GOT_TLS_GD)
+ if (old_tls_type == GOT_TLS_IE && GOT_TLS_GD_ANY_P (tls_type))
   tls_type = old_tls_type;
+ else if (GOT_TLS_GD_ANY_P (old_tls_type)
+ && GOT_TLS_GD_ANY_P (tls_type))
+  tls_type |= old_tls_type;
  else
   {
     (*_bfd_error_handler)
@@ -1101,6 +1183,8 @@
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
  case R_X86_64_GOT32:
  case R_X86_64_GOTPCREL:
@@ -1368,6 +1452,7 @@
 
   /* We also need to make an entry in the .rela.plt section.  */
   htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->srelplt->reloc_count++;
  }
       else
  {
@@ -1381,6 +1466,9 @@
       h->needs_plt = 0;
     }
 
+  eh = (struct elf64_x86_64_link_hash_entry *) h;
+  eh->tlsdesc_got = (bfd_vma) -1;
+
   /* If R_X86_64_GOTTPOFF symbol is now local to the binary,
      make it a R_X86_64_TPOFF32 requiring no GOT entry.  */
   if (h->got.refcount > 0
@@ -1403,31 +1491,46 @@
     return FALSE;
  }
 
-      s = htab->sgot;
-      h->got.offset = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      /* R_X86_64_TLSGD needs 2 consecutive GOT slots.  */
-      if (tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  eh->tlsdesc_got = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  h->got.offset = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (tls_type)
+  || GOT_TLS_GD_P (tls_type))
+ {
+  s = htab->sgot;
+  h->got.offset = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       dyn = htab->elf.dynamic_sections_created;
       /* R_X86_64_TLSGD needs one dynamic relocation if local symbol
  and two if global.
  R_X86_64_GOTTPOFF needs one dynamic relocation.  */
-      if ((tls_type == GOT_TLS_GD && h->dynindx == -1)
+      if ((GOT_TLS_GD_P (tls_type) && h->dynindx == -1)
   || tls_type == GOT_TLS_IE)
  htab->srelgot->size += sizeof (Elf64_External_Rela);
-      else if (tls_type == GOT_TLS_GD)
+      else if (GOT_TLS_GD_P (tls_type))
  htab->srelgot->size += 2 * sizeof (Elf64_External_Rela);
-      else if ((ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
- || h->root.type != bfd_link_hash_undefweak)
+      else if (! GOT_TLS_GDESC_P (tls_type)
+       && (ELF_ST_VISIBILITY (h->other) == STV_DEFAULT
+   || h->root.type != bfd_link_hash_undefweak)
        && (info->shared
    || WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, 0, h)))
  htab->srelgot->size += sizeof (Elf64_External_Rela);
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  htab->srelplt->size += sizeof (Elf64_External_Rela);
+  htab->tlsdesc_plt = (bfd_vma) -1;
+ }
     }
   else
     h->got.offset = (bfd_vma) -1;
 
-  eh = (struct elf64_x86_64_link_hash_entry *) h;
   if (eh->dyn_relocs == NULL)
     return TRUE;
 
@@ -1575,6 +1678,7 @@
       bfd_signed_vma *local_got;
       bfd_signed_vma *end_local_got;
       char *local_tls_type;
+      bfd_vma *local_tlsdesc_gotent;
       bfd_size_type locsymcount;
       Elf_Internal_Shdr *symtab_hdr;
       asection *srel;
@@ -1618,20 +1722,43 @@
       locsymcount = symtab_hdr->sh_info;
       end_local_got = local_got + locsymcount;
       local_tls_type = elf64_x86_64_local_got_tls_type (ibfd);
+      local_tlsdesc_gotent = elf64_x86_64_local_tlsdesc_gotent (ibfd);
       s = htab->sgot;
       srel = htab->srelgot;
-      for (; local_got < end_local_got; ++local_got, ++local_tls_type)
+      for (; local_got < end_local_got;
+   ++local_got, ++local_tls_type, ++local_tlsdesc_gotent)
  {
+  *local_tlsdesc_gotent = (bfd_vma) -1;
   if (*local_got > 0)
     {
-      *local_got = s->size;
-      s->size += GOT_ENTRY_SIZE;
-      if (*local_tls_type == GOT_TLS_GD)
- s->size += GOT_ENTRY_SIZE;
+      if (GOT_TLS_GDESC_P (*local_tls_type))
+ {
+  *local_tlsdesc_gotent = htab->sgotplt->size
+    - elf64_x86_64_compute_jump_table_size (htab);
+  htab->sgotplt->size += 2 * GOT_ENTRY_SIZE;
+  *local_got = (bfd_vma) -2;
+ }
+      if (! GOT_TLS_GDESC_P (*local_tls_type)
+  || GOT_TLS_GD_P (*local_tls_type))
+ {
+  *local_got = s->size;
+  s->size += GOT_ENTRY_SIZE;
+  if (GOT_TLS_GD_P (*local_tls_type))
+    s->size += GOT_ENTRY_SIZE;
+ }
       if (info->shared
-  || *local_tls_type == GOT_TLS_GD
+  || GOT_TLS_GD_ANY_P (*local_tls_type)
   || *local_tls_type == GOT_TLS_IE)
- srel->size += sizeof (Elf64_External_Rela);
+ {
+  if (GOT_TLS_GDESC_P (*local_tls_type))
+    {
+      htab->srelplt->size += sizeof (Elf64_External_Rela);
+      htab->tlsdesc_plt = (bfd_vma) -1;
+    }
+  if (! GOT_TLS_GDESC_P (*local_tls_type)
+      || GOT_TLS_GD_P (*local_tls_type))
+    srel->size += sizeof (Elf64_External_Rela);
+ }
     }
   else
     *local_got = (bfd_vma) -1;
@@ -1653,6 +1780,34 @@
      sym dynamic relocs.  */
   elf_link_hash_traverse (&htab->elf, allocate_dynrelocs, (PTR) info);
 
+  /* For every jump slot reserved in the sgotplt, reloc_count is
+     incremented.  However, when we reserve space for TLS descriptors,
+     it's not incremented, so in order to compute the space reserved
+     for them, it suffices to multiply the reloc count by the jump
+     slot size.  */
+  if (htab->srelplt)
+    htab->sgotplt_jump_table_size
+      = elf64_x86_64_compute_jump_table_size (htab);
+
+  if (htab->tlsdesc_plt)
+    {
+      /* If we're not using lazy TLS relocations, don't generate the
+ PLT and GOT entries they require.  */
+      if ((info->flags & DF_BIND_NOW))
+ htab->tlsdesc_plt = 0;
+      else
+ {
+  htab->tlsdesc_got = htab->sgot->size;
+  htab->sgot->size += GOT_ENTRY_SIZE;
+  /* Reserve room for the initial entry.
+     FIXME: we could probably do away with it in this case.  */
+  if (htab->splt->size == 0)
+    htab->splt->size += PLT_ENTRY_SIZE;
+  htab->tlsdesc_plt = htab->splt->size;
+  htab->splt->size += PLT_ENTRY_SIZE;
+ }
+    }
+
   /* We now have determined the sizes of the various dynamic sections.
      Allocate memory for them.  */
   relocs = FALSE;
@@ -1676,7 +1831,8 @@
 
   /* We use the reloc_count field as a counter if we need
      to copy relocs into the output file.  */
-  s->reloc_count = 0;
+  if (s != htab->srelplt)
+    s->reloc_count = 0;
  }
       else
  {
@@ -1736,6 +1892,11 @@
       || !add_dynamic_entry (DT_PLTREL, DT_RELA)
       || !add_dynamic_entry (DT_JMPREL, 0))
     return FALSE;
+
+  if (htab->tlsdesc_plt
+      && (!add_dynamic_entry (DT_TLSDESC_PLT, 0)
+  || !add_dynamic_entry (DT_TLSDESC_GOT, 0)))
+    return FALSE;
  }
 
       if (relocs)
@@ -1763,6 +1924,41 @@
   return TRUE;
 }
 
+static bfd_boolean
+elf64_x86_64_always_size_sections (bfd *output_bfd,
+   struct bfd_link_info *info)
+{
+  asection *tls_sec = elf_hash_table (info)->tls_sec;
+
+  if (tls_sec)
+    {
+      struct elf_link_hash_entry *tlsbase;
+
+      tlsbase = elf_link_hash_lookup (elf_hash_table (info),
+      "_TLS_MODULE_BASE_",
+      FALSE, FALSE, FALSE);
+
+      if (tlsbase && tlsbase->type == STT_TLS)
+ {
+  struct bfd_link_hash_entry *bh = NULL;
+  const struct elf_backend_data *bed
+    = get_elf_backend_data (output_bfd);
+
+  if (!(_bfd_generic_link_add_one_symbol
+ (info, output_bfd, "_TLS_MODULE_BASE_", BSF_LOCAL,
+ tls_sec, 0, NULL, FALSE,
+ bed->collect, &bh)))
+    return FALSE;
+  tlsbase = (struct elf_link_hash_entry *)bh;
+  tlsbase->def_regular = 1;
+  tlsbase->other = STV_HIDDEN;
+  (*bed->elf_backend_hide_symbol) (info, tlsbase, TRUE);
+ }
+    }
+
+  return TRUE;
+}
+
 /* Return the base VMA address which should be subtracted from real addresses
    when resolving @dtpoff relocation.
    This is PT_TLS segment p_vaddr.  */
@@ -1821,6 +2017,7 @@
   Elf_Internal_Shdr *symtab_hdr;
   struct elf_link_hash_entry **sym_hashes;
   bfd_vma *local_got_offsets;
+  bfd_vma *local_tlsdesc_gotents;
   Elf_Internal_Rela *rel;
   Elf_Internal_Rela *relend;
 
@@ -1831,6 +2028,7 @@
   symtab_hdr = &elf_tdata (input_bfd)->symtab_hdr;
   sym_hashes = elf_sym_hashes (input_bfd);
   local_got_offsets = elf_local_got_offsets (input_bfd);
+  local_tlsdesc_gotents = elf64_x86_64_local_tlsdesc_gotent (input_bfd);
 
   rel = relocs;
   relend = relocs + input_section->reloc_count;
@@ -1842,7 +2040,7 @@
       struct elf_link_hash_entry *h;
       Elf_Internal_Sym *sym;
       asection *sec;
-      bfd_vma off;
+      bfd_vma off, offplt;
       bfd_vma relocation;
       bfd_boolean unresolved_reloc;
       bfd_reloc_status_type r;
@@ -2201,6 +2399,8 @@
   break;
 
  case R_X86_64_TLSGD:
+ case R_X86_64_GOTPC32_TLSDESC:
+ case R_X86_64_TLSDESC_CALL:
  case R_X86_64_GOTTPOFF:
   r_type = elf64_x86_64_tls_transition (info, r_type, h == NULL);
   tls_type = GOT_UNKNOWN;
@@ -2212,7 +2412,9 @@
       if (!info->shared && h->dynindx == -1 && tls_type == GOT_TLS_IE)
  r_type = R_X86_64_TPOFF32;
     }
-  if (r_type == R_X86_64_TLSGD)
+  if (r_type == R_X86_64_TLSGD
+      || r_type == R_X86_64_GOTPC32_TLSDESC
+      || r_type == R_X86_64_TLSDESC_CALL)
     {
       if (tls_type == GOT_TLS_IE)
  r_type = R_X86_64_GOTTPOFF;
@@ -2254,6 +2456,67 @@
   rel++;
   continue;
  }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+ {
+  /* GDesc -> LE transition.
+     It's originally something like:
+     leaq x@tlsdesc(%rip), %rax
+
+     Change it to:
+     movl $x@tpoff, %rax
+
+     Registers other than %rax may be set up here.  */
+
+  unsigned int val, type, type2;
+  bfd_vma roff;
+
+  /* First, make sure it's a leaq adding rip to a
+     32-bit offset into any register, although it's
+     probably almost always going to be rax.  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff >= 3);
+  type = bfd_get_8 (input_bfd, contents + roff - 3);
+  BFD_ASSERT ((type & 0xfb) == 0x48);
+  type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+  BFD_ASSERT (type2 == 0x8d);
+  val = bfd_get_8 (input_bfd, contents + roff - 1);
+  BFD_ASSERT ((val & 0xc7) == 0x05);
+  BFD_ASSERT (roff + 4 <= input_section->size);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x48 | ((type >> 2) & 1),
+     contents + roff - 3);
+  bfd_put_8 (output_bfd, 0xc7, contents + roff - 2);
+  bfd_put_8 (output_bfd, 0xc0 | ((val >> 3) & 7),
+     contents + roff - 1);
+  bfd_put_32 (output_bfd, tpoff (info, relocation),
+      contents + roff);
+  continue;
+ }
+      else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
+ {
+  /* GDesc -> LE transition.
+     It's originally:
+     call *(%rax)
+     Turn it into:
+     nop; nop.  */
+
+  unsigned int val, type;
+  bfd_vma roff;
+
+  /* First, make sure it's a call *(%rax).  */
+  roff = rel->r_offset;
+  BFD_ASSERT (roff + 2 <= input_section->size);
+  type = bfd_get_8 (input_bfd, contents + roff);
+  BFD_ASSERT (type == 0xff);
+  val = bfd_get_8 (input_bfd, contents + roff + 1);
+  BFD_ASSERT (val == 0x10);
+
+  /* Now modify the instruction as appropriate.  */
+  bfd_put_8 (output_bfd, 0x90, contents + roff);
+  bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+  continue;
+ }
       else
  {
   unsigned int val, type, reg;
@@ -2319,13 +2582,17 @@
     abort ();
 
   if (h != NULL)
-    off = h->got.offset;
+    {
+      off = h->got.offset;
+      offplt = elf64_x86_64_hash_entry (h)->tlsdesc_got;
+    }
   else
     {
       if (local_got_offsets == NULL)
  abort ();
 
       off = local_got_offsets[r_symndx];
+      offplt = local_tlsdesc_gotents[r_symndx];
     }
 
   if ((off & 1) != 0)
@@ -2335,30 +2602,61 @@
       Elf_Internal_Rela outrel;
       bfd_byte *loc;
       int dr_type, indx;
+      asection *sreloc;
 
       if (htab->srelgot == NULL)
  abort ();
 
+      indx = h && h->dynindx != -1 ? h->dynindx : 0;
+
+      if (GOT_TLS_GDESC_P (tls_type))
+ {
+  outrel.r_info = ELF64_R_INFO (indx, R_X86_64_TLSDESC);
+  BFD_ASSERT (htab->sgotplt_jump_table_size + offplt
+      + 2 * GOT_ENTRY_SIZE <= htab->sgotplt->size);
+  outrel.r_offset = (htab->sgotplt->output_section->vma
+     + htab->sgotplt->output_offset
+     + offplt
+     + htab->sgotplt_jump_table_size);
+  sreloc = htab->srelplt;
+  loc = sreloc->contents;
+  loc += sreloc->reloc_count++
+    * sizeof (Elf64_External_Rela);
+  BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+      <= sreloc->contents + sreloc->size);
+  if (indx == 0)
+    outrel.r_addend = relocation - dtpoff_base (info);
+  else
+    outrel.r_addend = 0;
+  bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
+ }
+
+      sreloc = htab->srelgot;
+
       outrel.r_offset = (htab->sgot->output_section->vma
  + htab->sgot->output_offset + off);
 
-      indx = h && h->dynindx != -1 ? h->dynindx : 0;
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  dr_type = R_X86_64_DTPMOD64;
+      else if (GOT_TLS_GDESC_P (tls_type))
+ goto dr_done;
       else
  dr_type = R_X86_64_TPOFF64;
 
       bfd_put_64 (output_bfd, 0, htab->sgot->contents + off);
       outrel.r_addend = 0;
-      if (dr_type == R_X86_64_TPOFF64 && indx == 0)
+      if ((dr_type == R_X86_64_TPOFF64
+   || dr_type == R_X86_64_TLSDESC) && indx == 0)
  outrel.r_addend = relocation - dtpoff_base (info);
       outrel.r_info = ELF64_R_INFO (indx, dr_type);
 
-      loc = htab->srelgot->contents;
-      loc += htab->srelgot->reloc_count++ * sizeof (Elf64_External_Rela);
+      loc = sreloc->contents;
+      loc += sreloc->reloc_count++ * sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
 
-      if (r_type == R_X86_64_TLSGD)
+      if (GOT_TLS_GD_P (tls_type))
  {
   if (indx == 0)
     {
@@ -2374,27 +2672,37 @@
       outrel.r_info = ELF64_R_INFO (indx,
     R_X86_64_DTPOFF64);
       outrel.r_offset += GOT_ENTRY_SIZE;
-      htab->srelgot->reloc_count++;
+      sreloc->reloc_count++;
       loc += sizeof (Elf64_External_Rela);
+      BFD_ASSERT (loc + sizeof (Elf64_External_Rela)
+  <= sreloc->contents + sreloc->size);
       bfd_elf64_swap_reloca_out (output_bfd, &outrel, loc);
     }
  }
 
+    dr_done:
       if (h != NULL)
  h->got.offset |= 1;
       else
  local_got_offsets[r_symndx] |= 1;
     }
 
-  if (off >= (bfd_vma) -2)
+  if (off >= (bfd_vma) -2
+      && ! GOT_TLS_GDESC_P (tls_type))
     abort ();
   if (r_type == ELF64_R_TYPE (rel->r_info))
     {
-      relocation = htab->sgot->output_section->vma
-   + htab->sgot->output_offset + off;
+      if (r_type == R_X86_64_GOTPC32_TLSDESC
+  || r_type == R_X86_64_TLSDESC_CALL)
+ relocation = htab->sgotplt->output_section->vma
+  + htab->sgotplt->output_offset
+  + offplt + htab->sgotplt_jump_table_size;
+      else
+ relocation = htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off;
       unresolved_reloc = FALSE;
     }
-  else
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSGD)
     {
       unsigned int i;
       static unsigned char tlsgd[8]
@@ -2434,6 +2742,77 @@
       rel++;
       continue;
     }
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_GOTPC32_TLSDESC)
+    {
+      /* GDesc -> IE transition.
+ It's originally something like:
+ leaq x@tlsdesc(%rip), %rax
+
+ Change it to:
+ movq x@gottpoff(%rip), %rax # before nop; nop
+
+ Registers other than %rax may be set up here.  */
+
+      unsigned int val, type, type2;
+      bfd_vma roff;
+
+      /* First, make sure it's a leaq adding rip to a 32-bit
+ offset into any register, although it's probably
+ almost always going to be rax.  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff >= 3);
+      type = bfd_get_8 (input_bfd, contents + roff - 3);
+      BFD_ASSERT ((type & 0xfb) == 0x48);
+      type2 = bfd_get_8 (input_bfd, contents + roff - 2);
+      BFD_ASSERT (type2 == 0x8d);
+      val = bfd_get_8 (input_bfd, contents + roff - 1);
+      BFD_ASSERT ((val & 0xc7) == 0x05);
+      BFD_ASSERT (roff + 4 <= input_section->size);
+
+      /* Now modify the instruction as appropriate.  */
+      /* To turn a leaq into a movq in the form we use it, it
+ suffices to change the second byte from 0x8d to
+ 0x8b.  */
+      bfd_put_8 (output_bfd, 0x8b, contents + roff - 2);
+
+      bfd_put_32 (output_bfd,
+  htab->sgot->output_section->vma
+  + htab->sgot->output_offset + off
+  - rel->r_offset
+  - input_section->output_section->vma
+  - input_section->output_offset
+  - 4,
+  contents + roff);
+      continue;
+    }
+  else if (ELF64_R_TYPE (rel->r_info) == R_X86_64_TLSDESC_CALL)
+    {
+      /* GDesc -> IE transition.
+ It's originally:
+ call *(%rax)
+
+ Change it to:
+ nop; nop.  */
+
+      unsigned int val, type;
+      bfd_vma roff;
+
+  /* First, make sure it's a call *(%rax).  */
+      roff = rel->r_offset;
+      BFD_ASSERT (roff + 2 <= input_section->size);
+      type = bfd_get_8 (input_bfd, contents + roff);
+      BFD_ASSERT (type == 0xff);
+      val = bfd_get_8 (input_bfd, contents + roff + 1);
+      BFD_ASSERT (val == 0x10);
+
+      /* Now modify the instruction as appropriate.  */
+      bfd_put_8 (output_bfd, 0x90, contents + roff);
+      bfd_put_8 (output_bfd, 0x90, contents + roff + 1);
+
+      continue;
+    }
+  else
+    BFD_ASSERT (FALSE);
   break;
 
  case R_X86_64_TLSLD:
@@ -2672,7 +3051,7 @@
     }
 
   if (h->got.offset != (bfd_vma) -1
-      && elf64_x86_64_hash_entry (h)->tls_type != GOT_TLS_GD
+      && ! GOT_TLS_GD_ANY_P (elf64_x86_64_hash_entry (h)->tls_type)
       && elf64_x86_64_hash_entry (h)->tls_type != GOT_TLS_IE)
     {
       Elf_Internal_Rela rela;
@@ -2827,6 +3206,18 @@
   dyn.d_un.d_val -= s->size;
  }
       break;
+
+    case DT_TLSDESC_PLT:
+      s = htab->splt;
+      dyn.d_un.d_ptr = s->output_section->vma + s->output_offset
+ + htab->tlsdesc_plt;
+      break;
+
+    case DT_TLSDESC_GOT:
+      s = htab->sgot;
+      dyn.d_un.d_ptr = s->output_section->vma + s->output_offset
+ + htab->tlsdesc_got;
+      break;
     }
 
   bfd_elf64_swap_dyn_out (output_bfd, &dyn, dyncon);
@@ -2861,6 +3252,40 @@
 
   elf_section_data (htab->splt->output_section)->this_hdr.sh_entsize =
     PLT_ENTRY_SIZE;
+
+  if (htab->tlsdesc_plt)
+    {
+      bfd_put_64 (output_bfd, (bfd_vma) 0,
+  htab->sgot->contents + htab->tlsdesc_got);
+
+      memcpy (htab->splt->contents + htab->tlsdesc_plt,
+      elf64_x86_64_plt0_entry,
+      PLT_ENTRY_SIZE);
+
+      /* Add offset for pushq GOT+8(%rip), since the
+ instruction uses 6 bytes, subtract this value.  */
+      bfd_put_32 (output_bfd,
+  (htab->sgotplt->output_section->vma
+   + htab->sgotplt->output_offset
+   + 8
+   - htab->splt->output_section->vma
+   - htab->splt->output_offset
+   - htab->tlsdesc_plt
+   - 6),
+  htab->splt->contents + htab->tlsdesc_plt + 2);
+      /* Add offset for jmp *GOT+TDG(%rip), where TDG stands for
+ htab->tlsdesc_got. The 12 is the offset to the end of
+ the instruction.  */
+      bfd_put_32 (output_bfd,
+  (htab->sgot->output_section->vma
+   + htab->sgot->output_offset
+   + htab->tlsdesc_got
+   - htab->splt->output_sec

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alan Modra
On Sat, Jan 14, 2006 at 12:57:47PM -0500, Alexandre Oliva wrote:
> One more update.  This time I've modified a little bit the code
> generated to pad relaxations, to get the best performance according to
> my benchmarking (not a lot of difference, but still), and adjusted the
> testsuite to match.
>
> Ok to install?

Yes, looks OK to me.

--
Alan Modra
IBM OzLabs - Linux Technology Centre

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
In reply to this post by Menezes, Evandro
Hi, Evandro,

Sorry that it took so long for me to get back to you after the GCC
Summit.  I've been quite busy and couldn't focus on this issue for a
while.

Here's an updated patch that should address all of your concerns.  The
proposed ABI hasn't changed at all for almost a year, and in the
meantime we've ported it to one more platform (ARM), so I believe it
is rock solid now.

Let me know what you think about the proposed changes.  They document
what's implemented in GNU binutils, GCC and the pending patches I have
for glibc, which I'm retesting after updating them to a current tree.

Thanks,


for ChangeLog
from  Alexandre Oliva  <[hidden email]>

        * object-files.tex (Relocation Types): Add
        R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL and
        R_X86_64_TLSDESC.  Add pointer to description.  Add short
        description of all TLS relocations.  Fix typo in DTPMOD64.
        * dl.tex (Procedure Linkage Table): Mention lazy relocation of TLS
        descriptors.  Add short description.

Index: dl.tex
===================================================================
--- dl.tex.orig 2006-10-08 16:53:13.000000000 -0300
+++ dl.tex 2006-10-08 17:39:44.000000000 -0300
@@ -265,6 +265,22 @@ evaluates procedure linkage table entrie
 resolution and relocation until the first execution of a table entry.
 \index{procedure linkage table|)}
 
+Relocation entries of type \codeindex{R_X86_64_TLSDESC} may also be
+subject to lazy relocation, using a single entry in the procedure
+linkage table and in the global offset table, at locations given by
+\texttt{DT_TLSDESC_PLT} and \texttt{DT_TLSDESC_GOT}, respectively, as
+described in ``Thread-Local Storage Descriptors for IA32 and
+AMD64/EM64T''\footnote{This document is currently available via
+  \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
+
+For self-containment, \texttt{DT_TLSDESC_GOT} specifies a GOT entry in
+which the dynamic loader should store the address of its internal TLS
+Descriptor resolver function, whereas \texttt{DT_TLSDESC_PLT}
+specifies the address of a PLT entry to be used as the TLS descriptor
+resolver function for lazy resolution from within this module.  The
+PLT entry must push the linkmap of the module onto the stack and
+tail-call the internal TLS Descriptor resolver function.
+
 \subsubsection{Large Models}
 
 In the small and medium code models the size of both the PLT and the GOT
Index: object-files.tex
===================================================================
--- object-files.tex.orig 2006-10-08 16:53:13.000000000 -0300
+++ object-files.tex 2006-10-08 17:46:49.000000000 -0300
@@ -435,7 +435,7 @@ the relocation addend.
       \texttt{R_X86_64_PC16}  & 13 & \textit{word16} & \texttt{S + A - P} \\
       \texttt{R_X86_64_8}     & 14 & \textit{word8} & \texttt{S + A} \\
       \texttt{R_X86_64_PC8}   & 15 & \textit{word8} & \texttt{S + A - P} \\
-      \texttt{R_X86_64_DPTMOD64}   & 16 & \textit{word64} &  \\
+      \texttt{R_X86_64_DTPMOD64}   & 16 & \textit{word64} &  \\
       \texttt{R_X86_64_DTPOFF64}   & 17 & \textit{word64} &  \\
       \texttt{R_X86_64_TPOFF64}   & 18 & \textit{word64} &  \\
       \texttt{R_X86_64_TLSGD}   & 19 & \textit{word32} &  \\
@@ -448,6 +448,9 @@ the relocation addend.
       \texttt{R_X86_64_GOTPC32} & 26 & \textit{word32} & \texttt{GOT + A - P} \\
       \texttt{R_X86_64_SIZE32} & 32 & \textit{word32} & \texttt{Z + A} \\
       \texttt{R_X86_64_SIZE64} & 33 & \textit{word64} & \texttt{Z + A} \\
+      \texttt{R_X86_64_GOTPC32_TLSDESC} & 34 & \textit{word32} &  \\
+      \texttt{R_X86_64_TLSDESC_CALL} & 35 & none &  \\
+      \texttt{R_X86_64_TLSDESC} & 36 & \textit{word64}$\times 2$ & \\
 %      \texttt{R_X86_64_GOT64} & 16 & \textit{word64} & \texttt{G + A} \\
 %      \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
     \end{tabular}
@@ -469,6 +472,7 @@ to those used for the \intelabi.  \footn
   loading the offset into a displacement register; the base plus
   immediate displacement addressing form can be used.}
 
+\begin{sloppypar}
 The \texttt{R_X86_64_GOTPCREL} relocation has different semantics from the
 \texttt{R_X86_64_GOT32} or equivalent i386 \texttt{R_I386_GOTPC} relocation.
 In particular, because the \xARCH architecture has an addressing mode relative
@@ -477,6 +481,7 @@ using a single instruction.  The calcula
 \texttt{R_X86_64_GOTPCREL} relocation gives the difference between the location
 in the GOT where the symbol's address is given and the location where the
 relocation is applied.
+\end{sloppypar}
 
 \begin{sloppypar}
 The \texttt{R_X86_64_32} and \texttt{R_X86_64_32S} relocations truncate
@@ -492,19 +497,72 @@ relocations is not conformant to this AB
 added for documentation purposes.  The \texttt{R_X86_64_16}, and
 \texttt{R_X86_64_8} relocations truncate the computed value to 16-bits
 resp. 8-bits.
+\end{sloppypar}
 
-The relocations \texttt{R_X86_64_DPTMOD64},
-\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64} ,
-\texttt{R_X86_64_TLSGD} , \texttt{R_X86_64_TLSLD} ,
+\begin{sloppypar}
+The relocations \texttt{R_X86_64_DTPMOD64},
+\texttt{R_X86_64_DTPOFF64}, \texttt{R_X86_64_TPOFF64},
+\texttt{R_X86_64_TLSGD}, \texttt{R_X86_64_TLSLD},
 \texttt{R_X86_64_DTPOFF32}, \texttt{R_X86_64_GOTTPOFF} and
 \texttt{R_X86_64_TPOFF32} are listed for completeness.  They are part
 of the Thread-Local Storage ABI extensions and are documented in the
 document called ``ELF Handling for Thread-Local
 Storage''\footnote{This document is currently available via
-  \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local Storage}.
+  \url{http://people.redhat.com/drepper/tls.pdf}}\index{Thread-Local
+  Storage}.  The relocations \texttt{R_X86_64_GOTPC32_TLSDESC},
+\texttt{R_X86_64_TLSDESC_CALL} and \texttt{R_X86_64_TLSDESC} are also
+used for Thread-Local Storage, but are not documented there as of this
+writing.  A description can be found in the document ``Thread-Local
+Storage Descriptors for IA32 and AMD64/EM64T''\footnote{This document
+  is currently available via
+  \url{http://people.redhat.com/aoliva/writeups/TLS/RFC-TLSDESC-x86.txt}}.
+\end{sloppypar}
+
+In order to make this document self-contained, a description of the
+TLS relocations follows.
 
+\begin{sloppypar}
+\texttt{R_X86_64_DTPMOD64} resolves to the index of the dynamic thread
+vector entry that points to the base address of the TLS block
+corresponding to the module that defines the referenced symbol.
+\texttt{R_X86_64_DTPOFF64} and \texttt{R_X86_64_DTPOFF32} compute the
+offset from the pointer in that entry to the referenced symbol.  The
+linker generates such relocations in adjacent entries in the GOT, in
+response to \texttt{R_X86_64_TLSGD} and \texttt{R_X86_64_TLSLD}
+relocations.  If the linker can compute the offset itself, because the
+referenced symbol binds locally, the \texttt{DTPOFF} may be omitted.
+Otherwise, such relocations are always in pairs, such that the
+\texttt{DTPOFF64} relocation applies to the word64 right past the
+corresponding \texttt{DTPMOD} relocation.
 \end{sloppypar}
 
+\texttt{R_X86_64_TPOFF64} and \texttt{R_X86_64_TPOFF32} resolve to the
+offset from the thread pointer to a thread-local variable.  The former
+is generated in response to \texttt{R_X86_64_GOTTPOFF}, which resolves
+to the PC-relative address of a GOT entry containing such a 64-bit
+offset.
+
+\texttt{R_X86_64_TLSGD} and \texttt{R_X86_64_TLSLD} both resolve to
+PC-relative offsets to a \texttt{DTPMOD} GOT entry.  The difference
+between them is that, for \texttt{TLSGD}, the following GOT entry will
+contain the offset of the referenced symbol into its TLS block,
+whereas, for \texttt{TLSLD}, the following GOT entry will contain the
+offset of the base address of the TLS block.  The idea is that adding
+this offset to the result of \texttt{DTPOFF32} for a symbol ought to
+yield the same as the result of \texttt{DTPOFF64} for the same symbol.
+
+\texttt{R_X86_64_TLSDESC} resolves to a pair of word64s, called a TLS
+Descriptor, the first of which is a pointer to a function, followed by
+an argument.  The function is passed a pointer to this pair of
+entries in \%rax and, using the argument in the second entry, it must
+compute and return in \%rax the offset from the thread pointer to the
+symbol referenced in the relocation, without modifying any registers
+other than processor flags.  \texttt{R_X86_64_GOTPC32_TLSDESC}
+resolves to the PC-relative address of a TLS descriptor corresponding
+to the named symbol.  \texttt{R_X86_64_TLSDESC_CALL} must annotate the
+instruction used to call the TLS Descriptor resolver function, so as
+to enable relaxation of that instruction.
+
 \subsection{Large Models}
 
 In order to extend both the PLT and the GOT beyond 2GB, it



On Sep 19, 2005, "Menezes, Evandro" <[hidden email]> wrote:

> Alexandre,
>> Please read the document referenced in the patch, for
>> starters.  In it you'll see the exact spelling of the coding
>> samples is not final yet, and it doesn't make sense to
>> maintain yet another copy of this until it settles down.  

> When it does, it'll be added to the ABI then.  Not before.  For now, it's OK to reserve the relocation numbers in this mailing list.  

>> Also, you'll find that the calculations are not quite
>> possible to express in the way other relocations are
>> expressed; suggestions are welcome.  

> State so, perhaps in a note, expanding what they mean.

>> Finally, what's wrong
>> with following the existing practice of referring to TLS
>> specs elsewhere?

> The intent is that the x86-64 ABI remains a stand-alone document as much as possible.  It's not quite there yet, but adding yet another external reference sets it back even further.

> BTW, the TLS reference is slated to be incorporated into the x86-64 ABI.

>> The point of this posting was more to reserve the relocation
>> numbers for these purposes (the purpose of the relocations is
>> quite solid already, even though the numbers have changed as
>> recently as yesterday), but I'm yet to do some more
>> performance tests with some minor variations of the code
>> sequences to choose the best one.  I don't want to have to
>> maintain all this information in sync between multiple specs
>> documents and the several different packages that implement
>> them; having a single specs document is much better for now.

> That's fine.  When it reaches a mature state, patches against the ABI will be more than welcome.

--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America        http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Michael Matz
Hi Alexandre,

On Sun, 8 Oct 2006, Alexandre Oliva wrote:

> Here's an updated patch the should address all of your concerns.  The
> proposed ABI changes haven't changed at all for almost a year, and in
> the mean time we've ported it to one more platform (ARM), so I believe
> this is rock solid now.

Thanks for the update.  From my perspective we should include it in the
ABI document.  I'll wait a day or two in case there are concerns.


Ciao,
Michael.

RE: RFC: TLS improvements for IA32 and AMD64/EM64T

Menezes, Evandro
In reply to this post by Alexandre Oliva-2
Hi, Alexandre.

> Here's an updated patch the should address all of your concerns.  The
> proposed ABI changes haven't changed at all for almost a year, and in
> the mean time we've ported it to one more platform (ARM), so I believe
> this is rock solid now.

It looks good and the patch is pretty informative.  However, there are some statements that may not be as clear as they could be, so I was wondering whether the attached changes to your patch seem reasonable.

Would you consider adding the calculations for the new relocations in order to improve their clarity?  The original paper on TLS goes on about them a bit, but it wouldn't be a bad idea to make the psABI document more stand-alone.

I remember some examples in your paper at the GCC Summit and adding them to section 3.5 would be swell too.

> Let me know what you think about the proposed changes.  They document
> what's implemented in GNU binutils, GCC and the pending patches I have
> for glibc, that I'm retesting after updating them to a current tree.

From your paper at the GCC Summit it's quite clear that such additions to the psABI would be a fine idea.  Perhaps HJ would like to consider the corresponding additions for the i386 psABI extension.

So, there's no question about the technical part of your proposal.  But, as you can infer from my comments above, I'd like to improve the clarity of the psABI so that one wouldn't have to go to specific implementations to figure out the details.  What do you think?

Thank you,

--
_______________________________________________________
Evandro Menezes               AMD            Austin, TX


Attachment: tlsdesc.patch.patch (2K)

Re: RFC: TLS improvements for IA32 and AMD64/EM64T

Alexandre Oliva-2
On Oct  9, 2006, "Menezes, Evandro" <[hidden email]> wrote:

> Would you consider adding the calculations for the new relocations
> in order to improve their clarity?

I can try, although relaxations make it much trickier than it might
seem.

> I remember some examples in your paper at the GCC Summit and adding
> them to section 3.5 would be swell too.

So we're talking *really* self-contained, eh?  Fair enough, I'll take
a shot.

> From your paper at the GCC Summit it's quite clear that such
> additions to the psABI would be a fine idea.  Perhaps HJ would like
> to consider the corresponding additions for the i386 psABI
> extension.

H.J., do you have the i386 psABI in source form somewhere I could get
it, to make the corresponding changes?

> So, there's no question about the technical part of your proposal.
> But, as you can infer from my comments above, I'd like to improve
> the clarity of the psABI so that one wouldn't have to go to specific
> implementations to figure out the details.  What do you think?

Sounds like a reasonable goal.

> -+referenced symbol binds locally, the \texttt{DTPOFF} may be omitted.
> ++referenced symbol binds locally, the relocations \texttt{R_X86_64_64} and \texttt{R_X86_64_32} may be used instead.

No, in such cases the linker omits the relocation entirely, and fills
the corresponding slot with the value it can compute itself.

>  +Otherwise, such relocations are always in pairs, such that the
> -+\texttt{DTPOFF64} relocation applies to the word64 right past the
> -+corresponding \texttt{DTPMOD} relocation.
> ++\texttt{R_X86_64_DTPOFF64} relocation applies to the word64 right past the
> ++corresponding \texttt{R_X86_64_DTPMOD64} relocation.

Ok, I've added R_X86_64_ everywhere I'd omitted it.

Please expect an updated patch soon.

If you'd rather install a patch with these minor modifications and
keep the more detailed patch separate, let me know and I'll send you
what I have right away.

--
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America        http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva@{redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva@{lsd.ic.unicamp.br, gnu.org}

RE: RFC: TLS improvements for IA32 and AMD64/EM64T

Menezes, Evandro
Oi, Alexandre.

> > Would you consider adding the calculations for the new relocations
> > in order to improve their clarity?
>
> I can try, although relaxations make it much trickier than it might
> seem.

Where possible.  If it gets too hairy, the text should spill the beans instead.
 
> > I remember some examples in your paper at the GCC Summit and adding
> > them to section 3.5 would be swell too.
>
> So we're talking *really* self-contained, eh?  Fair enough, I'll take
> a shot.

I appreciate it.  It sure makes this psABI much more solid.  As is, it already refers to too many external ABIs, such as SY, i386 and C++, so I think that it exhausted its credit for external references. :-)
 
> > From your paper at the GCC Summit it's quite clear that such
> > additions to the psABI would be a fine idea.  Perhaps HJ would like
> > to consider the corresponding additions for the i386 psABI
> > extension.
>
> H.J., do you have the i386 psABI in source form somewhere I could get
> it, to make the corresponding changes?

Actually, it's about an extension to the i386 psABI and it's an idea still in its infancy: http://sourceware.org/ml/binutils/2006-09/msg00342.html.

> > So, there's no question about the technical part of your proposal.
> > But, as you can infer from my comments above, I'd like to improve
> > the clarity of the psABI so that one wouldn't have to go to specific
> > implementations to figure out the details.  What do you think?
>
> Sounds like a reasonable goal.
>
> > -+referenced symbol binds locally, the \texttt{DTPOFF} may
> be omitted.
> > ++referenced symbol binds locally, the relocations
> \texttt{R_X86_64_64} and \texttt{R_X86_64_32} may be used instead.
>
> No, in such cases the linker omits the relocation entirely, and fills
> the corresponding stop with the value it can compute itself.

Then how would you phrase it?

> If you'd rather install a patch with these minor modifications and
> keep the more detailed patch separate, let me know and I'll send you
> what I have right away.

That sounds like a fine idea.  As I haven't heard comments to the contrary, there seems to be an unspoken agreement that it should be added.  Feel free to send the patch with the minor changes, and if we don't hear anything against it by Oct 12 (GMT), I'll apply it.

Thank you,

--
_______________________________________________________
Evandro Menezes               AMD            Austin, TX




A public discussion group for IA32 psABI

H.J. Lu-27
On Tue, Oct 10, 2006 at 11:21:41AM -0500, Menezes, Evandro wrote:

> > H.J., do you have the i386 psABI in source form somewhere I could get
> > it, to make the corresponding changes?
>
> Actually, it's about an extension to the i386 psABI and it's an idea still in its infancy: http://sourceware.org/ml/binutils/2006-09/msg00342.html.
>

Some people said that FSG might not be the best place to start the
public IA32 psABI discussion group.  I created one at

http://groups-beta.google.com/group/ia32-abi

We can reconsider if it should be moved to FSG later.

Alexandre, could you please upload your IA32 psABI extension proposal
to it?

Thanks.


H.J.