Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".


Sourceware - binutils list mailing list
+H J Lu

On Thu, Jan 4, 2018 at 9:21 AM, Sriraman Tallam <[hidden email]> wrote:

>
> TLDR; Introduce the "retpoline" mitigation technique for variant #2 of the
> speculative execution vulnerabilities on Intel (and perhaps other) CPUs,
> specifically identified by CVE-2017-5715 and in some places called
> “spectre”. Retpoline PLTs can be enabled by using the linker flag
> “-z,retpolineplt” on x86-64 only.  Patch attached for the gold linker.
>
> This "retpoline" mitigation is fully described in the following blog post:
> https://support.google.com/faqs/answer/7625886
>
>
> Description of retpoline PLT
> -------------------------------------
>
> A standard PLT entry looks like this:
>
>
> 4005d0:  jmpq   *0x1a12(%rip)        # 401fe8 <_GLOBAL_OFFSET_TABLE_+0x18>
> 4005d6:  pushq  $0x0
> 4005db:  jmpq   4005c0 <_init+0x20>
>
>
> It is 16 byte aligned and 16 bytes in size and has three instructions.
>
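The size and alignment stated above can be checked against the addresses in the listing. The following is an editorial sketch in C++, not part of the patch: the instruction lengths (6, 5 and 5 bytes) are read off the address deltas in the listing, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Instruction lengths of the standard PLT entry, read off the listing:
// jmpq *disp32(%rip) = 6 bytes, pushq imm32 = 5 bytes, jmpq rel32 = 5 bytes.
constexpr std::uint64_t kJmpIndirectLen = 6;
constexpr std::uint64_t kPushLen = 5;
constexpr std::uint64_t kJmpDirectLen = 5;

constexpr std::uint64_t standard_plt_entry_size() {
  return kJmpIndirectLen + kPushLen + kJmpDirectLen;
}

// A RIP-relative operand is addressed from the end of its instruction,
// so the displacement is added to (instruction address + length).
constexpr std::uint64_t rip_relative_address(std::uint64_t insn_addr,
                                             std::uint64_t insn_len,
                                             std::int32_t disp) {
  return insn_addr + insn_len +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}
```

With the values from the listing, `rip_relative_address(0x4005d0, 6, 0x1a12)` gives 0x401fe8, which matches the GOT slot that objdump annotates on the first instruction.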
>
> A retpoline PLT entry looks like this:
>
>
> 4005e0:  mov    0x1a01(%rip),%r11        # 401fe8 <_GLOBAL_OFFSET_TABLE_+0x18>
> 4005e7:  callq  4005f0 <_Z13ethethopolinev@plt>
> 4005ec:  pause
> 4005ee:  jmp    4005ec <__gmon_start__@plt+0xc>
> 00000000004005f0 <_Z13ethethopolinev@plt>:
> 4005f0:  mov    %r11,(%rsp)
> 4005f4:  retq
> 4005f5:  pushq  $0x0
> 4005fa:  jmpq   4005c0 <_init+0x20>
>
> It is 32 byte aligned and 32 bytes in size. The retpoline PLT entry retains
> the last two instructions from the standard PLT entry to support lazy
> binding.  However, the first indirect jump instruction is replaced by a 6
> instruction code sequence which moves the target address of the jump to
> register r11 and calls a function that returns to the target address by
> manipulating the stack.
>
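The stack manipulation described above can be modelled in a few lines. This is a deliberately simplified editorial sketch, not code from the patch: it assumes a return predictor that simply remembers the address pushed by the most recent call, and the struct and function names are hypothetical. It needs C++14 or later because the `constexpr` function has several statements.

```cpp
#include <cstdint>

// Where control goes when the thunk's `retq` executes.
struct RetOutcome {
  std::uint64_t architectural;  // address actually popped from the stack
  std::uint64_t predicted;      // address the modelled return predictor guesses
};

// Step-by-step model of the retpoline PLT sequence from the listing.
constexpr RetOutcome simulate_retpoline(std::uint64_t got_value,
                                        std::uint64_t pause_loop_addr) {
  std::uint64_t r11 = got_value;                // mov got_slot(%rip), %r11
  std::uint64_t stack_top = pause_loop_addr;    // callq pushes its return address
  std::uint64_t predictor = pause_loop_addr;    // assumption: predictor mirrors the call
  stack_top = r11;                              // mov %r11, (%rsp)
  return RetOutcome{stack_top, predictor};      // retq pops stack_top
}
```

In this model the `retq` architecturally lands on the value loaded from the GOT, while any speculation that follows the predictor ends up in the `pause` / `jmp` loop at 4005ec instead of at an attacker-chosen target.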
>
> Future optimizations around this work will include -z,now support for a
> 16-byte entry.
>
>
> What you should know
> --------------------------------
>
> * Techniques such as PGO and LTO dramatically reduce the impact of hot
> indirect calls (by speculatively promoting them to direct calls). If you
> need to deploy these techniques in C++ applications, we *strongly* recommend
> that you ensure all hot call targets are statically linked (avoiding PLT
> indirection) and use both PGO and LTO. Well tuned servers using all of these
> techniques saw 5% - 10% overhead from the use of the full retpoline
> mitigation (including compiler support).
>
> * Binutils tools like readelf and objdump will not disassemble the PLT
> section accurately as they assume that the PLT entry size is 16 bytes.  A
> patch to fix this is in progress.
>
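The disassembly problem mentioned in the second bullet comes down to the stride used to walk the PLT. The sketch below is an editorial illustration with hypothetical names; it assumes the first retpoline entry starts at 0x4005e0, as in the listing earlier in this message.

```cpp
#include <cstdint>

struct PltLocation {
  std::uint64_t index;   // which PLT entry the address falls in
  std::uint64_t offset;  // byte offset inside that entry
};

// Map an address to (entry index, offset) for an assumed entry size.
constexpr PltLocation locate_in_plt(std::uint64_t addr,
                                    std::uint64_t first_entry,
                                    std::uint64_t entry_size) {
  return PltLocation{(addr - first_entry) / entry_size,
                     (addr - first_entry) % entry_size};
}
```

For the address 0x4005f0, a 32-byte stride places it in entry 0 at offset 16 (the `mov %r11,(%rsp)` in the middle of the entry), while a 16-byte stride places it at the start of entry 1. A tool that assumes 16-byte entries would therefore treat it as a new PLT entry, which may explain the extra `@plt` label at 4005f0 in the listing.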
>
> Testing:
> ---------
>
> Checked that all gold tests pass when retpoline PLT is switched on.  Also
> built and ran a huge search benchmark with retpoline PLT enabled.
>
>
> ChangeLog:
> ----------------
>
> * options.h (retpolineplt): New -z option to use retpoline PLT.
> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
> the option is used.
> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
> * testsuite/Makefile.in: Regenerate.
> * testsuite/retpoline_plt_1.sh: New test script.
> * testsuite/retpoline_plt_1.s: New test source.
>
> Many thanks to Chandler, Reid, Eric, Rui and Brooks!
>
>
> Patch attached.
>
> Thanks
> Sri
>

retpoline_plt_patch.txt (17K) Download Attachment

ccoutant
> * options.h (retpolineplt): New -z option to use retpoline PLT.
> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
> the option is used.
> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
> * testsuite/Makefile.in: Regenerate.
> * testsuite/retpoline_plt_1.sh: New test script.
> * testsuite/retpoline_plt_1.s: New test source.

This makes the -z bndplt and -z retpolineplt options mutually
exclusive. Please add a check in options.cc
(General_options::finalize) for this.

Will we be seeing an aarch64 patch along these same lines soon? As I
understand it, 64-bit ARM is susceptible to Spectre, but 32-bit ARM is
not (because 32-bit chips don't do any OOO execution). I haven't seen
a clear statement about other architectures like Sparc and PPC.

-cary

Sourceware - binutils list mailing list
Hi Cary,

  Thanks for reviewing.

On Thu, Jan 4, 2018 at 3:08 PM, Cary Coutant <[hidden email]> wrote:

>> * options.h (retpolineplt): New -z option to use retpoline PLT.
>> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
>> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
>> the option is used.
>> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
>> * testsuite/Makefile.in: Regenerate.
>> * testsuite/retpoline_plt_1.sh: New test script.
>> * testsuite/retpoline_plt_1.s: New test source.
>
> This makes the -z bndplt and -z retpolineplt options mutually
> exclusive. Please add a check in options.cc
> (General_options::finalize) for this.
Done and new patch attached.


* options.h (retpolineplt): New -z option to use retpoline PLT.
* options.cc (General_options::finalize): Check if both retpolineplt
and bndplt are turned on.
* x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
(Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
the option is used.
* testsuite/Makefile.am (retpoline_plt_1.sh): New test.
* testsuite/Makefile.in: Regenerate.
* testsuite/retpoline_plt_1.sh: New test script.
* testsuite/retpoline_plt_1.s: New test source.

>
> Will we be seeing an aarch64 patch along these same lines soon? As I
> understand it, 64-bit ARM is susceptible to Spectre, but 32-bit ARM is
> not (because 32-bit chips don't do any OOO execution). I haven't seen
> a clear statement about other architectures like Sparc and PPC.

retpoline is currently an x86-specific mitigation. Other architectural
mitigations may be forthcoming, but I'm not actively working on them.

Thanks
Sri

>
> -cary

retpoline_plt_patch.txt (18K) Download Attachment

ccoutant
>>> * options.h (retpolineplt): New -z option to use retpoline PLT.
>>> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
>>> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
>>> the option is used.
>>> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
>>> * testsuite/Makefile.in: Regenerate.
>>> * testsuite/retpoline_plt_1.sh: New test script.
>>> * testsuite/retpoline_plt_1.s: New test source.
>>
>> This makes the -z bndplt and -z retpolineplt options mutually
>> exclusive. Please add a check in options.cc
>> (General_options::finalize) for this.
>
> Done and new patch attached.

+  if (this->bndplt() && this->retpolineplt())
+    {
+      gold_fatal(_("-z,bndplt and -z,retpolineplt are mutually exclusive"));
+    }

Replace the commas with spaces -- the options are only spelled with
commas when using the compiler -Wl,... syntax.

OK with that change.

-cary
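For reference, the check with the requested wording can be sketched as a standalone function. This is an editorial illustration only: it does not use gold's `General_options` class or `gold_fatal`, and the function name is hypothetical.

```cpp
// Returns a diagnostic if the two PLT flavours are requested together,
// or nullptr if the combination is acceptable. The message spells the
// options with spaces, as the linker itself accepts them.
constexpr const char* plt_option_conflict(bool bndplt, bool retpolineplt) {
  return (bndplt && retpolineplt)
             ? "-z bndplt and -z retpolineplt are mutually exclusive"
             : nullptr;
}
```

The comma spelling only appears when the option is passed through the compiler driver, for example `-Wl,-z,retpolineplt`; on the linker command line it is `-z retpolineplt`.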

Florian Weimer-5
In reply to this post by ccoutant
On 01/05/2018 12:08 AM, Cary Coutant wrote:

>> * options.h (retpolineplt): New -z option to use retpoline PLT.
>> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
>> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
>> the option is used.
>> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
>> * testsuite/Makefile.in: Regenerate.
>> * testsuite/retpoline_plt_1.sh: New test script.
>> * testsuite/retpoline_plt_1.s: New test source.
>
> This makes the -z bndplt and -z retpolineplt options mutually
> exclusive. Please add a check in options.cc
> (General_options::finalize) for this.

It's also incompatible with shadow stack support, so the binary marker
for that needs to be removed.

I don't think this is the right approach at all.  What is this trying to
accomplish?  What kind of speculation barrier does this implement on
current CPUs?  Isn't this *extremely* costly?

If we think this is a problem that needs to be fixed, we should remove
the indirect call altogether, and have the dynamic linker generate a
direct call at load time.  There are a few constraints associated with
that (4 GiB total application + DSO size, some SELinux users will be
unhappy, lack of lazy binding support), but at least it can be turned on
in practice.

Thanks,
Florian

Sourceware - binutils list mailing list
On Fri, Jan 5, 2018 at 3:42 AM, Florian Weimer <[hidden email]> wrote:

> On 01/05/2018 12:08 AM, Cary Coutant wrote:
>>>
>>> * options.h (retpolineplt): New -z option to use retpoline PLT.
>>> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
>>> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
>>> the option is used.
>>> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
>>> * testsuite/Makefile.in: Regenerate.
>>> * testsuite/retpoline_plt_1.sh: New test script.
>>> * testsuite/retpoline_plt_1.s: New test source.
>>
>>
>> This makes the -z bndplt and -z retpolineplt options mutually
>> exclusive. Please add a check in options.cc
>> (General_options::finalize) for this.
>
>
> It's also incompatible with shadow stack support, so the binary marker for
> that needs to be removed.
>
> I don't think this is the right approach at all.  What is this trying to
> accomplish?  What kind of speculation barrier does this implement on current
> CPUs?  Isn't this *extremely* costly?

As I understand it (and I may not) this option would be appropriate to
use with dynamically linked programs that hold high value private
information, run on the same machine as untrusted code, and have some
communication path with untrusted code.  If the untrusted code can
cause the trusted program to make indirect branches, it can extract
information from the trusted program's address space.  No specific
control is needed over where the untrusted branch goes; the attack
works by having the untrusted program seed the branch target buffer,
thus causing the processor to speculatively execute instructions that
would never normally be executed.  The results of that speculative
execution, though discarded, will affect the memory cache, and this
can be used to read the address space of the trusted program.  The
attack only works if there is some way to cause the trusted program to
execute an indirect branch; for many programs the PLT provides such a
mechanism, as many communication paths with the trusted program will
cause the trusted program to make some sort of libc call.
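The seeding step described above can be illustrated with a toy model. This is an editorial sketch and not a description of any real CPU: it assumes a branch target buffer indexed by the low bits of the branch address with no tag check, and all names and addresses are hypothetical. It needs C++14 or later.

```cpp
#include <cstddef>
#include <cstdint>

// Toy branch target buffer: indexed by the low bits of the branch address,
// with no tag check, so different branches can alias to the same slot.
struct ToyBtb {
  static constexpr std::size_t kSlots = 16;
  std::uint64_t target[kSlots] = {};

  static constexpr std::size_t slot(std::uint64_t branch_addr) {
    return static_cast<std::size_t>(branch_addr % kSlots);
  }
  constexpr void train(std::uint64_t branch_addr, std::uint64_t dest) {
    target[slot(branch_addr)] = dest;
  }
  constexpr std::uint64_t predict(std::uint64_t branch_addr) const {
    return target[slot(branch_addr)];
  }
};

// The attacker trains its own branch; the victim's branch is then predicted
// from whatever slot its address maps to.
constexpr std::uint64_t speculative_target_after_training(
    std::uint64_t attacker_branch, std::uint64_t gadget,
    std::uint64_t victim_branch) {
  ToyBtb btb{};
  btb.train(attacker_branch, gadget);
  return btb.predict(victim_branch);
}
```

When the attacker's branch aliases with the victim's indirect branch (here, the PLT `jmpq` at 0x4005d0), the model predicts the attacker's chosen address for the victim; when it does not alias, the prediction is untouched.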

It seems to me that it would be appropriate to use this option with
programs like web browsers and ssh-agent, which are long-lived, hold
high value private information, and by the nature of their operation
must communicate with untrusted code.

The change is clearly extremely costly for programs that spend all of
their time making calls through the PLT.  I doubt that describes the
kinds of programs that need to worry about this vulnerability.


> If we think this is a problem that needs to be fixed, we should remove the
> indirect call altogether, and have the dynamic linker generate a direct call
> at load time.  There are a few constraints associated with that (4 GiB total
> application + DSO size, some SELinux users will be unhappy, lack of lazy
> binding support), but at least it can be turned on in practice.

I agree that that is a good idea.  If we can make that change in the
dynamic linker then after a few years, when we can assume that the
updated dynamic linker is deployed everywhere, then we can discard
this linker option.  Of course, I expect that processors will be
modified to prevent this attack over time as well.

Ian

Sourceware - binutils list mailing list
On Fri, Jan 5, 2018 at 7:00 AM, Ian Lance Taylor <[hidden email]> wrote:

> On Fri, Jan 5, 2018 at 3:42 AM, Florian Weimer <[hidden email]> wrote:
>> On 01/05/2018 12:08 AM, Cary Coutant wrote:
>>>>
>>>> * options.h (retpolineplt): New -z option to use retpoline PLT.
>>>> * x86_64.cc (Output_data_plt_x86_64_retpoline): New class.
>>>> (Target_x86_64<64>::do_make_data_plt): Create retpoline PLT when
>>>> the option is used.
>>>> * testsuite/Makefile.am (retpoline_plt_1.sh): New test.
>>>> * testsuite/Makefile.in: Regenerate.
>>>> * testsuite/retpoline_plt_1.sh: New test script.
>>>> * testsuite/retpoline_plt_1.s: New test source.
>>>
>>>
>>> This makes the -z bndplt and -z retpolineplt options mutually
>>> exclusive. Please add a check in options.cc
>>> (General_options::finalize) for this.
>>
>>
>> It's also incompatible with shadow stack support, so the binary marker for
>> that needs to be removed.
>>
>> I don't think this is the right approach at all.  What is this trying to
>> accomplish?  What kind of speculation barrier does this implement on current
>> CPUs?  Isn't this *extremely* costly?
>
> As I understand it (and I may not) this option would be appropriate to
> use with dynamically linked programs that hold high value private
> information, run on the same machine as untrusted code, and have some
> communication path with untrusted code.  If the untrusted code can
> cause the trusted program to make indirect branches, it can extract
> information from the trusted program's address space.  No specific
> control is needed over where the untrusted branch goes; the attack
> works by having the untrusted program seed the branch target buffer,
> thus causing the processor to speculatively execute instructions that
> would never normally be executed.  The results of that speculative
> execution, though discarded, will affect the memory cache, and this
> can be used to read the address space of the trusted program.  The
> attack only works if there is some way to cause the trusted program to
> execute an indirect branch; for many programs the PLT provides such a
> mechanism, as many communication paths with the trusted program will
> cause the trusted program to make some sort of libc call.
>
> It seems to me that it would be appropriate to use this option with
> programs like web browsers and ssh-agent, which are long-lived, hold
> high value private information, and by the nature of their operation
> must communicate with untrusted code.

Thanks for the summary!

>
> The change is clearly extremely costly for programs that spend all of
> their time making calls through the PLT.  I doubt that describes the
> kinds of programs that need to worry about this vulnerability.
>
>
>> If we think this is a problem that needs to be fixed, we should remove the
>> indirect call altogether, and have the dynamic linker generate a direct call
>> at load time.  There are a few constraints associated with that (4 GiB total
>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>> binding support), but at least it can be turned on in practice.

How practical is this really for a 64-bit address space where libc is
not mapped close to the binary?  We are looking at this, but if the
call target is more than 4G away, which is more often the case for
shared object calls, then, as you noted, this will not work.
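The distance limit under discussion comes from the signed 32-bit displacement of a direct `call` on x86-64, which reaches roughly 2 GiB in either direction from the call site. The sketch below is an editorial illustration with hypothetical names and addresses.

```cpp
#include <cstdint>
#include <limits>

// Displacement a 5-byte `call rel32` at call_site would need to reach target.
// The displacement is relative to the address of the next instruction.
constexpr std::int64_t call_displacement(std::uint64_t call_site,
                                         std::uint64_t target) {
  return static_cast<std::int64_t>(target - (call_site + 5));
}

// True if the displacement fits in a signed 32-bit field.
constexpr bool reachable_by_direct_call(std::uint64_t call_site,
                                        std::uint64_t target) {
  return call_displacement(call_site, target) >=
             std::numeric_limits<std::int32_t>::min() &&
         call_displacement(call_site, target) <=
             std::numeric_limits<std::int32_t>::max();
}
```

A call from 0x4005e0 to a nearby 0x400700 fits, while a call from the same site to a library mapped at a (hypothetical) address such as 0x7ffff7a00000 does not, which is the case raised here.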

>
> I agree that that is a good idea.  If we can make that change in the
> dynamic linker then after a few years, when we can assume that the
> updated dynamic linker is deployed everywhere, then we can discard
> this linker option.  Of course, I expect that processors will be
> modified to prevent this attack over time as well.
>
> Ian

Florian Weimer-5
On 01/05/2018 06:51 PM, Sriraman Tallam wrote:
>>> If we think this is a problem that needs to be fixed, we should remove the
>>> indirect call altogether, and have the dynamic linker generate a direct call
>>> at load time.  There are a few constraints associated with that (4 GiB total
>>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>>> binding support), but at least it can be turned on in practice.

> How practical is this really for a 64-bit address space where libc is
> not mapped close to the binary?

libc can be mapped anywhere.  The dynamic loader does that, not the
kernel, so the placement is really up to the loader.  What you cannot do
is place the dynamic loader itself close to the rest of the program
binaries because the kernel loads both the program and the dynamic
loader (the latter as the program interpreter).  But references to the
loader are rare and could still be handled with redirection through libc.

Thanks,
Florian

ccoutant
In reply to this post by Florian Weimer-5
> It's also incompatible with shadow stack support, so the binary marker for
> that needs to be removed.

Ugh. But that marker shouldn't be set in the first place, since this
linker option is useful only in conjunction with a corresponding
compiler option.

> I don't think this is the right approach at all.  What is this trying to
> accomplish?  What kind of speculation barrier does this implement on current
> CPUs?  Isn't this *extremely* costly?

Supposedly, this strategy aims to disable branch prediction for all
indirect branches in a piece of code, so that attackers cannot use
branch predictor training to force the speculative execution of any
available "gadgets" in the target code. I haven't yet seen any claims
where branch predictor training by itself can be exploited -- it's
simply one way to exploit the cache side channel vulnerabilities.

Yes, it's costly. I'm hoping that once the cache side channels have
been closed down, we can forget about this option. As for how badly
it's needed in the meantime, I don't really know. I get the feeling
that this particular approach to the exploits is most useful in
leaking data from a hypervisor into a guest OS; thus, the fix is
important for cloud-based services. But, given that, I also don't
really know whether it's really needed for user-level apps that may be
dynamically linked, or only for the kernel, for which compiler changes
should be sufficient.

BTW, the most informative resource I've found so far is ARM's "Cache
Speculation Side Channels" white paper, found here:
https://developer.arm.com/support/security-update/download-the-whitepaper.

> If we think this is a problem that needs to be fixed, we should remove the
> indirect call altogether, and have the dynamic linker generate a direct call
> at load time.  There are a few constraints associated with that (4 GiB total
> application + DSO size, some SELinux users will be unhappy, lack of lazy
> binding support), but at least it can be turned on in practice.

That would involve moving the PLT into writable memory, and is a much
bigger change than I'd want to see for what should be a temporary
mitigation strategy.

-cary

Sourceware - binutils list mailing list
Just to provide some context as one of the leads at Google that has
been working on this class of security mitigation...

On Fri, Jan 5, 2018 at 6:28 PM, Cary Coutant <[hidden email]> wrote:

>> It's also incompatible with shadow stack support, so the binary marker for
>> that needs to be removed.
>
> Ugh. But that marker shouldn't be set in the first place, since this
> linker option is useful only in conjunction with a corresponding
> compiler option.
>
>> I don't think this is the right approach at all.  What is this trying to
>> accomplish?  What kind of speculation barrier does this implement on current
>> CPUs?  Isn't this *extremely* costly?
>
> Supposedly, this strategy aims to disable branch prediction for all
> indirect branches in a piece of code, so that attackers cannot use
> branch predictor training to force the speculative execution of any
> available "gadgets" in the target code. I haven't yet seen any claims
> where branch predictor training by itself can be exploited -- it's
> simply one way to exploit the cache side channel vulnerabilities.
>
> Yes, it's costly. I'm hoping that once the cache side channels have
> been closed down, we can forget about this option. As for how badly
> it's needed in the meantime, I don't really know. I get the feeling
> that this particular approach to the exploits is most useful in
> leaking data from a hypervisor into a guest OS; thus, the fix is
> important for cloud-based services. But, given that, I also don't
> really know whether it's really needed for user-level apps that may be
> dynamically linked, or only for the kernel, for which compiler changes
> should be sufficient.

We aren't patching linkers just because we can. ;]

There are classes of security sensitive applications where this kind
of mitigation is essential to fully mitigate the variant #2 of the
speculative execution attacks recently disclosed. We have some of
these and we have a specific need to build them with these
mitigations. We suspect others do as well which is why we worked hard
to share this patch ASAP after disclosure.

>
> BTW, the most informative resource I've found so far is ARM's "Cache
> Speculation Side Channels" white paper, found here:
> https://developer.arm.com/support/security-update/download-the-whitepaper.
>
>> If we think this is a problem that needs to be fixed, we should remove the
>> indirect call altogether, and have the dynamic linker generate a direct call
>> at load time.  There are a few constraints associated with that (4 GiB total
>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>> binding support), but at least it can be turned on in practice.
>
> That would involve moving the PLT into writable memory, and is a much
> bigger change than I'd want to see for what should be a temporary
> mitigation strategy.

I also was interested in having the loader generate direct calls but
in addition to the point Cary makes here, there is a further problem
that deploying a loader supporting this is *significantly* harder than
rebuilding your application (which is, itself, quite hard). The new
loader would need to be deployed everywhere even if only a tiny
fraction of applications needed this functionality. We would very much
like a mitigation strategy that we can deploy rapidly in the very few
(but important) cases where it is needed.

Sourceware - binutils list mailing list
In reply to this post by ccoutant
On Fri, Jan 5, 2018 at 3:28 PM, Cary Coutant <[hidden email]> wrote:

>
>> If we think this is a problem that needs to be fixed, we should remove the
>> indirect call altogether, and have the dynamic linker generate a direct call
>> at load time.  There are a few constraints associated with that (4 GiB total
>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>> binding support), but at least it can be turned on in practice.
>
> That would involve moving the PLT into writable memory, and is a much
> bigger change than I'd want to see for what should be a temporary
> mitigation strategy.

The dynamic linker could mprotect the PLT to be writable, resolve all
the references (as with LD_BIND_NOW=1), and then mprotect the PLT to
be non-writable again.  That would all happen before  the program
actually starts, so it would be safe.
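The sequence described above (make writable, patch, make non-writable again) can be sketched with POSIX calls. This is an editorial illustration, not loader code: it patches an anonymous page standing in for the PLT and restores `PROT_READ` afterwards, whereas a real loader would restore `PROT_READ | PROT_EXEC`. The function names and the patched bytes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Temporarily make [page, page + len) writable, copy `patch` to `offset`,
// then apply `final_prot`. Returns false if either mprotect call fails.
bool patch_and_reseal(void* page, std::size_t len, std::size_t offset,
                      const void* patch, std::size_t patch_len,
                      int final_prot) {
  assert(offset + patch_len <= len);
  if (mprotect(page, len, PROT_READ | PROT_WRITE) != 0) return false;
  std::memcpy(static_cast<unsigned char*>(page) + offset, patch, patch_len);
  return mprotect(page, len, final_prot) == 0;
}

// Demo on an anonymous read-only page standing in for a PLT page.
bool demo_patch_and_reseal() {
  const std::size_t len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* page = mmap(nullptr, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED) return false;
  const unsigned char call_rel32[5] = {0xe8, 0x10, 0x00, 0x00, 0x00};
  const bool ok =
      patch_and_reseal(page, len, 0, call_rel32, sizeof call_rel32, PROT_READ) &&
      std::memcmp(page, call_rel32, sizeof call_rel32) == 0;
  munmap(page, len);
  return ok;
}
```

Because `mprotect` works on whole pages, everything that shares a page with the patched range becomes writable for the duration of the patch.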

(The objections raised by Sri and Chandler are still valid, of course.)

Ian

Sourceware - binutils list mailing list
Hi Ian,

On Fri, Jan 5, 2018 at 3:52 PM, Ian Lance Taylor <[hidden email]> wrote:

> On Fri, Jan 5, 2018 at 3:28 PM, Cary Coutant <[hidden email]> wrote:
>>
>>> If we think this is a problem that needs to be fixed, we should remove the
>>> indirect call altogether, and have the dynamic linker generate a direct call
>>> at load time.  There are a few constraints associated with that (4 GiB total
>>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>>> binding support), but at least it can be turned on in practice.
>>
>> That would involve moving the PLT into writable memory, and is a much
>> bigger change than I'd want to see for what should be a temporary
>> mitigation strategy.
>
> The dynamic linker could mprotect the PLT to be writable, resolve all
> the references (as with LD_BIND_NOW=1), and then mprotect the PLT to
> be non-writable again.  That would all happen before  the program
> actually starts, so it would be safe.

This looks very similar to how text relocations would be handled,
except that they are restricted to the .plt section here. Wouldn't
that mean we would suffer from the problems of TEXTREL, which is
undesirable as far as I understand? Maybe I have misunderstood this.

Thanks
Sri


>
> (The objections raised by Sri and Chandler are still valid, of course.)
>
> Ian

Sourceware - binutils list mailing list
On Fri, Jan 5, 2018 at 4:03 PM, Sriraman Tallam <[hidden email]> wrote:

>
> On Fri, Jan 5, 2018 at 3:52 PM, Ian Lance Taylor <[hidden email]> wrote:
>> On Fri, Jan 5, 2018 at 3:28 PM, Cary Coutant <[hidden email]> wrote:
>>>
>>>> If we think this is a problem that needs to be fixed, we should remove the
>>>> indirect call altogether, and have the dynamic linker generate a direct call
>>>> at load time.  There are a few constraints associated with that (4 GiB total
>>>> application + DSO size, some SELinux users will be unhappy, lack of lazy
>>>> binding support), but at least it can be turned on in practice.
>>>
>>> That would involve moving the PLT into writable memory, and is a much
>>> bigger change than I'd want to see for what should be a temporary
>>> mitigation strategy.
>>
>> The dynamic linker could mprotect the PLT to be writable, resolve all
>> the references (as with LD_BIND_NOW=1), and then mprotect the PLT to
>> be non-writable again.  That would all happen before  the program
>> actually starts, so it would be safe.
>
> This looks very similar to how text relocations would be handled,
> except that they are restricted to the .plt section here. Wouldn't
> that mean we would suffer from the problems of TEXTREL, which is
> undesirable as far as I understand? Maybe I have misunderstood this.

Yes, in this scheme we would want to make sure that the PLT was on a
separate page by itself.

Ian
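The "separate page" requirement can be expressed as an alignment check. The sketch below is an editorial illustration that assumes 4 KiB pages; the function name is hypothetical, and the PLT end address used in the note that follows is made up for the example.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // assumption: 4 KiB pages

// True if [start, end) begins and ends on page boundaries, so that changing
// its protection with mprotect cannot affect neighbouring sections.
constexpr bool owns_its_pages(std::uint64_t start, std::uint64_t end) {
  return start < end && start % kPageSize == 0 && end % kPageSize == 0;
}
```

A PLT starting at 0x4005c0, as in the listings earlier in the thread, shares its page with other code, so `owns_its_pages(0x4005c0, 0x400600)` is false; a PLT padded to occupy 0x401000 through 0x402000 would pass.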

Alan Modra-3
In reply to this post by ccoutant
On Fri, Jan 05, 2018 at 03:28:34PM -0800, Cary Coutant wrote:

> > It's also incompatible with shadow stack support, so the binary marker for
> > that needs to be removed.
>
> Ugh. But that marker shouldn't be set in the first place, since this
> linker option is useful only in conjunction with a corresponding
> compiler option.
>
> > I don't think this is the right approach at all.  What is this trying to
> > accomplish?  What kind of speculation barrier does this implement on current
> > CPUs?  Isn't this *extremely* costly?
>
> Supposedly, this strategy aims to disable branch prediction for all
> indirect branches in a piece of code, so that attackers cannot use
> branch predictor training to force the speculative execution of any
> available "gadgets" in the target code. I haven't yet seen any claims
> where branch predictor training by itself can be exploited -- it's
> simply one way to exploit the cache side channel vulnerabilities.

I don't think it's just the victim code.  It seems to me that you'd
need to disable indirect branch prediction for all indirect branches
in the victim address space.  So it won't be sufficient to simply
relink the app with fancy PLT call code.  You'd need to relink *all*
libraries that make PLT calls, including libc.so, too.  (libc PLT
calls to __tls_get_addr, calloc and any ifunc come to mind as possible
attack vectors.)  And of course recompile everything to mitigate any
inline function pointer calls.

Unless I'm missing something, this makes the fancy PLT mitigation
unworkable in practice.  You will definitely not want a slow shared
libc, libstdc++ etc. to be used by all applications.  So build a set
of hardened static libraries and link them into your hardened app.
No PLT calls involved, and thus no PLT mitigation needed.

--
Alan Modra
Australia Development Lab, IBM

H.J. Lu-30
On Fri, Jan 5, 2018 at 6:53 PM, Alan Modra <[hidden email]> wrote:

> On Fri, Jan 05, 2018 at 03:28:34PM -0800, Cary Coutant wrote:
>> > It's also incompatible with shadow stack support, so the binary marker for
>> > that needs to be removed.
>>
>> Ugh. But that marker shouldn't be set in the first place, since this
>> linker option is useful only in conjunction with a corresponding
>> compiler option.
>>
>> > I don't think this is the right approach at all.  What is this trying to
>> > accomplish?  What kind of speculation barrier does this implement on current
>> > CPUs?  Isn't this *extremely* costly?
>>
>> Supposedly, this strategy aims to disable branch prediction for all
>> indirect branches in a piece of code, so that attackers cannot use
>> branch predictor training to force the speculative execution of any
>> available "gadgets" in the target code. I haven't yet seen any claims
>> where branch predictor training by itself can be exploited -- it's
>> simply one way to exploit the cache side channel vulnerabilities.
>
> I don't think it's just the victim code.  It seems to me that you'd
> need to disable indirect branch prediction for all indirect branches
> in the victim address space.  So it won't be sufficient to simply
> relink the app with fancy PLT call code.  You'd need to relink *all*
> libraries that make PLT calls, including libc.so, too.  (libc PLT
> calls to __tls_get_addr, calloc and any ifunc come to mind as possible
> attack vectors.)  And of course recompile everything to mitigate any
> inline function pointer calls.
>
> Unless I'm missing something, this makes the fancy PLT mitigation
> unworkable in practice.  You will definitely not want a slow shared
> libc, libstdc++ etc. to be used by all applications.  So build a set
> of hardened static libraries and link them into your hardened app.
> No PLT calls involved, and thus no PLT mitigation needed.
>

Adding x86-64 psABI group.

Also, as Florian pointed out, this doesn't work with shadow stacks.  If
you are really concerned about the PLT, you should avoid the PLT
altogether, as suggested by

https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-secure.pdf

This feature has been implemented in GCC + binutils.

--
H.J.

Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".

ccoutant
In reply to this post by Sourceware - binutils list mailing list
> We aren't patching linkers just because we can. ;]

Chandler, if I didn't know you personally, I'd take offense. Hmm,
maybe I took a wee bit of offense, even so. :-) Snark isn't going to
get you anywhere; it's more likely to close minds.

The details of these vulnerabilities are out there now, and your
little circle is much bigger. That means you're now exposed to people
with different experiences and possibly more expertise. Even if you've
already been down certain paths and answered certain questions, you're
going to need to do it again for the rest of us. People on this thread
are asking reasonable questions, and if you want help and cooperation,
those questions deserve serious answers. Together, we may even come up
with better solutions.

In particular, I'd like to know your answer to Alan's question about
the performance implications of deploying slow shared libraries where
not all applications need this mitigation, and the suggestion to just
compile secure apps statically. I'd like to know your answer to HJ's
suggestion to eliminate the PLT altogether (I have an answer to that,
but I'd like to know yours).

I've approved Sri's patch, in the hope that it's a short-term
mitigation strategy that we can retire in a reasonably short period of
time, and I'll be receptive to a follow-on patch that improves the
code sequences as discussed in the LLVM review thread. I really don't
want to get into the business of changing the ABI for this, though.

-cary

Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".

Florian Weimer
In reply to this post by ccoutant
* Cary Coutant:

> Supposedly, this strategy aims to disable branch prediction for all
> indirect branches in a piece of code, so that attackers cannot use
> branch predictor training to force the speculative execution of any
> available "gadgets" in the target code. I haven't yet seen any claims
> where branch predictor training by itself can be exploited -- it's
> simply one way to exploit the cache side channel vulnerabilities.

The “supposedly” bit irks me.  Using RET for indirect jumps isn't new.
It has been used in the i386 dynamic linker for a long, long
time because there is no reserved register which the PLT stub can use
to store the target address.  (The GNU ABI supports three integer
register arguments, and the remaining registers are either
special-purpose or callee-saved.)  As a result, CPUs might already
have logic to recognize non-returning RETs.

It's also not clear to me why the PAUSE loop would be preferable to a
single UD2 or CPUID instruction.

In i386 mode, the CPU does not prefetch through far jumps.  Does such
prefetching occur in x86-64 mode?  If not, why not use a far jump?

Is there an expectation that the retpoline does something to the
caller or callee as far as return stack caching is concerned?
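
For readers following the return-stack question, here is a toy model of
how the retpoline sequence quoted at the top of the thread is usually
described as working.  It is only a sketch under stated assumptions: the
"rsb" list is a hypothetical, heavily simplified stand-in for a return
stack predictor, the names retpoline_indirect_jump and "pause_loop" are
made up for illustration, and nothing here claims to describe the
behaviour of any particular CPU.

```python
def retpoline_indirect_jump(target, capture="pause_loop"):
    """Toy model of one retpoline-style indirect jump.

    Returns (speculated_destination, architectural_destination).
    """
    stack = []  # architectural stack contents (return addresses only)
    rsb = []    # hypothetical, simplified return stack predictor

    # callq <thunk>: the pushed return address is the capture loop
    # (pause; jmp) that immediately follows the call instruction.
    stack.append(capture)
    rsb.append(capture)

    # mov %r11,(%rsp): overwrite the saved return address in memory
    # with the real branch target; the predictor's copy is untouched.
    stack[-1] = target

    # retq: in this model the prediction comes from the predictor entry,
    # while the architectural destination comes from the stack.
    speculated = rsb.pop()
    architectural = stack.pop()
    return speculated, architectural


if __name__ == "__main__":
    print(retpoline_indirect_jump("resolved_function"))
```

In this model the speculated destination is always the capture loop and
never an attacker-trained indirect branch target, which is the property
the mitigation is said to rely on.  Whether real hardware treats the
modified return address this way is exactly what is being asked above.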

It's also not clear to me that a DSO boundary equals a trust boundary.
For a JIT, the indirect calls which need protecting are likely
someplace else, I assume.

> But, given that, I also don't really know whether it's really needed
> for user-level apps that may be dynamically linked, or only for the
> kernel, for which compiler changes should be sufficient.

I expect that compiler support for indirect branch rewriting together
with -fno-plt is sufficient.  There is simply no need to put any code
into binutils at this point.

Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".

Nick Clifton
In reply to this post by Sourceware - binutils list mailing list
Hi Sri,

> Patch attached for the gold linker.

Did I miss something, or has this patch only been submitted for the gold linker
and not the bfd linker?


>> This "retpoline" mitigation is fully described in the following blog post:
>> https://support.google.com/faqs/answer/7625886

I think that it might be useful to include this link in the documentation
for the -z retpolineplt option.


It occurs to me that it might also be useful to be able to mark a binary
as having been linked with this option, so that the loader (or a static
tool) can check, and if desired, reject a binary as being insecure.

Cheers
  Nick

Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".

Sourceware - binutils list mailing list
In reply to this post by H.J. Lu-30
Hello HJ,

On Sun, Jan 7, 2018 at 10:10 AM, H.J. Lu <[hidden email]> wrote:

> On Fri, Jan 5, 2018 at 6:53 PM, Alan Modra <[hidden email]> wrote:
>> On Fri, Jan 05, 2018 at 03:28:34PM -0800, Cary Coutant wrote:
>>> > It's also incompatible with shadow stack support, so the binary marker for
>>> > that needs to be removed.
>>>
>>> Ugh. But that marker shouldn't be set in the first place, since this
>>> linker option is useful only in conjunction with a corresponding
>>> compiler option.
>>>
>>> > I don't think this is the right approach at all.  What is this trying to
>>> > accomplish?  What kind of speculation barrier does this implement on current
>>> > CPUs?  Isn't this *extremely* costly?
>>>
>>> Supposedly, this strategy aims to disable branch prediction for all
>>> indirect branches in a piece of code, so that attackers cannot use
>>> branch predictor training to force the speculative execution of any
>>> available "gadgets" in the target code. I haven't yet seen any claims
>>> where branch predictor training by itself can be exploited -- it's
>>> simply one way to exploit the cache side channel vulnerabilities.
>>
>> I don't think it's just the victim code.  It seems to me that you'd
>> need to disable indirect branch prediction for all indirect branches
>> in the victim address space.  So it won't be sufficient to simply
>> relink the app with fancy PLT call code.  You'd need to relink *all*
>> libraries that make PLT calls, including libc.so, too.  (libc PLT
>> calls to __tls_get_addr, calloc and any ifunc come to mind as possible
>> attack vectors.)  And of course recompile everything to mitigate any
>> inline function pointer calls.
>>
>> Unless I'm missing something, this makes the fancy PLT mitigation
>> unworkable in practice.  You will definitely not want a slow shared
>> libc, libstdc++ etc. to be used by all applications.  So build a set
>> of hardened static libraries and link them into your hardened app.
>> No PLT calls involved, and thus no PLT mitigation needed.
>>
>
> Adding x86-64 psABI group.
>
> Also Florian pointed out, this doesn't work for shadow stack.  If you
> are really concerned about PLT, you should avoid PLT altogether as
> suggested by
>
> https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-secure.pdf
>
> This feature has been implemented in GCC + binutils.

Are you referring to the no-plt feature?  If so, I contributed to the
-fno-plt option in GCC and in LLVM, as well as the gold patches to
binutils, and I did bring this up.  The problem is that there is still
an indirect jmp directly from the call site via the GOT, and hence it
is still exposed to the attack.
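
One way to see the point about remaining indirect branches is to scan a
disassembly for them.  The following is a minimal sketch, assuming
AT&T-syntax output in the style of "objdump -d" as quoted earlier in the
thread; the helper name find_indirect_branches and the regular
expression are my own assumptions, not part of any binutils tool, and
the last sample line is a hypothetical -fno-plt call site rather than
output from a real binary.

```python
import re

# Matches AT&T-syntax indirect branches as printed by objdump, e.g.
#   jmpq   *0x1a12(%rip)      or      callq  *%rax
# The leading '*' on the operand is what marks the branch as indirect.
INDIRECT_BRANCH = re.compile(r"\b(jmp|call)q?\s+\*")


def find_indirect_branches(disassembly):
    """Return the disassembly lines that contain an indirect jmp/call."""
    return [
        line.strip()
        for line in disassembly.splitlines()
        if INDIRECT_BRANCH.search(line)
    ]


if __name__ == "__main__":
    sample = """\
  4005d0:  jmpq   *0x1a12(%rip)        # 401fe8 <_GLOBAL_OFFSET_TABLE_+0x18>
  4005d6:  pushq  $0x0
  4005db:  jmpq   4005c0 <_init+0x20>
  400700:  callq  *0x18e2(%rip)        # hypothetical -fno-plt call site
"""
    for hit in find_indirect_branches(sample):
        print(hit)
```

Run over the sample, this flags the standard PLT entry's first
instruction and the GOT-indirect call, but not the direct jmpq back to
the PLT header.  A text scan like this is only a rough audit aid; it
says nothing about branches in other DSOs mapped into the same process.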

Thanks
Sri

>
> --
> H.J.

Re: Gold Linker Patch: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715 and in some places called "spectre".

Sourceware - binutils list mailing list
In reply to this post by Alan Modra-3
Hi Alan,

On Fri, Jan 5, 2018 at 6:53 PM, Alan Modra <[hidden email]> wrote:

> On Fri, Jan 05, 2018 at 03:28:34PM -0800, Cary Coutant wrote:
>> > It's also incompatible with shadow stack support, so the binary marker for
>> > that needs to be removed.
>>
>> Ugh. But that marker shouldn't be set in the first place, since this
>> linker option is useful only in conjunction with a corresponding
>> compiler option.
>>
>> > I don't think this is the right approach at all.  What is this trying to
>> > accomplish?  What kind of speculation barrier does this implement on current
>> > CPUs?  Isn't this *extremely* costly?
>>
>> Supposedly, this strategy aims to disable branch prediction for all
>> indirect branches in a piece of code, so that attackers cannot use
>> branch predictor training to force the speculative execution of any
>> available "gadgets" in the target code. I haven't yet seen any claims
>> where branch predictor training by itself can be exploited -- it's
>> simply one way to exploit the cache side channel vulnerabilities.
>
> I don't think it's just the victim code.  It seems to me that you'd
> need to disable indirect branch prediction for all indirect branches
> in the victim address space.  So it won't be sufficient to simply
> relink the app with fancy PLT call code.  You'd need to relink *all*
> libraries that make PLT calls, including libc.so, too.  (libc PLT
> calls to __tls_get_addr, calloc and any ifunc come to mind as possible
> attack vectors.)  And of course recompile everything to mitigate any
> inline function pointer calls.
>
> Unless I'm missing something, this makes the fancy PLT mitigation
> unworkable in practice.  You will definitely not want a slow shared
> libc, libstdc++ etc. to be used by all applications.  So build a set
> of hardened static libraries and link them into your hardened app.
> No PLT calls involved, and thus no PLT mitigation needed.

Thanks for pointing these out. We are working on mitigating some of
the slowness from shared libraries. Here are some of the things we
considered:

* Static linking is out of the question since we need to use PIE to enable
ASLR, and PIE + static linking is not supported.
* We are working on something like partial static linking where we
still link to libc dynamically but statically link hot memops like
memcpy, memcmp etc. to avoid PLT + ifunc penalty for them.
* You are right that we would still have to re-build libc.so to use
retpoline but hopefully with some variant of partial static linking we
may be able to keep hot libc calls from incurring the penalty.
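
As a side note on the PIE requirement in the first item, whether a
given binary is position-independent can be read from its ELF header.
This is a small sketch only: the helper names elf_type and
is_position_independent are hypothetical, and note that ET_DYN is also
the type of ordinary shared objects, so this check alone does not
distinguish a PIE executable from a library.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object; also used for position-independent executables


def elf_type(header):
    """Return the e_type field given at least the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field directly after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def is_position_independent(header):
    """True if the ELF type is ET_DYN (PIE executable or shared object)."""
    return elf_type(header) == ET_DYN


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(is_position_independent(f.read(18)))
```

Invoked with a path to a binary, the script prints True for ET_DYN
files and False for fixed-address executables.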

Thanks
Sri

>
> --
> Alan Modra
> Australia Development Lab, IBM