use of %fs segment register in x86_64 with -fstack-check

Maxim Blinov-2
Hi all,

I'm looking at some -fstack-check'ed code, and would appreciate it if
some gdb x86_64 gurus could double check my understanding of a trivial
example

here is the source:

big-access.c:
```
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

extern void foo(char *);

int main()
{
  char ch[8000];
  foo (ch);

  return 0;
}
```

foo.c:
```
void foo(char *ch) { }
```

And the compilation line:

$ gcc -O2 -fstack-check -o big-access big-access.c foo.c -fdump-rtl-final

And here is the gdb view (ignore the breakpoint and current insn caret):
```
B+ │0x555555554560 <main>           sub    $0x2f78,%rsp
   │0x555555554567 <main+7>         orq    $0x0,0xf58(%rsp)
   │0x555555554570 <main+16>        orq    $0x0,(%rsp)
   │0x555555554575 <main+21>        add    $0x1020,%rsp
   │0x55555555457c <main+28>        mov    %rsp,%rdi
   │0x55555555457f <main+31>        mov    %fs:0x28,%rax
  >│0x555555554588 <main+40>        mov    %rax,0x1f48(%rsp)
   │0x555555554590 <main+48>        xor    %eax,%eax
   │0x555555554592 <main+50>        callq  0x5555555546d0 <foo>
   │0x555555554597 <main+55>        mov    0x1f48(%rsp),%rdx
   │0x55555555459f <main+63>        xor    %fs:0x28,%rdx
   │0x5555555545a8 <main+72>        jne    0x5555555545b4 <main+84>
   │0x5555555545aa <main+74>        xor    %eax,%eax
   │0x5555555545ac <main+76>        add    $0x1f58,%rsp
   │0x5555555545b3 <main+83>        retq
   │0x5555555545b4 <main+84>        callq  0x555555554540 <__stack_chk_fail@plt>
   │0x5555555545b9                  nopl   0x0(%rax)
```

I would just like someone who knows their stuff to double check my
understanding:

The "orq"s at the start purposely cause a "dummy" load/store event so
the VMM can decide whether or not it is sane for us to have used those
pages for the stack, right?

Another question is about address 0x55555555457f. I presume that
%fs:0x28 is a memory address that points to a sentinel value. We load
it into %rax, and then we store it in strategic locations in our stack
to serve as sentinel values. Before we leave, we check at
0x55555555459f that the memory location hasn't changed. Does that
imply that the memory location %fs:0x28 points to a globally used
sentinel value?

But who sets %fs? Indeed what is the ABI usage of %fs in the context
of linux x86_64?
And why 0x28 offset?

Thank you for reading,
Maxim

Re: use of %fs segment register in x86_64 with -fstack-check

Ruslan Kabatsayev
Hi,

On Tue, 3 Mar 2020 at 17:53, Maxim Blinov <[hidden email]> wrote:

>
> Hi all,
>
> I'm looking at some -fstack-check'ed code, and would appreciate it if
> some gdb x86_64 gurus could double check my understanding of a trivial
> example
>
> here is the source:
>
> big-access.c:
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <stdint.h>
>
> extern void foo(char *);
>
> int main()
> {
>   char ch[8000];
>   foo (ch);
>
>   return 0;
> }
> ```
>
> foo.c:
> ```
> void foo(char *ch) { }
> ```
>
> And the compilation line:
>
> $ gcc -O2 -fstack-check -o big-access big-access.c foo.c -fdump-rtl-final
>
> And here is the gdb view (ignore the breakpoint and current insn caret):
> ```
> B+ │0x555555554560 <main>           sub    $0x2f78,%rsp
>    │0x555555554567 <main+7>         orq    $0x0,0xf58(%rsp)
>    │0x555555554570 <main+16>        orq    $0x0,(%rsp)
>    │0x555555554575 <main+21>        add    $0x1020,%rsp
>    │0x55555555457c <main+28>        mov    %rsp,%rdi
>    │0x55555555457f <main+31>        mov    %fs:0x28,%rax
>   >│0x555555554588 <main+40>        mov    %rax,0x1f48(%rsp)
>    │0x555555554590 <main+48>        xor    %eax,%eax
>    │0x555555554592 <main+50>        callq  0x5555555546d0 <foo>
>    │0x555555554597 <main+55>        mov    0x1f48(%rsp),%rdx
>    │0x55555555459f <main+63>        xor    %fs:0x28,%rdx
>    │0x5555555545a8 <main+72>        jne    0x5555555545b4 <main+84>
>    │0x5555555545aa <main+74>        xor    %eax,%eax
>    │0x5555555545ac <main+76>        add    $0x1f58,%rsp
>    │0x5555555545b3 <main+83>        retq
>    │0x5555555545b4 <main+84>        callq  0x555555554540 <__stack_chk_fail@plt>
>    │0x5555555545b9                  nopl   0x0(%rax)
> ```
>
> I would just like someone who knows their stuff to double check my
> understanding:
>
> The "orq"s at the start purposely cause a "dummy" load/store event so
> the VMM can decide whether or not it is sane for us to have used those
> pages for the stack, right?

Not quite. As noted at [1], this OR is there to ensure that the stack
hasn't overflowed. It is the part added by -fstack-check (you can see it
go away when you remove this option). See [2] for documentation.

>
> Another question is about address 0x55555555457f. I presume that
> %fs:0x28 is a memory address that points to a sentinel value. We load
> it into %rax, and then we store it in strategic locations in our stack
> to serve as sentinel values. Before we leave, we check at
> 0x55555555459f that the memory location hasn't changed. Does that
> imply that the memory location %fs:0x28 points to a globally used
> sentinel value?

Right. But note that this is enabled not by -fstack-check, but rather
by some of the -fstack-protector* options that are on by default on
modern Linux distributions. You can confirm this by explicitly passing
-fno-stack-protector and seeing this sentinel checking gone.

>
> But who sets %fs? Indeed what is the ABI usage of %fs in the context
> of linux x86_64?

The FS segment base points to the TLS. See [3] and links therein.
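A minimal sketch to see this from a running process, assuming Linux on x86-64 with GCC or Clang inline asm (the helper names are made up for illustration): the `arch_prctl(ARCH_GET_FS)` system call returns the FS base, the word at %fs:0 holds that same address (the thread control block points to itself), and the word at %fs:0x28 is exactly the value loaded at main+31.

```c
#define _GNU_SOURCE
#include <asm/prctl.h>    /* ARCH_GET_FS */
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

/* Ask the kernel for this thread's FS segment base (Linux x86-64 only). */
uintptr_t fs_base(void) {
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return (uintptr_t)base;
}

/* Read the word at %fs:0; the x86-64 TLS ABI keeps the thread pointer there. */
uintptr_t fs_self(void) {
    uintptr_t v;
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(v));
    return v;
}

/* Read the word at %fs:0x28, the same load as `mov %fs:0x28,%rax`. */
uint64_t fs_guard(void) {
    uint64_t v;
    __asm__ volatile ("movq %%fs:0x28, %0" : "=r"(v));
    return v;
}

/* Abort unless the FS base matches the picture described above. */
void check_fs_layout(void) {
    uintptr_t base = fs_base();
    assert(base != 0);
    assert(fs_self() == base);                              /* %fs:0 == base */
    assert(*(const uint64_t *)(base + 0x28) == fs_guard()); /* same word     */
}
```

Each thread has its own FS base, so each thread reads its own copy of the guard word through the same `%fs:0x28` operand.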

> And why 0x28 offset?

It's the offset of stack_guard member of tcbhead_t. See the
corresponding glibc source [4].
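For illustration, here is a mirror of the leading members of glibc's x86-64 `tcbhead_t` as they appear in the tls.h linked at [4]. The struct name is hypothetical, only the prefix up to `pointer_guard` is reproduced, and `dtv` is simplified to `void *`. Three pointers, two ints and one `uintptr_t` add up to 40 bytes, which is why `stack_guard` lands at 0x28.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the start of glibc's x86-64 tcbhead_t.
   Types are simplified; sizes match on x86-64. */
struct tcbhead_prefix {
    void     *tcb;               /* 0x00: points to itself, read as %fs:0 */
    void     *dtv;               /* 0x08: dynamic thread vector           */
    void     *self;              /* 0x10: thread descriptor               */
    int       multiple_threads;  /* 0x18                                  */
    int       gscope_flag;       /* 0x1c                                  */
    uintptr_t sysinfo;           /* 0x20                                  */
    uintptr_t stack_guard;       /* 0x28: the canary loaded at main+31    */
    uintptr_t pointer_guard;     /* 0x30                                  */
};

_Static_assert(offsetof(struct tcbhead_prefix, stack_guard) == 0x28,
               "stack_guard must sit at %fs:0x28");

/* Run-time form of the same check. */
size_t stack_guard_offset(void) {
    size_t off = offsetof(struct tcbhead_prefix, stack_guard);
    assert(off % sizeof(uintptr_t) == 0);   /* naturally aligned word */
    return off;
}
```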

>
> Thank you for reading,
> Maxim

[1]: https://stackoverflow.com/a/44670648/673852
[2]: https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html
[3]: https://chao-tic.github.io/blog/2018/12/25/tls
[4]: https://code.woboq.org/userspace/glibc/sysdeps/x86_64/nptl/tls.h.html#42

Regards,
Ruslan

Fwd: use of %fs segment register in x86_64 with -fstack-check

Maxim Blinov-2
(Sorry, forgot to CC gdb ml)

---------- Forwarded message ---------
From: Maxim Blinov <[hidden email]>
Date: Tue, 3 Mar 2020 at 18:37
Subject: Re: use of %fs segment register in x86_64 with -fstack-check
To: Ruslan Kabatsayev <[hidden email]>


Hi Ruslan, thank you for your explanations. Unfortunately, I still
can't see the whole picture.

On Tue, 3 Mar 2020 at 16:51, Ruslan Kabatsayev <[hidden email]> wrote:
> Not quite. As noted at [1] this OR is to ensure that stack hasn't
> overflowed. This is the part added by -fstack-check (you can see it go
> away when you remove this option). See [2] for documentation.

I don't understand how the OR insns check that the stack hasn't overflowed.

From [1], the author writes "it just inserts a NULL byte". What is
*it* in this context? I don't see anyone writing anything to the stack
in the assembly. Does linux do it on our behalf, and then the OR insns
check that those bytes are indeed NULL?

Furthermore, I can't see who uses the result of the OR operation. I'm
under the impression that there is some page fault magic happening
under the hood, but what is that magic? No insns after the ORs perform
any conditional jumps based on the ORs' results that I can see
(although I am not very knowledgeable about x86_64 asm.) So I am still
confused.

I did read [2] before posting, but unfortunately I didn't find it very helpful.

I tried to step through each insn in my head to demonstrate where I don't get it:

0x555555554560 <main>           sub    $0x2f78,%rsp
OK, whatever %rsp was, it's now %rsp - 12152. That's a lot more than
8000, but fine.
Let's call %rsp before we subtracted from it "%original".

0x555555554567 <main+7>         orq    $0x0,0xf58(%rsp)
Ok, we OR with memory location %rsp + 3928. Taking into account the
previous offset, we're accessing %original + (3928 - 12152) which is
%original - 8224. So this is about 200 bytes after the stack array
ends. The instruction doesn't change the value at 0xf58(%rsp). My
understanding is that this instruction will fetch the quadword at
0xf58(%rsp), OR it with $0x0, and then store the result of that
computation back to the same address. How does this check that no
stack overflow has occurred?

0x555555554570 <main+16>        orq    $0x0,(%rsp)
We do it again, this time at %original - 12152 (the bottom of the
stack). Is this because we might span over two pages?

0x555555554575 <main+21>        add    $0x1020,%rsp
Now we set %rsp to be %original - 8024. So now we are actually
pointing to the stack byte just after the large array.

0x55555555457c <main+28>        mov    %rsp,%rdi
Now we save %rsp to %rdi, despite %rdi not being used anywhere... not
sure about this one.

0x55555555457f <main+31>        mov    %fs:0x28,%rax
Load the magic sentinel pattern, OK.

0x555555554588 <main+40>        mov    %rax,0x1f48(%rsp)
0x1f48 corresponds to %original - 16. So we are writing a sentinel
value to almost the start of the stack for this func.

0x555555554590 <main+48>        xor    %eax,%eax
0x555555554592 <main+50>        callq  0x5555555546d0 <foo>

Clear %eax for foo's return value and call foo.

0x555555554597 <main+55>        mov    0x1f48(%rsp),%rdx
0x55555555459f <main+63>        xor    %fs:0x28,%rdx
0x5555555545a8 <main+72>        jne    0x5555555545b4 <main+84>

Now we double-check that the sentinel value at %original - 16 is
exactly the same as it was before we called foo, and if it isn't, we
go to __stack_chk_fail. So, this protects us against the case where
foo trashed the start of our stack?

0x5555555545aa <main+74>        xor    %eax,%eax
0x5555555545ac <main+76>        add    $0x1f58,%rsp
0x5555555545b3 <main+83>        retq

Clear our own return value, clean up the stack, and exit.
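The offset arithmetic in this walkthrough checks out. As a sketch, it can be replayed in C with the immediates copied from the disassembly; the enum and function names are hypothetical, and every result is relative to %original, the value of %rsp on entry to main.

```c
#include <assert.h>

/* Immediates copied from the disassembly of main. */
enum {
    SUB_RSP    = 0x2f78,  /* main:    sub  $0x2f78,%rsp      */
    PROBE_OFF  = 0xf58,   /* main+7:  orq  $0x0,0xf58(%rsp)  */
    ADD_RSP    = 0x1020,  /* main+21: add  $0x1020,%rsp      */
    CANARY_OFF = 0x1f48,  /* main+40: mov  %rax,0x1f48(%rsp) */
    EPILOGUE   = 0x1f58,  /* main+76: add  $0x1f58,%rsp      */
    ARRAY_LEN  = 8000     /* char ch[8000]                   */
};

/* The 8-byte canary slot must fit inside the frame released by the epilogue. */
_Static_assert(CANARY_OFF + 8 <= EPILOGUE, "canary slot inside the frame");

long first_probe(void)    { return -SUB_RSP + PROBE_OFF; }       /* -8224  */
long second_probe(void)   { return -SUB_RSP; }                   /* -12152 */
long array_start(void)    { return -SUB_RSP + ADD_RSP; }         /* -8024  */
long array_end(void)      { return array_start() + ARRAY_LEN; }  /* -24    */
long canary_slot(void)    { return array_start() + CANARY_OFF; } /* -16    */
long rsp_before_ret(void) { return array_start() + EPILOGUE; }   /*  0     */
```

The numbers also place the pieces: the 8000-byte array runs from %original - 8024 up to %original - 24, so the %rsp value copied into %rdi is the first byte of `ch`, and the canary slot at %original - 16 lies between the end of the array and the return address at %original.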

I just don't understand how the ORs are ensuring the stack hasn't overflowed.

> Right. But note that this is enabled not by -fstack-check, but rather
> by some of the -fstack-protector* options that are on by default on
> modern Linux distributions. You can confirm this by explicitly passing
> -fno-stack-protector and seeing this sentinel checking gone.

Ok, I see.

> The FS segment base points to the TLS. See [3] and links therein.
...
> It's the offset of stack_guard member of tcbhead_t. See the
> corresponding glibc source [4].

Got it, thank you.

> [1]: https://stackoverflow.com/a/44670648/673852
> [2]: https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html
> [3]: https://chao-tic.github.io/blog/2018/12/25/tls
> [4]: https://code.woboq.org/userspace/glibc/sysdeps/x86_64/nptl/tls.h.html#42
>
> Regards,
> Ruslan

Re: use of %fs segment register in x86_64 with -fstack-check

Ruslan Kabatsayev
On Tue, 3 Mar 2020 at 21:37, Maxim Blinov <[hidden email]> wrote:

>
> Hi Ruslan, thank you for your explanations. Unfortunately, I still
> can't see the whole picture.
>
> On Tue, 3 Mar 2020 at 16:51, Ruslan Kabatsayev <[hidden email]> wrote:
> > Not quite. As noted at [1] this OR is to ensure that stack hasn't
> > overflowed. This is the part added by -fstack-check (you can see it go
> > away when you remove this option). See [2] for documentation.
>
> I don't understand how the OR insns check that the stack hasn't overflowed.
>
> From [1], the author writes "it just inserts a NULL byte". What is
> *it* in this context? I don't see anyone writing anything to the stack
> in the assembly. Does linux do it on our behalf, and then the OR insns
> check that those bytes are indeed NULL?
>
> Furthermore, I can't see who uses the result of the OR operation. I'm
> under the impression that there is some page fault magic happening
> under the hood, but what is that magic? No insns after the ORs perform
> any conditional jumps based on the ORs' results that I can see
> (although I am not very knowledgeable about x86_64 asm.) So I am still
> confused.
>
> I did read [2] before posting, but unfortunately I didn't find it very helpful.
>
> I tried to step through each insn in my head to demonstrate where I don't get it:
>
> 0x555555554560 <main>           sub    $0x2f78,%rsp
> OK, whatever %rsp was, it's now %rsp - 12152. That's a lot more than
> 8000, but fine.
> Let's call %rsp before we subtracted from it "%original".
>
> 0x555555554567 <main+7>         orq    $0x0,0xf58(%rsp)
> Ok, we OR with memory location %rsp + 3928. Taking into account the
> previous offset, we're accessing %original + (3928 - 12152) which is
> %original - 8224. So this is about 200 bytes after the stack array
> ends. The instruction doesn't change the value at 0xf58(%rsp). My
> understanding is that this instruction will fetch the quadword at
> 0xf58(%rsp), OR it with $0x0, and then store the result of that
> computation back to the same address. How does this check that no
> stack overflow has occurred?
>
> 0x555555554570 <main+16>        orq    $0x0,(%rsp)
> We do it again, this time at %original - 12152 (the bottom of the
> stack). Is this because we might span over two pages?

Not merely "might": we _do_ span more than one page. Pages are 4096 bytes in size, and the 8024-byte frame is nearly twice that.
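The two probe addresses can be reproduced by a simple model: one probe every 4096 bytes through the frame, with the whole range shifted down by a 0x1020-byte area (the same value that `add $0x1020,%rsp` gives back), plus one final probe at the far end. This model is an assumption inferred from the immediates in the disassembly, not taken from GCC's source; the names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model, inferred from the disassembly rather than GCC's source. */
#define PROBE_INTERVAL 4096L
#define PROTECT        0x1020L   /* same value as `add $0x1020,%rsp` */

/* Writes probe distances below %original into out[]; returns the count. */
size_t probe_offsets(long frame_size, long *out, size_t max) {
    size_t n = 0;
    for (long i = PROBE_INTERVAL; i < frame_size; i += PROBE_INTERVAL) {
        assert(n < max);
        out[n++] = PROTECT + i;          /* one probe per interval */
    }
    assert(n < max);
    out[n++] = PROTECT + frame_size;     /* final probe at the bottom */
    return n;
}
```

For main's 8024-byte frame (0x1f58, the amount released by the epilogue), the model yields 8224 and 12152, exactly the addresses probed at main+7 and main+16. A frame smaller than one interval would get a single probe.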

>
> 0x555555554575 <main+21>        add    $0x1020,%rsp
> Now we set %rsp to be %original - 8024. So now we are actually
> pointing to the stack byte just after the large array.
>
> 0x55555555457c <main+28>        mov    %rsp,%rdi
> Now we save %rsp to %rdi, despite %rdi not being used anywhere... not
> sure about this one.

Actually it _is_ used: in the callee. That's how the first integer (or
pointer) argument is passed; see the System V x86-64 psABI for more
details. So RSP (and now RDI) contains the address of the first byte of
the array.

>
> 0x55555555457f <main+31>        mov    %fs:0x28,%rax
> Load the magic sentinel pattern, OK.
>
> 0x555555554588 <main+40>        mov    %rax,0x1f48(%rsp)
> 0x1f48 corresponds to %original - 16. So we are writing a sentinel
> value to almost the start of the stack for this func.
>
> 0x555555554590 <main+48>        xor    %eax,%eax
> 0x555555554592 <main+50>        callq  0x5555555546d0 <foo>
>
> Clear %eax for foo's return value and call foo.

No, it's not clearing anything for a return value. The return type of
foo is void, so this must be something else. I'd guess it's clearing
the sentinel value so that foo doesn't have easy access to it.
Otherwise it could somehow (e.g. via an uninitialized variable) be
written by foo into the protected area, which would defeat the
protector, since the stack smashing would then go undetected.
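This guess fits the encoding: the canary was loaded into %rax at main+31, and on x86-64 any write to a 32-bit register zero-extends into the full 64-bit register, so `xor %eax,%eax` wipes the whole canary from %rax (it is just the shorter way to clear %rax). The second `xor %eax,%eax` at main+74 is a different matter: that one is main's `return 0`. A small sketch, assuming x86-64 and GCC/Clang inline asm, with a made-up function name, confirms the zero-extension:

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 only: put a 64-bit value in %rax, run the same `xor %eax,%eax`
   as main+48, and return what is left in the full %rax. */
uint64_t rax_after_xor_eax(uint64_t initial) {
    uint64_t out;
    __asm__ volatile (
        "movq %1, %%rax\n\t"
        "xorl %%eax, %%eax\n\t"   /* 32-bit write: bits 63..32 are cleared too */
        "movq %%rax, %0"
        : "=r"(out)
        : "r"(initial)
        : "rax", "cc");
    return out;
}
```

Any input, including one with only the upper 32 bits set, comes back as zero.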

>
> 0x555555554597 <main+55>        mov    0x1f48(%rsp),%rdx
> 0x55555555459f <main+63>        xor    %fs:0x28,%rdx
> 0x5555555545a8 <main+72>        jne    0x5555555545b4 <main+84>
>
> Now we double-check that the sentinel value at %original - 16 is
> exactly the same as it was before we called foo, and if it isn't, we
> go to __stack_chk_fail. So, this protects us against the case where
> foo trashed the start of our stack?

Yes. This protects us from the case where a buffer overrun overwrites
the return address and thus, if the overrun is being exploited, sends
execution to malicious code on return.
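A toy model of what the comparison at main+55..main+72 detects, with the layout assumed rather than taken from the compiler: a hypothetical `struct toy_frame` puts a guard word directly after a buffer, a deliberately unchecked copy overruns the buffer, and the guard comparison notices. In the real frame the slot at %original - 16 sits between `ch` and the return address, so a linear overrun of `ch` has to pass through the canary before it reaches the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy frame: a buffer followed directly by a guard word. */
struct toy_frame {
    char     buf[16];
    uint64_t canary;
};
_Static_assert(offsetof(struct toy_frame, canary) == 16, "guard follows buf");

/* Copies n bytes starting at buf WITHOUT checking against sizeof buf (the
   bug being modelled). Returns 1 if the guard survived, 0 if smashed. */
int copy_and_check(const unsigned char *src, size_t n, uint64_t guard) {
    struct toy_frame f;
    f.canary = guard;                      /* like mov %rax,0x1f48(%rsp)  */
    assert(n <= sizeof f);                 /* stay inside the toy object  */
    memcpy((unsigned char *)&f, src, n);   /* may run past buf into guard */
    return f.canary == guard;              /* like xor %fs:0x28,%rdx; jne */
}
```

A copy of 16 bytes leaves the guard intact; a copy of even 17 bytes changes its lowest byte and is caught.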

>
> 0x5555555545aa <main+74>        xor    %eax,%eax
> 0x5555555545ac <main+76>        add    $0x1f58,%rsp
> 0x5555555545b3 <main+83>        retq
>
> Clear our own return value, clean up the stack, and exit.
>
> I just don't understand how the ORs are ensuring the stack hasn't overflowed.

I think this is supposed to ensure that, once you've grown the stack to
some large size (by subtracting from RSP), the whole allocated space
actually belongs to the stack. Otherwise you could, e.g., grow it by
2 GiB, write to the newly allocated space, and clobber the heap without
noticing the gap below the lowest stack location. These ORs ensure that
this gap is noticed (and gets you a SIGSEGV).
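To make the "page fault magic" concrete: nothing reads the result of the OR. What matters is that `orq $0x0,mem` is a read-modify-write, so the CPU performs a real load and store at that address without changing its contents. If the address falls in an unmapped or guard page, the MMU faults and the kernel delivers SIGSEGV; for the main thread's stack, the kernel may instead grow the mapping. A minimal C sketch of the same idea, using a hypothetical helper and assuming `base` is page-aligned, touches one word per page from the highest page down:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: touch one 8-byte word in every page-sized chunk of
   [base, base + len), highest chunk first, as a growing stack is probed.
   `|= 0` on a volatile object forces a real load and store, like
   `orq $0x0,mem`: the value never changes, but a page that is not mapped
   writable would fault here. Returns the number of probes. */
size_t probe_pages(volatile uint64_t *base, size_t len, size_t page) {
    assert(page >= sizeof(uint64_t) && page % sizeof(uint64_t) == 0);
    size_t words  = len / sizeof(uint64_t);
    size_t step   = page / sizeof(uint64_t);
    size_t chunks = (words + step - 1) / step;
    size_t probes = 0;
    for (size_t c = chunks; c-- > 0; ) {
        base[c * step] |= 0;   /* read-modify-write; nobody uses the result */
        probes++;
    }
    return probes;
}
```

On an ordinary writable buffer the helper only counts pages and leaves the data untouched; its value lies entirely in the fault it would raise on a missing page.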

>
> > Right. But note that this is enabled not by -fstack-check, but rather
> > by some of the -fstack-protector* options that are on by default on
> > modern Linux distributions. You can confirm this by explicitly passing
> > -fno-stack-protector and seeing this sentinel checking gone.
>
> Ok, I see.
>
> > The FS segment base points to the TLS. See [3] and links therein.
> ...
> > It's the offset of stack_guard member of tcbhead_t. See the
> > corresponding glibc source [4].
>
> Got it, thank you.
>
> > [1]: https://stackoverflow.com/a/44670648/673852
> > [2]: https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html
> > [3]: https://chao-tic.github.io/blog/2018/12/25/tls
> > [4]: https://code.woboq.org/userspace/glibc/sysdeps/x86_64/nptl/tls.h.html#42
> >
> > Regards,
> > Ruslan

Re: use of %fs segment register in x86_64 with -fstack-check

Florian Weimer-5
In reply to this post by Maxim Blinov-2
* Maxim Blinov:

> I'm looking at some -fstack-check'ed code, and would appreciate it if
> some gdb x86_64 gurus could double check my understanding of a trivial
> example

What's your motivation for this?  -fstack-check is mostly there to
support certain Ada uses, yet you post a C snippet.

The more generally useful stack overflow detection switch is called
-fstack-clash-protection.

Thanks,
Florian