exec-file-mismatch and native-gdbserver testing

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
Hello,

I noticed an issue when running tests that use standard_temp_file with the native-gdbserver board.

In gdb_remote_download, when called without tofile argument, as it is, for example, when starting gdbserver via gdb_reload, we set the filename to standard_output_file [file tail $fromfile] and copy the file.

GDB and gdbserver now use a copy of the same file at different locations.

This triggers an exec-file-mismatch warning and, with the default “ask” setting, a user prompt that isn’t handled by the tests and eventually leads to a timeout.  This can be seen with all tests that use gdb_simple_compile, e.g. via skip_*_test calls.  An example would be gdb.btrace/*.exp.

In exec.c:validate_exec_file (), we check the filenames and, if they differ, print a warning and re-load the symbol file.

Should validate_exec_file () check more than just the filenames?
Should gdb_simple_compile use standard_output_file instead of standard_temp_file?

Regards,
Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Wed, 2020-04-08 at 14:02 +0000, Metzger, Markus T via Gdb wrote:

> Hello,
>
> I noticed an issue when running tests that use standard_temp_file with the native-gdbserver board.
>
> In gdb_remote_download, when called without tofile argument, as it is, for example, when starting gdbserver via gdb_reload, we set the filename to standard_output_file [file tail $fromfile] and copy the file.
>
> GDB and gdbserver now use a copy of the same file at different locations.
>
> This triggers an exec-file-mismatch warning and, with the default “ask” setting, a user prompt that isn’t handled by the tests and eventually leads to a timeout.  This can be seen with all tests that use gdb_simple_compile, e.g. via skip_*_test calls.  An example would be gdb.btrace/*.exp.
>
> In exec.c:validate_exec_file (), we check the filenames and, if they differ, print a warning and re-load the symbol file.
>
> Should validate_exec_file () check more than just the filenames?
You mean: if the filenames differs, gdb could compare the contents of files and if equal,
not ask the question, considering there is no mismatch ?

> Should gdb_simple_compile use standard_output_file instead of standard_temp_file?

Alternatively, set the value of exec-file-mismatch to warn ?
The reason of this setting is to allow to disable this exec-file-mismatch
logic in case of "unusual" setup, and the above seems to qualify as an unusual setup.
During review, some argued that we might hardcode 'ask' (or even not ask, just
automatically reload). It looks like the setting might have its usefulness :).

Note that on this exec-file-mismatch functionality, we still have PR
https://sourceware.org/bugzilla/show_bug.cgi?id=25475
waiting for (some) feedback about the way to fix/improve

(this PR should better be fixed before the next release, as afterwards,
CLI behaviour is much more difficult to change).

Thanks

Philippe



Reply | Threaded
Open this post in threaded view
|

RE: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
Hello Philippe,

> > In exec.c:validate_exec_file (), we check the filenames and, if they differ, print
> a warning and re-load the symbol file.
> >
> > Should validate_exec_file () check more than just the filenames?
> You mean: if the filenames differs, gdb could compare the contents of files and if
> equal,
> not ask the question, considering there is no mismatch ?

It could compare build-ids, for example.

If there are no build-ids, it may fall back to comparing file contents as you
suggested.

 
> > Should gdb_simple_compile use standard_output_file instead of
> standard_temp_file?
>
> Alternatively, set the value of exec-file-mismatch to warn ?

That would work, too.

I didn't want to suggest turning off or ignoring warnings in our test
suite, though.  I don't think that's generally a good idea.

I'd rather make gdb_simple_compile use standard_output_file if we agree
that exec-file-mismatch shouldn't be changed and that this should be handled
in the test suite.

Regards,
Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Reply | Threaded
Open this post in threaded view
|

RE: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
Hello Philippe,

> > > In exec.c:validate_exec_file (), we check the filenames and, if they differ,
> print
> > a warning and re-load the symbol file.
> > >
> > > Should validate_exec_file () check more than just the filenames?
> > You mean: if the filenames differs, gdb could compare the contents of files and
> if
> > equal,
> > not ask the question, considering there is no mismatch ?
>
> It could compare build-ids, for example.
>
> If there are no build-ids, it may fall back to comparing file contents as you
> suggested.

Does that sound OK to you?

Thanks,
Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Gary Kershaw
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Fri, 2020-05-08 at 10:30 +0000, Metzger, Markus T wrote:

> Hello Philippe,
>
> > > > In exec.c:validate_exec_file (), we check the filenames and, if they differ,
> > print
> > > a warning and re-load the symbol file.
> > > > Should validate_exec_file () check more than just the filenames?
> > > You mean: if the filenames differs, gdb could compare the contents of files and
> > if
> > > equal,
> > > not ask the question, considering there is no mismatch ?
> >
> > It could compare build-ids, for example.
> >
> > If there are no build-ids, it may fall back to comparing file contents as you
> > suggested.
>
> Does that sound OK to you?
>
> Thanks,
> Markus.
Hello Markus,

If build ids of 2 files are equal, then effectively, validate_exec_file
can consider the existing file is still ok.
I had however no time to dig on that, and see if such build ids can be
easily extracted/retrieved in validate_exec_file.

I had no time to dig more on that, sorry

Philippe


Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
In reply to this post by Sourceware - gdb list mailing list
On 4/8/20 3:02 PM, Metzger, Markus T via Gdb wrote:

> Hello,
>
> I noticed an issue when running tests that use standard_temp_file with the native-gdbserver board.
>
> In gdb_remote_download, when called without tofile argument, as it is, for example, when starting gdbserver via gdb_reload, we set the filename to standard_output_file [file tail $fromfile] and copy the file.
>
> GDB and gdbserver now use a copy of the same file at different locations.
>
> This triggers an exec-file-mismatch warning and, with the default “ask” setting, a user prompt that isn’t handled by the tests and eventually leads to a timeout.  This can be seen with all tests that use gdb_simple_compile, e.g. via skip_*_test calls.  An example would be gdb.btrace/*.exp.
>
> In exec.c:validate_exec_file (), we check the filenames and, if they differ, print a warning and re-load the symbol file.
>
> Should validate_exec_file () check more than just the filenames?
> Should gdb_simple_compile use standard_output_file instead of standard_temp_file?

It probably can't hurt to make gdb_simple_compile use standard_output_file
instead of standard_temp_file.  I've tried it locally and ran the testsuite
with that, and it indeed fixes most of the problem.  It avoids copying
the binary when testing with local gdbserver boards, so seems like a win
regardless.

Looking at the testresults made me realize that the new feature exposed
at least one bug, fixed here:
  https://sourceware.org/pipermail/gdb-patches/2020-May/168680.html

So I've been looking at the remaining failures that the gdb_simple_compile
change wouldn't fix, to look for other bugs.

The next one I'm looking at is gdb.base/argv0-symlink.exp.  When tested
with native-extended-gdbserver, we get:

(gdb) run
Starting program: /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink-filelink
warning: Mismatch between current exec-file /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink-filelink
and automatically determined exec-file /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink
exec-file-mismatch handling is currently "ask"
Load new symbol table from "/home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink"? (y or n)

I first thought that a quick fix at least for local boards could be to
make validate_exec_file resolve symlinks before comparing the filename.

But then, I thought -- how hard can it be to validate the build ids?

So I cooked up something.  Below's the resulting preliminary patch.

Seems to work nicely -- it fixes gdb.base/argv0-symlink.exp at least.
I haven't run the testsuite yet.

There's (at least) one issue that I'll need to fix.  It's to
get rid of the "transfers from remote targets can be slow" warning
when we open the remote file to read the build id:

 (gdb) r
 Starting program: /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/break/break
 Reading /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink from remote target...
 warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 warning: Mismatch between current exec-file /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/break/break
 and automatically determined exec-file /home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink
 exec-file-mismatch handling is currently "ask"
 Load new symbol table from "/home/pedro/brno/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/argv0-symlink/argv0-symlink"? (y or n)

For that, I'll need to pass down a struct with a warn_if_slow field
as closure to gdb_bfd_openr_iovec, I think.

I was worried that bfd would read a lot of stuff from the binary in
order to extract the build-id, making it potentially slow, but turns
out we don't read all that much.  Maybe a couple hundred bytes, and
most of it is the read-ahead cache, I think.  So I'm not worried about
that.  Otherwise I'd consider whether a new qXfer:buildid:read would
be better.  But I'm happy that we seemingly don't need to worry
about that.

From 51124c1ae7dde0c7ba033b4f841777b7d523c864 Mon Sep 17 00:00:00 2001
From: Pedro Alves <[hidden email]>
Date: Sat, 16 May 2020 20:56:54 +0100
Subject: [PATCH] validate_exec_file & build-id

---
 gdb/exec.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/gdb/exec.c b/gdb/exec.c
index a2added5e22..04d94994894 100644
--- a/gdb/exec.c
+++ b/gdb/exec.c
@@ -37,6 +37,7 @@
 #include "gdb_bfd.h"
 #include "gcore.h"
 #include "source.h"
+#include "build-id.h"
 
 #include <fcntl.h>
 #include "readline/tilde.h"
@@ -247,11 +248,45 @@ validate_exec_file (int from_tty)
   struct inferior *inf = current_inferior ();
   /* Try to determine a filename from the process itself.  */
   const char *pid_exec_file = target_pid_to_exec_file (inf->pid);
+  bool build_id_mismatch = false;
 
-  /* If wee cannot validate the exec file, return.  */
+  /* If we cannot validate the exec file, return.  */
   if (current_exec_file == NULL || pid_exec_file == NULL)
     return;
 
+  /* Try validating via build-id, if available.  This is the most
+     reliable check.  */
+  const bfd_build_id *exec_file_build_id = build_id_bfd_get (exec_bfd);
+  if (exec_file_build_id != nullptr)
+    {
+      /* Prepend the target prefix, to force gdb_bfd_open to open the
+ file on the remote file system (if indeed remote).  */
+      std::string target_pid_exec_file
+ = std::string (TARGET_SYSROOT_PREFIX) + pid_exec_file;
+
+      gdb_bfd_ref_ptr abfd (gdb_bfd_open (target_pid_exec_file.c_str (),
+  gnutarget, -1));
+      if (abfd != nullptr)
+ {
+  const bfd_build_id *target_exec_file_build_id
+    = build_id_bfd_get (abfd.get ());
+
+  if (target_exec_file_build_id != nullptr)
+    {
+      if (exec_file_build_id->size == target_exec_file_build_id->size
+  && memcmp (exec_file_build_id->data,
+     target_exec_file_build_id->data,
+     exec_file_build_id->size) == 0)
+ {
+  /* Match.  */
+  return;
+ }
+      else
+ build_id_mismatch = true;
+    }
+ }
+    }
+
   std::string exec_file_target (pid_exec_file);
 
   /* In case the exec file is not local, exec_file_target has to point at
@@ -259,7 +294,9 @@ validate_exec_file (int from_tty)
   if (is_target_filename (current_exec_file) && !target_filesystem_is_local ())
     exec_file_target = TARGET_SYSROOT_PREFIX + exec_file_target;
 
-  if (exec_file_target != current_exec_file)
+  /* If build-id validation wasn't possible, fallback to validating by
+     filename.  */
+  if (build_id_mismatch || exec_file_target != current_exec_file)
     {
       warning
  (_("Mismatch between current exec-file %ps\n"

base-commit: 7cfd74cfc6e14034779e6cc048c68877b7a08f88
--
2.14.5

Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Sat, 2020-05-16 at 21:10 +0100, Pedro Alves via Gdb wrote:
>
> So I cooked up something.  Below's the resulting preliminary patch.
>
> Seems to work nicely -- it fixes gdb.base/argv0-symlink.exp at least.
> I haven't run the testsuite yet.
I have looked at the patch and played a little bit in a native setup.
It worked as expected, the patch looks ok to me.

Note that buildid comparison means that the exec-file used by
GDB might not be the (same physical) exec-file of the process
being debugged.

For some specific scenarios, it might have an impact,
such as the user wanting to debug a copy of the file to avoid
'Text file busy', maybe some interaction with setuid/setgid, ... ?

Maybe good enough to mention this in the user manual and/or in the
'help set exec-file-
mismatch' ?
Or maybe GDB should give a message to the user for different files
but same buildid ?


>
> There's (at least) one issue that I'll need to fix.  It's to
> get rid of the "transfers from remote targets can be slow" warning
> when we open the remote file to read the build id:

Note that before GDB 10 goes out with this new exec-file-mismatch feature,
we should sort out: https://sourceware.org/bugzilla/show_bug.cgi?id=25475
as possibly fixing this bug might imply to change the options of
   'set exec-file-mismatch'
(see last comment in the bug).

Thanks

Philippe


Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On 5/17/20 6:24 AM, Philippe Waroquiers via Gdb wrote:

> On Sat, 2020-05-16 at 21:10 +0100, Pedro Alves via Gdb wrote:
>>
>> So I cooked up something.  Below's the resulting preliminary patch.
>>
>> Seems to work nicely -- it fixes gdb.base/argv0-symlink.exp at least.
>> I haven't run the testsuite yet.
> I have looked at the patch and played a little bit in a native setup.
> It worked as expected, the patch looks ok to me.
>
> Note that buildid comparison means that the exec-file used by
> GDB might not be the (same physical) exec-file of the process
> being debugged.

Right.  A common use case is for the target to run a stripped
executable, while gdb is loaded with an executable with debug
info.  The build IDs will match in this case, while filenames
most probably won't.

Related, when remote debugging, it's very common that the path on
the remote system is different from the path of the file loaded on
gdb.  Many people doing embedded development will have setups where
the IDE copies a local file to the target for debugging, and
remotely puts the file under $HOME, or under /tmp, just like
that dejagnu does when remote testing.  You can get a sense of that if
you try the testsuite with --target_board=remote-gdbserver-on-localhost:

 (gdb) spawn /usr/bin/ssh -l pedro localhost /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/../../gdbserver/gdbserver --once localhost:2346 break
 Process /home/pedro/break created; pid = 12712
 Listening on port 2346
 target remote localhost:2346
 Remote debugging using localhost:2346
 warning: Mismatch between current exec-file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.base/break/break
 and automatically determined exec-file /home/pedro/break
 exec-file-mismatch handling is currently "ask"
 Load new symbol table from "/home/pedro/break"? (y or n)

> For some specific scenarios, it might have an impact,
> such as the user wanting to debug a copy of the file to avoid
> 'Text file busy', maybe some interaction with setuid/setgid, ... ?

These seem like rarer scenarios, which would cause warnings/errors
anyway if you pick the wrong executable?

I'm thinking, if we support build ID validation, do we really want
to fallback to filename validation?  It seems to me that it causes
more false positives than desirable.

>
> Maybe good enough to mention this in the user manual and/or in the
> 'help set exec-file-
> mismatch' ?
> Or maybe GDB should give a message to the user for different files
> but same buildid ?
>
>
>>
>> There's (at least) one issue that I'll need to fix.  It's to
>> get rid of the "transfers from remote targets can be slow" warning
>> when we open the remote file to read the build id:
>
> Note that before GDB 10 goes out with this new exec-file-mismatch feature,
> we should sort out: https://sourceware.org/bugzilla/show_bug.cgi?id=25475
> as possibly fixing this bug might imply to change the options of
>    'set exec-file-mismatch'
> (see last comment in the bug).

Indeed we should.

I'll be sending the build ID validation patch to gdb-patches soon.

Thanks,
Pedro Alves

Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Sun, 2020-05-17 at 18:47 +0100, Pedro Alves wrote:
> On 5/17/20 6:24 AM, Philippe Waroquiers via Gdb wrote:
> > For some specific scenarios, it might have an impact,
> > such as the user wanting to debug a copy of the file to avoid
> > 'Text file busy', maybe some interaction with setuid/setgid, ... ?
>
> These seem like rarer scenarios, which would cause warnings/errors
> anyway if you pick the wrong executable?

For sure, these scenarios are unusual, but it might be better
that the user knows what happens.  GDB silently deciding to use
another (physical) file than what the process really executes
is maybe not ideal.

E.g. I am wondering if the below will be visible and cause
an (understandable) warning/error/behaviour for the user:
If the user has debugged a first process with orig_exe,
then the user copied orig_exe to copy_orig_exe, and then GDB is
attached to a process that runs copy_orig_exe, the user does not expect
to have orig_exe protected/accessed anymore, and so might change it
or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.

So, I was wondering if such a case of equal build ID
but different (local?) file names are not worth a warning.

>
> I'm thinking, if we support build ID validation, do we really want
> to fallback to filename validation?  It seems to me that it causes
> more false positives than desirable.
You mean that the filename comparison is useless (or even harmful)
if we found the build ID in the files ?
Effectively, if build ID are different but filenames are equal,
that is likely a false positive 'file are matching'
(only possible in remote debugging setup I suppose).

Philippe


Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On 5/17/20 7:15 PM, Philippe Waroquiers via Gdb wrote:

> On Sun, 2020-05-17 at 18:47 +0100, Pedro Alves wrote:
>> On 5/17/20 6:24 AM, Philippe Waroquiers via Gdb wrote:
>>> For some specific scenarios, it might have an impact,
>>> such as the user wanting to debug a copy of the file to avoid
>>> 'Text file busy', maybe some interaction with setuid/setgid, ... ?
>>
>> These seem like rarer scenarios, which would cause warnings/errors
>> anyway if you pick the wrong executable?
>
> For sure, these scenarios are unusual, but it might be better
> that the user knows what happens.  GDB silently deciding to use
> another (physical) file than what the process really executes
> is maybe not ideal.
>
> E.g. I am wondering if the below will be visible and cause
> an (understandable) warning/error/behaviour for the user:
> If the user has debugged a first process with orig_exe,
> then the user copied orig_exe to copy_orig_exe, and then GDB is
> attached to a process that runs copy_orig_exe, the user does not expect
> to have orig_exe protected/accessed anymore, and so might change it
> or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.

But this seems like a pretty benign problem?  But I'm not sure
I understood it.  What exactly goes wrong in this scenario?

>
> So, I was wondering if such a case of equal build ID
> but different (local?) file names are not worth a warning.

IMO it isn't, because it is very common to have different
filenames (if you consider the whole path) for executable
loaded in gdb compared to the executable that the process is
running when you consider remote debugging.

>
>>
>> I'm thinking, if we support build ID validation, do we really want
>> to fallback to filename validation?  It seems to me that it causes
>> more false positives than desirable.
> You mean that the filename comparison is useless (or even harmful)
> if we found the build ID in the files ?
> Effectively, if build ID are different but filenames are equal,
> that is likely a false positive 'file are matching'
> (only possible in remote debugging setup I suppose).

No, I mean, let's consider the feature from scratch again.
I'm saying that IMHO filename comparison on its own is pretty
weak and annoyingly chatty.  I'd think e.g., a basename
match + segments match (compare addresses and sizes of
of text, data, etc, segments) would already be much better.
But that's a path that's been considered in all other scenarios
where we have to match binaries, and ultimately, build ID
was invented to fix this kind of scenario without heuristics,
because heuristics can always fail.  

So given that we can do buildid matching, shouldn't we just forget
all other kinds of matching, and just stick with build id matching,
with no fallback?  I.e., add build id matching, remove the filename
matching, and raise the bar for any fallback matching -- as in if
you want some fallback, it has to be better than just filenames.

IIRC, the main motivation for the feature is when you attach to
a process running bar, while you have foo (completely unrelated to bar)
loaded in gdb.  GDB previously would assume that foo is the symbol file
for bar, so it gladly continued debugging bar with the foo binary.
Buildid detects this, and also detects the scenario of attaching to
a process that is running an older version of bar than the version
you have loaded in gdb (because you rebuilt the program before
attaching, for example).

More contrived use cases can be imagined, but it seems to me like
if you want to catch them, then you're better off making sure your
binaries include build ids.  Which is true by default on modern
GNU/Linux OSs at least.

Thanks,
Pedro Alves

Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Sun, 2020-05-17 at 20:50 +0100, Pedro Alves wrote:

> > E.g. I am wondering if the below will be visible and cause
> > an (understandable) warning/error/behaviour for the user:
> > If the user has debugged a first process with orig_exe,
> > then the user copied orig_exe to copy_orig_exe, and then GDB is
> > attached to a process that runs copy_orig_exe, the user does not expect
> > to have orig_exe protected/accessed anymore, and so might change it
> > or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.
>
> But this seems like a pretty benign problem?  But I'm not sure
> I understood it.  What exactly goes wrong in this scenario?
The user expects orig_exe to not be 'busy' anymore, and so
expects to be able to freely modify it, without e.g. impacting
the GDB session debugging the executable running copy_orig_exe.
(I guess that orig_exe will not cause 'Text busy' error, as no
process is still executing it from the kernel point of view).

>
> > So, I was wondering if such a case of equal build ID
> > but different (local?) file names are not worth a warning.
>
> IMO it isn't, because it is very common to have different
> filenames (if you consider the whole path) for executable
> loaded in gdb compared to the executable that the process is
> running when you consider remote debugging.
>
> > > I'm thinking, if we support build ID validation, do we really want
> > > to fallback to filename validation?  It seems to me that it causes
> > > more false positives than desirable.
> > You mean that the filename comparison is useless (or even harmful)
> > if we found the build ID in the files ?
> > Effectively, if build ID are different but filenames are equal,
> > that is likely a false positive 'file are matching'
> > (only possible in remote debugging setup I suppose).
>
> No, I mean, let's consider the feature from scratch again.
> I'm saying that IMHO filename comparison on its own is pretty
> weak and annoyingly chatty.  I'd think e.g., a basename
> match + segments match (compare addresses and sizes of
> of text, data, etc, segments) would already be much better.
> But that's a path that's been considered in all other scenarios
> where we have to match binaries, and ultimately, build ID
> was invented to fix this kind of scenario without heuristics,
> because heuristics can always fail.  
>
> So given that we can do buildid matching, shouldn't we just forget
> all other kinds of matching, and just stick with build id matching,
> with no fallback?  I.e., add build id matching, remove the filename
> matching, and raise the bar for any fallback matching -- as in if
> you want some fallback, it has to be better than just filenames.
>
> IIRC, the main motivation for the feature is when you attach to
> a process running bar, while you have foo (completely unrelated to bar)
> loaded in gdb.  GDB previously would assume that foo is the symbol file
> for bar, so it gladly continued debugging bar with the foo binary.
> Buildid detects this, and also detects the scenario of attaching to
> a process that is running an older version of bar than the version
> you have loaded in gdb (because you rebuilt the program before
> attaching, for example).
>
> More contrived use cases can be imagined, but it seems to me like
> if you want to catch them, then you're better off making sure your
> binaries include build ids.  Which is true by default on modern
> GNU/Linux OSs at least.
At my work, objdump -h some_exe does not show a build ID, not clear
why (RHEL 7.8, but using gold linker from Adacore gnatpro).

So, my main original use case needs filename comparison :(.

Philippe


Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On 5/17/20 9:11 PM, Philippe Waroquiers wrote:

> On Sun, 2020-05-17 at 20:50 +0100, Pedro Alves wrote:
>>> E.g. I am wondering if the below will be visible and cause
>>> an (understandable) warning/error/behaviour for the user:
>>> If the user has debugged a first process with orig_exe,
>>> then the user copied orig_exe to copy_orig_exe, and then GDB is
>>> attached to a process that runs copy_orig_exe, the user does not expect
>>> to have orig_exe protected/accessed anymore, and so might change it
>>> or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.
>>
>> But this seems like a pretty benign problem?  But I'm not sure
>> I understood it.  What exactly goes wrong in this scenario?
> The user expects orig_exe to not be 'busy' anymore, and so
> expects to be able to freely modify it, without e.g. impacting
> the GDB session debugging the executable running copy_orig_exe.
> (I guess that orig_exe will not cause 'Text busy' error, as no
> process is still executing it from the kernel point of view).

Do you really see these "Text busy" errors nowadays?  I don't
think I ever saw those on GNU/Linux.

Still, I'm not seeing the same kind of problem that ending
up with the wrong binary loaded in GDB causes.  If you end
up with the wrong binary loaded in GDB, then GDB may
for example install breakpoints at the wrong addresses,
and that may even cause the inferior to crash, because the
breakpoint address may fall in the middle of instructions,
resulting in the inferior potentially executing invalid
instructions, or worse, executing valid instructions with
disastrous side effects.

The type of problem you're describing seems more like an
annoyance, which will be detected some other way ("Text busy"
or some other side effect), and the user can still fix it,
with e.g., the "file" command.

>
>>
>>> So, I was wondering if such a case of equal build ID
>>> but different (local?) file names are not worth a warning.
>>
>> IMO it isn't, because it is very common to have different
>> filenames (if you consider the whole path) for executable
>> loaded in gdb compared to the executable that the process is
>> running when you consider remote debugging.
>>
>>>> I'm thinking, if we support build ID validation, do we really want
>>>> to fallback to filename validation?  It seems to me that it causes
>>>> more false positives than desirable.
>>> You mean that the filename comparison is useless (or even harmful)
>>> if we found the build ID in the files ?
>>> Effectively, if build ID are different but filenames are equal,
>>> that is likely a false positive 'file are matching'
>>> (only possible in remote debugging setup I suppose).
>>
>> No, I mean, let's consider the feature from scratch again.
>> I'm saying that IMHO filename comparison on its own is pretty
>> weak and annoyingly chatty.  I'd think e.g., a basename
>> match + segments match (compare addresses and sizes of
>> of text, data, etc, segments) would already be much better.
>> But that's a path that's been considered in all other scenarios
>> where we have to match binaries, and ultimately, build ID
>> was invented to fix this kind of scenario without heuristics,
>> because heuristics can always fail.  
>>
>> So given that we can do buildid matching, shouldn't we just forget
>> all other kinds of matching, and just stick with build id matching,
>> with no fallback?  I.e., add build id matching, remove the filename
>> matching, and raise the bar for any fallback matching -- as in if
>> you want some fallback, it has to be better than just filenames.
>>
>> IIRC, the main motivation for the feature is when you attach to
>> a process running bar, while you have foo (completely unrelated to bar)
>> loaded in gdb.  GDB previously would assume that foo is the symbol file
>> for bar, so it gladly continued debugging bar with the foo binary.
>> Buildid detects this, and also detects the scenario of attaching to
>> a process that is running an older version of bar than the version
>> you have loaded in gdb (because you rebuilt the program before
>> attaching, for example).
>>
>> More contrived use cases can be imagined, but it seems to me like
>> if you want to catch them, then you're better off making sure your
>> binaries include build ids.  Which is true by default on modern
>> GNU/Linux OSs at least.
> At my work, objdump -h some_exe does not show a build ID, not clear
> why (RHEL 7.8, but using gold linker from Adacore gnatpro).
>
> So, my main original use case needs filename comparison :(.

According to:

 https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id

"Each executable or shared library built with Red Hat Enterprise Linux Server 6 or later is assigned a unique identification 160-bit SHA-1 string, generated as a checksum of selected parts of the binary. "

Maybe older gold versions didn't emit the build id by default, while
GNU ld did.  I tried it with master gold, and it emits the build id
by default.  does explicitly specifying --build-id on the link work?
Since you're already not using the default tools, you could tweak
your build system to explicitly request a build id?

> So, my main original use case needs filename comparison :(.

I think that doesn't follow -- you could say that the build id
isn't sufficient for you, and that you need a fallback, but
that doesn't mean that the fallback must be the straight
full path filename comparison as is it today.

Thanks,
Pedro Alves

Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Sun, 2020-05-17 at 22:19 +0100, Pedro Alves wrote:

> On 5/17/20 9:11 PM, Philippe Waroquiers wrote:
> > On Sun, 2020-05-17 at 20:50 +0100, Pedro Alves wrote:
> > > > E.g. I am wondering if the below will be visible and cause
> > > > an (understandable) warning/error/behaviour for the user:
> > > > If the user has debugged a first process with orig_exe,
> > > > then the user copied orig_exe to copy_orig_exe, and then GDB is
> > > > attached to a process that runs copy_orig_exe, the user does not expect
> > > > to have orig_exe protected/accessed anymore, and so might change it
> > > > or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.
> > >
> > > But this seems like a pretty benign problem?  But I'm not sure
> > > I understood it.  What exactly goes wrong in this scenario?
> > The user expects orig_exe to not be 'busy' anymore, and so
> > expects to be able to freely modify it, without e.g. impacting
> > the GDB session debugging the executable running copy_orig_exe.
> > (I guess that orig_exe will not cause 'Text busy' error, as no
> > process is still executing it from the kernel point of view).
>
> Do you really see these "Text busy" errors nowadays?  I don't
> think I ever saw those on GNU/Linux.
The below reproduces it for me:
philippe@md:~$ cp /bin/sleep mysleep
philippe@md:~$ ./mysleep 100&
[1] 7721
philippe@md:~$ echo coucou > mysleep
bash: mysleep: Text file busy
philippe@md:~$ cat /etc/debian_version
10.4
philippe@md:~$

Maybe typical linkers renaming or removing
the exe file before re-creating it, and thereby avoiding (most of)
text busy errors ?

>
> Still, I'm not seeing the same kind of problem that ending
> up with the wrong binary loaded in GDB causes.  If you end
> up with the wrong binary loaded in GDB, then GDB may
> for example install breakpoints at the wrong addresses,
> and that may even cause the inferior to crash, because the
> breakpoint address may fall in the middle of instructions,
> resulting in the inferior potentially executing invalid
> instructions, or worse, executing valid instructions with
> disastrous side effects.
If the executable file is modified while GDB is busy using it,
could that not cause some problems ?



> > So, my main original use case needs filename comparison :(.
>
> According to:
>
>  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id
>
> "Each executable or shared library built with Red Hat Enterprise Linux Server 6 or later is assigned a unique identification 160-bit SHA-1 string, generated as a checksum of selected parts of the binary. "
>
> Maybe older gold versions didn't emit the build id by default, while
> GNU ld did.  I tried it with master gold, and it emits the build id
> by default.  does explicitly specifying --build-id on the link work?
> Since you're already not using the default tools, you could tweak
> your build system to explicitly request a build id?
I will check tomorrow if I can persuade the build system
to generate a build ID.
If yes (and assuming all what we have to debug but we do not build
ourselves has a build ID), then build ID will cover my use case.

>
> > So, my main original use case needs filename comparison :(.
>
> I think that doesn't follow -- you could say that the build id
> isn't sufficient for you, and that you need a fallback, but
> that doesn't mean that the fallback must be the straight
> full path filename comparison as is it today.

The filename comparison was an easy way to cover the cases I saw,
reasonably OS independent, while build ID is more precise
but not working everywhere, so a fallback (whatever it is) for
missing build ID situations would be useful.

Thanks

Philippe


Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On 5/17/20 10:43 PM, Philippe Waroquiers via Gdb wrote:

> On Sun, 2020-05-17 at 22:19 +0100, Pedro Alves wrote:
>> On 5/17/20 9:11 PM, Philippe Waroquiers wrote:
>>> On Sun, 2020-05-17 at 20:50 +0100, Pedro Alves wrote:
>>>>> E.g. I am wondering if the below will be visible and cause
>>>>> an (understandable) warning/error/behaviour for the user:
>>>>> If the user has debugged a first process with orig_exe,
>>>>> then the user copied orig_exe to copy_orig_exe, and then GDB is
>>>>> attached to a process that runs copy_orig_exe, the user does not expect
>>>>> to have orig_exe protected/accessed anymore, and so might change it
>>>>> or remove it or ..., while GDB still use orig_exe instead of copy_orig_exe.
>>>>
>>>> But this seems like a pretty benign problem?  But I'm not sure
>>>> I understood it.  What exactly goes wrong in this scenario?
>>> The user expects orig_exe to not be 'busy' anymore, and so
>>> expects to be able to freely modify it, without e.g. impacting
>>> the GDB session debugging the executable running copy_orig_exe.
>>> (I guess that orig_exe will not cause 'Text busy' error, as no
>>> process is still executing it from the kernel point of view).
>>
>> Do you really see these "Text busy" errors nowadays?  I don't
>> think I ever saw those on GNU/Linux.
> The below reproduces it for me:
> philippe@md:~$ cp /bin/sleep mysleep
> philippe@md:~$ ./mysleep 100&
> [1] 7721
> philippe@md:~$ echo coucou > mysleep
> bash: mysleep: Text file busy
> philippe@md:~$ cat /etc/debian_version
> 10.4
> philippe@md:~$

Ah, OK.  Yeah, OK if you try to write directly to the file
like that.  But no one does that.

>
> Maybe typical linkers renaming or removing
> the exe file before re-creating it, and thereby avoiding (most of)
> text busy errors ?

Yeah, gcc/ld unlink the file before recreating it, so in essence
they don't write to the preexisting inode, they create a new one.

>
>>
>> Still, I'm not seeing the same kind of problem that ending
>> up with the wrong binary loaded in GDB causes.  If you end
>> up with the wrong binary loaded in GDB, then GDB may
>> for example install breakpoints at the wrong addresses,
>> and that may even cause the inferior to crash, because the
>> breakpoint address may fall in the middle of instructions,
>> resulting in the inferior potentially executing invalid
>> instructions, or worse, executing valid instructions with
>> disastrous side effects.
> If the executable file is modified while GDB is busy using it,
> could that not cause some problems ?

How does filename detection help with this?  If GDB is busy
using it, you're already past the initial filename verification.
And if you modify the binary but don't change the filename,
then the filename verification doesn't help either.  There
are many ways to shoot yourself in the foot, I don't think
we need to protect against all of them.

>
>
>
>>> So, my main original use case needs filename comparison :(.
>>
>> According to:
>>
>>  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id
>>
>> "Each executable or shared library built with Red Hat Enterprise Linux Server 6 or later is assigned a unique identification 160-bit SHA-1 string, generated as a checksum of selected parts of the binary. "
>>
>> Maybe older gold versions didn't emit the build id by default, while
>> GNU ld did.  I tried it with master gold, and it emits the build id
>> by default.  does explicitly specifying --build-id on the link work?
>> Since you're already not using the default tools, you could tweak
>> your build system to explicitly request a build id?
> I will check tomorrow if I can persuade the build system
> to generate a build ID.
> If yes (and assuming all what we have to debug but we do not build
> ourselves has a build ID), then build ID will cover my use case.

Great, thanks.

>
>>
>>> So, my main original use case needs filename comparison :(.
>>
>> I think that doesn't follow -- you could say that the build id
>> isn't sufficient for you, and that you need a fallback, but
>> that doesn't mean that the fallback must be the straight
>> full path filename comparison as is it today.
>
> The filename comparison was an easy way to cover the cases I saw,
> reasonably OS independent, while build ID is more precise
> but not working everywhere, so a fallback (whatever it is) for
> missing build ID situations would be useful.

OK.  But I argue that the filename matching is more harmful than
helpful (see e.g., the remote debugging scenarios I presented).
I would rather remove it before a release is out with it, and
consider a better fallback if one is found & needed.  If
we keep it, we also have to come up with ways to unbreak the
testsuite on the different configurations that are presently
broken for it.  That alone would be sufficient grounds for a
reversion until it is fixed, IMHO.

Thanks,
Pedro Alves

Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On Sun, 2020-05-17 at 22:58 +0100, Pedro Alves wrote:
> If the executable file is modified while GDB is busy using it,
> > could that not cause some problems ?
>
> How does filename detection help with this?  If GDB is busy
> using it, you're already past the initial filename verification.
> And if you modify the binary but don't change the filename,
> then the filename verification doesn't help either.  There
> are many ways to shoot yourself in the foot, I don't think
> we need to protect against all of them.
I think we have 2 different aspects:
  * can GDB detect and protect against all problems ? Answer is no :).
  * can GDB silently do something that might afterwards lead
    to unexpected behaviour ? IMO, answer is preferably no:
    If GDB does something (like *not* using the file that
    a process is using), IMO, GDB should at least tell that to the user.
    (like GDB is telling to the user that it is reloading a changed file).
 

>
> >
> >
> > > > So, my main original use case needs filename comparison :(.
> > >
> > > According to:
> > >
> > >  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id
> > >
> > > "Each executable or shared library built with Red Hat Enterprise Linux Server 6 or later is assigned a unique identification 160-bit SHA-1 string, generated as a checksum of selected parts of the binary. "
> > >
> > > Maybe older gold versions didn't emit the build id by default, while
> > > GNU ld did.  I tried it with master gold, and it emits the build id
> > > by default.  does explicitly specifying --build-id on the link work?
> > > Since you're already not using the default tools, you could tweak
> > > your build system to explicitly request a build id?
> > I will check tomorrow if I can persuade the build system
> > to generate a build ID.
> > If yes (and assuming all what we have to debug but we do not build
> > ourselves has a build ID), then build ID will cover my use case.
I managed to have a build ID generated by adding --build-id linker arg,
thanks for the hint of explicitly passing --build-id.

For info, these are the linker versions:
gnatpro-20.0-20191009-L7/libexec/gcc/x86_64-pc-linux-gnu/8.3.1/ld.gold --version
GNU gold (GNU Binutils 2.30.52.20180618) 1.15
gnatpro-20.0-20191009-L7/libexec/gcc/x86_64-pc-linux-gnu/8.3.1/ld --version
GNU ld (GNU Binutils) 2.30.52.20180618


So, comparing build id is good enough for my @work use case.


> OK.  But I argue that the filename matching is more harmful than
> helpful (see e.g., the remote debugging scenarios I presented).
> I would rather remove it before a release is out with it, and
> consider a better fallback if one is found & needed.  If
> we keep it, we also have to come up with ways to unbreak the
> testsuite on the different configurations that are presently
> broken for it.  That alone would be sufficient grounds for a
> reversion until it is fixed, IMHO.
GDB silently keeping a wrong exec-file e.g. when using attach
is very confusing.
The exec-file-mismatch is supposed to avoid such confusion, and
if that confusion can be avoided when there are no build ids,
that is preferable IMO.

Maybe what we can do is:
 - If build ids are equal, then no problem (and no need to compare file names).
 - If build ids are different, then ask or warn user about mismatch
   (and no need to compare file names).
 - If build ids are not available, then try to detect confusing situation via
   some fallback (such as the filename comparison).

Will that not solve (most of) the problems of the testsuite/remote debugging
and detect (more) exec file mismatches in various setups/platforms
(like the default setup at my work) ?

If the filename fallback is giving too much problems but only
in case of remote setup, we can maybe disable file name comparisons
in such remote debugging.

But as said above, for my @work use case, I (selfishly :)) see that
build ids are good enough for me.

So, if you think filename comparison will do more harm than good,
fine for me to remove it.

Thanks for the build id patch/investigation/help

Philippe




Reply | Threaded
Open this post in threaded view
|

Re: exec-file-mismatch and native-gdbserver testing

Sourceware - gdb list mailing list
On 5/18/20 11:35 AM, Philippe Waroquiers via Gdb wrote:

> On Sun, 2020-05-17 at 22:58 +0100, Pedro Alves wrote:
>> If the executable file is modified while GDB is busy using it,
>>> could that not cause some problems ?
>>
>> How does filename detection help with this?  If GDB is busy
>> using it, you're already past the initial filename verification.
>> And if you modify the binary but don't change the filename,
>> then the filename verification doesn't help either.  There
>> are many ways to shoot yourself in the foot, I don't think
>> we need to protect against all of them.
> I think we have 2 different aspects:
>   * can GDB detect and protect against all problems ? Answer is no :).
>   * can GDB silently do something that might afterwards lead
>     to unexpected behaviour ? IMO, answer is preferably no:
>     If GDB does something (like *not* using the file that
>     a process is using), IMO, GDB should at least tell that to the user.
>     (like GDB is telling to the user that it is reloading a changed file).
>  
>>
>>>
>>>
>>>>> So, my main original use case needs filename comparison :(.
>>>>
>>>> According to:
>>>>
>>>>  https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/developer_guide/compiling-build-id
>>>>
>>>> "Each executable or shared library built with Red Hat Enterprise Linux Server 6 or later is assigned a unique identification 160-bit SHA-1 string, generated as a checksum of selected parts of the binary. "
>>>>
>>>> Maybe older gold versions didn't emit the build id by default, while
>>>> GNU ld did.  I tried it with master gold, and it emits the build id
>>>> by default.  does explicitly specifying --build-id on the link work?
>>>> Since you're already not using the default tools, you could tweak
>>>> your build system to explicitly request a build id?
>>> I will check tomorrow if I can persuade the build system
>>> to generate a build ID.
>>> If yes (and assuming all what we have to debug but we do not build
>>> ourselves has a build ID), then build ID will cover my use case.
> I managed to have a build ID generated by adding --build-id linker arg,
> thanks for the hint of explicitly passing --build-id.
>
> For info, these are the linker versions:
> gnatpro-20.0-20191009-L7/libexec/gcc/x86_64-pc-linux-gnu/8.3.1/ld.gold --version
> GNU gold (GNU Binutils 2.30.52.20180618) 1.15
> gnatpro-20.0-20191009-L7/libexec/gcc/x86_64-pc-linux-gnu/8.3.1/ld --version
> GNU ld (GNU Binutils) 2.30.52.20180618
>
>
> So, comparing build id is good enough for my @work use case.

Great, thanks for confirming that!

>
>
>> OK.  But I argue that the filename matching is more harmful than
>> helpful (see e.g., the remote debugging scenarios I presented).
>> I would rather remove it before a release is out with it, and
>> consider a better fallback if one is found & needed.  If
>> we keep it, we also have to come up with ways to unbreak the
>> testsuite on the different configurations that are presently
>> broken for it.  That alone would be sufficient grounds for a
>> reversion until it is fixed, IMHO.
> GDB silently keeping a wrong exec-file e.g. when using attach
> is very confusing.
> The exec-file-mismatch is supposed to avoid such confusion, and
> if that confusion can be avoided when there are no build ids,
> that is preferable IMO.
>
> Maybe what we can do is:
>  - If build ids are equal, then no problem (and no need to compare file names).
>  - If build ids are different, then ask or warn user about mismatch
>    (and no need to compare file names).
>  - If build ids are not available, then try to detect confusing situation via
>    some fallback (such as the filename comparison).

This was actually what my patch was doing.  If we could compare build ids
and that detected a mismatch, then no filename comparison was made.
What I think is subpar is that the fallback is purely based on comparing
filenames.

>
> Will that not solve (most of) the problems of the testsuite/remote debugging
> and detect (more) exec file mismatches in various setups/platforms
> (like the default setup at my work) ?
>
> If the filename fallback is giving too much problems but only
> in case of remote setup, we can maybe disable file name comparisons
> in such remote debugging.
>
> But as said above, for my @work use case, I (selfishly :)) see that
> build ids are good enough for me.
>
> So, if you think filename comparison will do more harm than good,
> fine for me to remove it.

Alright, I really think the filename comparison is too simplistic
and causes more harm than good (especially the querying).  I've updated
the patch over at gdb-patches to remove it for now.  I'd say, only
revisit if we can't convince people that they should enable build ids.
Over time, given that the GNU toolchain (at least) enables
it by default nowadays, I think that the issue will sort itself
out, and we can just forget about fallbacks.  :-)

> Thanks for the build id patch/investigation/help
Thanks,
Pedro Alves