[PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

Andrew Burgess
Currently GDB hard-codes the use of 0 as the end of sequence marker in
its line tables.  After this commit GDB now uses a named
constant (linetable_entry::end_marker) as the end of sequence marker.
The value of this constant is -1, not 0.  This change was made in
order to aid fixing bug PR gdb/26243.

Bug PR gdb/26243 is about allowing line number 0 to be used to
indicate a program address that has no corresponding source line (this
use is specified in the DWARF standard).  Currently GDB can't use line
number 0 as this line number is hard coded as the end of sequence
marker, but, after this commit line number 0 no longer has any special
meaning.

This commit does not fix PR gdb/26243, but it is step towards allowing
that issue to be fixed.

gdb/ChangeLog:

        PR gdb/26243
        * buildsym.c (buildsym_compunit::record_line): Add an assert for
        the incoming line number.  Update comments to not mention 0
        specifically.  Update to check for linetable_entry::end_marker
        rather than 0.
        * disasm.c (do_mixed_source_and_assembly_deprecated): Check for
        linetable_entry::end_marker not 0.
        (do_mixed_source_and_assembly): Likewise.
        * dwarf2/read.c (dwarf_finish_line): Pass
        linetable_entry::end_marker not 0.
        * mdebugread.c (add_line): Set is_stmt field.
        * python/py-linetable.c (ltpy_get_all_source_lines): Check for
        linetable_entry::end_marker not 0, update comments.
        (ltpy_iternext): Likewise.
        * record-btrace.c (btrace_find_line_range): Likewise.
        * symmisc.c (maintenance_print_one_line_table): Likewise.
        * symtab.c (find_pc_sect_line): Likewise.
        (find_line_symtab): Likewise.
        (skip_prologue_using_sal): Likewise.
        * symtab.h (linetable_entry::end_marker): New const static member
        variable.  Add a static assert for this field.
        * xcoffread.c (arrange_linetable): Check for
        linetable_entry::end_marker not 0.
---
 gdb/ChangeLog             | 26 ++++++++++++++++++++++++++
 gdb/buildsym.c            | 36 ++++++++++++++++++++----------------
 gdb/disasm.c              |  9 +++++----
 gdb/dwarf2/read.c         |  3 ++-
 gdb/mdebugread.c          |  1 +
 gdb/python/py-linetable.c | 13 +++++++------
 gdb/record-btrace.c       |  3 ++-
 gdb/symmisc.c             |  2 +-
 gdb/symtab.c              | 22 +++++++++++++---------
 gdb/symtab.h              | 12 ++++++++++++
 gdb/xcoffread.c           | 11 +++++++----
 11 files changed, 96 insertions(+), 42 deletions(-)

diff --git a/gdb/buildsym.c b/gdb/buildsym.c
index bd0ca491401..e6d4dc117c1 100644
--- a/gdb/buildsym.c
+++ b/gdb/buildsym.c
@@ -671,6 +671,10 @@ buildsym_compunit::record_line (struct subfile *subfile, int line,
 {
   struct linetable_entry *e;
 
+  /* Is this an asset?  Or is this processing user input and so should we
+     be handling, or throwing an error for invalid data?  */
+  gdb_assert (line == linetable_entry::end_marker || line >= 0);
+
   /* Make sure line vector exists and is big enough.  */
   if (!subfile->line_vector)
     {
@@ -692,20 +696,19 @@ buildsym_compunit::record_line (struct subfile *subfile, int line,
       * sizeof (struct linetable_entry))));
     }
 
-  /* Normally, we treat lines as unsorted.  But the end of sequence
-     marker is special.  We sort line markers at the same PC by line
-     number, so end of sequence markers (which have line == 0) appear
-     first.  This is right if the marker ends the previous function,
-     and there is no padding before the next function.  But it is
-     wrong if the previous line was empty and we are now marking a
-     switch to a different subfile.  We must leave the end of sequence
-     marker at the end of this group of lines, not sort the empty line
-     to after the marker.  The easiest way to accomplish this is to
-     delete any empty lines from our table, if they are followed by
-     end of sequence markers.  All we lose is the ability to set
-     breakpoints at some lines which contain no instructions
-     anyway.  */
-  if (line == 0)
+  /* Normally, we treat lines as unsorted.  But the end of sequence marker
+     is special.  We sort line markers at the same PC by line number, so
+     end of sequence markers (which have line ==
+     linetable_entry::end_marker) appear first.  This is right if the
+     marker ends the previous function, and there is no padding before the
+     next function.  But it is wrong if the previous line was empty and we
+     are now marking a switch to a different subfile.  We must leave the
+     end of sequence marker at the end of this group of lines, not sort the
+     empty line to after the marker.  The easiest way to accomplish this is
+     to delete any empty lines from our table, if they are followed by end
+     of sequence markers.  All we lose is the ability to set breakpoints at
+     some lines which contain no instructions anyway.  */
+  if (line == linetable_entry::end_marker)
     {
       while (subfile->line_vector->nitems > 0)
  {
@@ -944,8 +947,9 @@ buildsym_compunit::end_symtab_with_blockvector (struct block *static_block,
   const linetable_entry &ln2) -> bool
       {
  if (ln1.pc == ln2.pc
-    && ((ln1.line == 0) != (ln2.line == 0)))
-  return ln1.line == 0;
+    && ((ln1.line == linetable_entry::end_marker)
+ != (ln2.line == linetable_entry::end_marker)))
+  return ln1.line == linetable_entry::end_marker;
 
  return (ln1.pc < ln2.pc);
       };
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 143ba2f59b9..4b7cc88fbfb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -383,7 +383,7 @@ do_mixed_source_and_assembly_deprecated
  continue;
 
       /* Skip any end-of-function markers.  */
-      if (le[i].line == 0)
+      if (le[i].line == linetable_entry::end_marker)
  continue;
 
       mle[newlines].line = le[i].line;
@@ -571,7 +571,7 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
   gdb::optional<ui_out_emit_list> list_emitter;
 
   last_symtab = NULL;
-  last_line = 0;
+  last_line = linetable_entry::end_marker;
   pc = low;
 
   while (pc < high)
@@ -591,7 +591,7 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
 
   /* If this is the first line of output, check for any preceding
      lines.  */
-  if (last_line == 0
+  if (last_line == linetable_entry::end_marker
       && first_le != NULL
       && first_le->line < sal.line)
     {
@@ -604,7 +604,8 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
   /* Same source file as last time.  */
   if (sal.symtab != NULL)
     {
-      if (sal.line > last_line + 1 && last_line != 0)
+      if (last_line != linetable_entry::end_marker
+  && sal.line > last_line + 1)
  {
   int l;
 
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 39ed455def5..bdbecf640ff 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -20390,7 +20390,8 @@ dwarf_finish_line (struct gdbarch *gdbarch, struct subfile *subfile,
   paddress (gdbarch, address));
     }
 
-  dwarf_record_line_1 (gdbarch, subfile, 0, address, true, cu);
+  dwarf_record_line_1 (gdbarch, subfile, linetable_entry::end_marker,
+       address, true, cu);
 }
 
 void
diff --git a/gdb/mdebugread.c b/gdb/mdebugread.c
index d38372041d7..01029fbd971 100644
--- a/gdb/mdebugread.c
+++ b/gdb/mdebugread.c
@@ -4516,6 +4516,7 @@ add_line (struct linetable *lt, int lineno, CORE_ADDR adr, int last)
     return lineno;
 
   lt->item[lt->nitems].line = lineno;
+  lt->item[lt->nitems].is_stmt = 1;
   lt->item[lt->nitems++].pc = adr << 2;
   return lineno;
 }
diff --git a/gdb/python/py-linetable.c b/gdb/python/py-linetable.c
index 858313bb22d..22a44fe5f36 100644
--- a/gdb/python/py-linetable.c
+++ b/gdb/python/py-linetable.c
@@ -238,10 +238,11 @@ ltpy_get_all_source_lines (PyObject *self, PyObject *args)
     {
       item = &(SYMTAB_LINETABLE (symtab)->item[index]);
 
-      /* 0 is used to signify end of line table information.  Do not
- include in the source set. */
-      if (item->line > 0)
+      /* The special value linetable_entry::end_marker is used to signify
+ end of line table information.  Do not include in the source set.  */
+      if (item->line != linetable_entry::end_marker)
  {
+  gdb_assert (item->line >= 0);
   gdbpy_ref<> line = gdb_py_object_from_longest (item->line);
 
   if (line == NULL)
@@ -407,9 +408,9 @@ ltpy_iternext (PyObject *self)
 
   item = &(SYMTAB_LINETABLE (symtab)->item[iter_obj->current_index]);
 
-  /* Skip over internal entries such as 0.  0 signifies the end of
-     line table data and is not useful to the API user.  */
-  while (item->line < 1)
+  /* Skip over internal entries such as the end of sequence marker,
+     linetable_entry::end_marker as this is not useful to the API user.  */
+  while (item->line == linetable_entry::end_marker)
     {
       iter_obj->current_index++;
 
diff --git a/gdb/record-btrace.c b/gdb/record-btrace.c
index 718de62f280..36ae671fa90 100644
--- a/gdb/record-btrace.c
+++ b/gdb/record-btrace.c
@@ -732,7 +732,8 @@ btrace_find_line_range (CORE_ADDR pc)
  possibly adding more line numbers to the range.  At the time this
  change was made I was unsure how to test this so chose to go with
  maintaining the existing experience.  */
-      if ((lines[i].pc == pc) && (lines[i].line != 0)
+      if ((lines[i].pc == pc)
+  && (lines[i].line != linetable_entry::end_marker)
   && (lines[i].is_stmt == 1))
  range = btrace_line_range_add (range, lines[i].line);
     }
diff --git a/gdb/symmisc.c b/gdb/symmisc.c
index fc56cfa9381..9b05ca2f4da 100644
--- a/gdb/symmisc.c
+++ b/gdb/symmisc.c
@@ -1036,7 +1036,7 @@ maintenance_print_one_line_table (struct symtab *symtab, void *data)
   item = &linetable->item [i];
   ui_out_emit_tuple tuple_emitter (uiout, nullptr);
   uiout->field_signed ("index", i);
-  if (item->line > 0)
+  if (item->line != linetable_entry::end_marker)
     uiout->field_signed ("line", item->line);
   else
     uiout->field_string ("line", _("END"));
diff --git a/gdb/symtab.c b/gdb/symtab.c
index f96ad9554d9..02a08481008 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -3248,7 +3248,8 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
          save prev if it represents the end of a function (i.e. line number
          0) instead of a real line.  */
 
-      if (prev && prev->line && (!best || prev->pc > best->pc))
+      if (prev && prev->line != linetable_entry::end_marker
+  && (!best || prev->pc > best->pc))
  {
   best = prev;
   best_symtab = iter_s;
@@ -3264,7 +3265,8 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
     {
       struct linetable_entry *tmp = best;
       while (tmp > first && (tmp - 1)->pc == tmp->pc
-     && (tmp - 1)->line != 0 && !tmp->is_stmt)
+     && (tmp - 1)->line != linetable_entry::end_marker
+     && !tmp->is_stmt)
  --tmp;
       if (tmp->is_stmt)
  best = tmp;
@@ -3291,7 +3293,7 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
  don't make some up.  */
       val.pc = pc;
     }
-  else if (best->line == 0)
+  else if (best->line == linetable_entry::end_marker)
     {
       /* If our best fit is in a range of PC's for which no line
  number info is available (line number is zero) then we didn't
@@ -3378,14 +3380,14 @@ find_line_symtab (struct symtab *sym_tab, int line,
          the GLOBAL_BLOCK of a symtab has a begin and end address).  */
 
       /* BEST is the smallest linenumber > LINE so far seen,
-         or 0 if none has been seen so far.
+         or the end marker if none has been seen so far.
          BEST_INDEX and BEST_LINETABLE identify the item for it.  */
       int best;
 
       if (best_index >= 0)
  best = best_linetable->item[best_index].line;
       else
- best = 0;
+ best = linetable_entry::end_marker;
 
       for (objfile *objfile : current_program_space->objfiles ())
  {
@@ -3419,7 +3421,8 @@ find_line_symtab (struct symtab *sym_tab, int line,
   best_symtab = s;
   goto done;
  }
-      if (best == 0 || l->item[ind].line < best)
+      if (best == linetable_entry::end_marker
+  || l->item[ind].line < best)
  {
   best = l->item[ind].line;
   best_index = ind;
@@ -3718,7 +3721,8 @@ skip_prologue_using_lineinfo (CORE_ADDR func_addr, struct symtab *symtab)
       /* Don't use line numbers of zero, they mark special entries in
  the table.  See the commentary on symtab.h before the
  definition of struct linetable.  */
-      if (item->line > 0 && func_start <= item->pc && item->pc < func_end)
+      if (item->line != linetable_entry::end_marker
+  && func_start <= item->pc && item->pc < func_end)
  return item->pc;
     }
 
@@ -3946,11 +3950,11 @@ skip_prologue_using_sal (struct gdbarch *gdbarch, CORE_ADDR func_addr)
   /* Skip any earlier lines, and any end-of-sequence marker
      from a previous function.  */
   while (linetable->item[idx].pc != prologue_sal.pc
- || linetable->item[idx].line == 0)
+ || linetable->item[idx].line == linetable_entry::end_marker)
     idx++;
 
   if (idx+1 < linetable->nitems
-      && linetable->item[idx+1].line != 0
+      && linetable->item[idx+1].line != linetable_entry::end_marker
       && linetable->item[idx+1].pc == start_pc)
     return start_pc;
  }
diff --git a/gdb/symtab.h b/gdb/symtab.h
index 0b186554ea1..83d60d8d64b 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -1306,6 +1306,11 @@ struct rust_vtable_symbol : public symbol
 
 struct linetable_entry
 {
+  /* Special value placed into the LINE field to indicate an end of
+     sequence in the line table.  */
+
+  static const int end_marker = -1;
+
   /* The line number for this entry.  */
   int line;
 
@@ -1316,6 +1321,13 @@ struct linetable_entry
   CORE_ADDR pc;
 };
 
+/* Normally line numbers in a program are positive integers greater than
+   zero.  The line number 0 is reserved in DWARF to indicate instructions
+   that don't have associated source code.  The end marker then must be
+   less than zero.  */
+
+gdb_static_assert (linetable_entry::end_marker < 0);
+
 /* The order of entries in the linetable is significant.  They should
    be sorted by increasing values of the pc field.  If there is more than
    one entry for a given pc, then I'm not sure what should happen (and
diff --git a/gdb/xcoffread.c b/gdb/xcoffread.c
index a792c0fea2e..4c99c2da3ef 100644
--- a/gdb/xcoffread.c
+++ b/gdb/xcoffread.c
@@ -435,7 +435,7 @@ arrange_linetable (struct linetable *oldLineTb)
       if (oldLineTb->item[ii].is_stmt == 0)
  continue;
 
-      if (oldLineTb->item[ii].line == 0)
+      if (oldLineTb->item[ii].line == linetable_entry::end_marker)
  { /* Function entry found.  */
   if (function_count >= fentry_size)
     { /* Make sure you have room.  */
@@ -477,9 +477,11 @@ arrange_linetable (struct linetable *oldLineTb)
      a function begin.  */
 
   newline = 0;
-  if (oldLineTb->item[0].line != 0)
+  if (oldLineTb->item[0].line != linetable_entry::end_marker)
     for (newline = 0;
-    newline < oldLineTb->nitems && oldLineTb->item[newline].line; ++newline)
+ newline < oldLineTb->nitems
+   && oldLineTb->item[newline].line != linetable_entry::end_marker;
+ ++newline)
       newLineTb->item[newline] = oldLineTb->item[newline];
 
   /* Now copy function lines one by one.  */
@@ -498,7 +500,8 @@ arrange_linetable (struct linetable *oldLineTb)
  }
 
       for (jj = fentry[ii].line + 1;
-   jj < oldLineTb->nitems && oldLineTb->item[jj].line != 0;
+   jj < oldLineTb->nitems
+     && oldLineTb->item[jj].line != linetable_entry::end_marker;
    ++jj, ++newline)
  newLineTb->item[newline] = oldLineTb->item[jj];
     }
--
2.25.4

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

Simon Marchi-4
On 2020-07-20 8:55 a.m., Andrew Burgess wrote:

> Currently GDB hard-codes the use of 0 as the end of sequence marker in
> its line tables.  After this commit GDB now uses a named
> constant (linetable_entry::end_marker) as the end of sequence marker.
> The value of this constant is -1, not 0.  This change was made in
> order to aid fixing bug PR gdb/26243.
>
> Bug PR gdb/26243 is about allowing line number 0 to be used to
> indicate a program address that has no corresponding source line (this
> use is specified in the DWARF standard).  Currently GDB can't use line
> number 0 as this line number is hard coded as the end of sequence
> marker, but, after this commit line number 0 no longer has any special
> meaning.
>
> This commit does not fix PR gdb/26243, but it is step towards allowing
> that issue to be fixed.
>
> gdb/ChangeLog:
>
> PR gdb/26243
> * buildsym.c (buildsym_compunit::record_line): Add an assert for
> the incoming line number.  Update comments to not mention 0
> specifically.  Update to check for linetable_entry::end_marker
> rather than 0.
> * disasm.c (do_mixed_source_and_assembly_deprecated): Check for
> linetable_entry::end_marker not 0.
> (do_mixed_source_and_assembly): Likewise.
> * dwarf2/read.c (dwarf_finish_line): Pass
> linetable_entry::end_marker not 0.
> * mdebugread.c (add_line): Set is_stmt field.
> * python/py-linetable.c (ltpy_get_all_source_lines): Check for
> linetable_entry::end_marker not 0, update comments.
> (ltpy_iternext): Likewise.
> * record-btrace.c (btrace_find_line_range): Likewise.
> * symmisc.c (maintenance_print_one_line_table): Likewise.
> * symtab.c (find_pc_sect_line): Likewise.
> (find_line_symtab): Likewise.
> (skip_prologue_using_sal): Likewise.
> * symtab.h (linetable_entry::end_marker): New const static member
> variable.  Add a static assert for this field.
> * xcoffread.c (arrange_linetable): Check for
> linetable_entry::end_marker not 0.

I think the idea is good.  But we need to make sure to audit all the places
that use linetable_entry::line to see if they now need to treat the value 0
specially.  One case that came to mind while reading your patch is
"disassemble /s".  Here's an example using bug 26243's reproducer, showing
the end of "disassemble /s tree_insert".  The instruction at 0x40135f maps
to line 0.

This is with GDB 8.3:

---
38            else
39              root->right = make_node(val);
   0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
   0x000000000040134d <+141>:   callq  0x401250 <make_node(int)>
   0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
   0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)

40          }
   0x000000000040135a <+154>:   jmpq   0x40135f <tree_insert(node*, int)+159>
   0x000000000040135f <+159>:   jmpq   0x401364 <tree_insert(node*, int)+164>

41      }
   0x0000000000401364 <+164>:   add    $0x10,%rsp
   0x0000000000401368 <+168>:   pop    %rbp
   0x0000000000401369 <+169>:   retq
---

This made 0x40135f appear as if it was part of the previous line.

This is with GDB master:

---
38            else
39              root->right = make_node(val);
   0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
   0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
   0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
   0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)

40          }
   0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>

unknown:
--- no source info for this pc ---
   0x000000000040135f <+159>:   jmp    0x401364 <_Z11tree_insertP4nodei+164>

test.cpp:
26      {
27        if (val < root->id)
28          {
29            if (root->left)
30              tree_insert (root->left, val);
31            else
32              root->left = make_node(val);
33          }
34        else if (val > root->id)
35          {
36            if (root->right)
37              tree_insert (root->right, val);
38            else
39              root->right = make_node(val);
40          }
41      }
   0x0000000000401364 <+164>:   add    $0x10,%rsp
   0x0000000000401368 <+168>:   pop    %rbp
   0x0000000000401369 <+169>:   ret
End of assembler dump.
---

I don't know what happens here, but it doesn't look good.  The "no source info for this pc"
part is ok, but the "unknown:" and source lines 26-40 look wrong.

This is with GDB master + your patch:

---
38            else
39              root->right = make_node(val);
   0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
   0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
   0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
   0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)

40          }
   0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>

Line number 0 out of range; test.cpp has 80 lines.
---

The disassembly gets interrupted, because the line number is invalid.

In this case, line 0 should probably get handled specially so that we print
the "no source info for this pc" message and keep printing the rest correctly.

So, I think we just need to be careful and look at the possible behavior changes
of all these spots, not just blindly change `line == 0` for `line == end_marker`.

Simon
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

Andrew Burgess
* Simon Marchi <[hidden email]> [2020-07-20 12:01:21 -0400]:

> On 2020-07-20 8:55 a.m., Andrew Burgess wrote:
> > Currently GDB hard-codes the use of 0 as the end of sequence marker in
> > its line tables.  After this commit GDB now uses a named
> > constant (linetable_entry::end_marker) as the end of sequence marker.
> > The value of this constant is -1, not 0.  This change was made in
> > order to aid fixing bug PR gdb/26243.
> >
> > Bug PR gdb/26243 is about allowing line number 0 to be used to
> > indicate a program address that has no corresponding source line (this
> > use is specified in the DWARF standard).  Currently GDB can't use line
> > number 0 as this line number is hard coded as the end of sequence
> > marker, but, after this commit line number 0 no longer has any special
> > meaning.
> >
> > This commit does not fix PR gdb/26243, but it is step towards allowing
> > that issue to be fixed.
> >
> > gdb/ChangeLog:
> >
> > PR gdb/26243
> > * buildsym.c (buildsym_compunit::record_line): Add an assert for
> > the incoming line number.  Update comments to not mention 0
> > specifically.  Update to check for linetable_entry::end_marker
> > rather than 0.
> > * disasm.c (do_mixed_source_and_assembly_deprecated): Check for
> > linetable_entry::end_marker not 0.
> > (do_mixed_source_and_assembly): Likewise.
> > * dwarf2/read.c (dwarf_finish_line): Pass
> > linetable_entry::end_marker not 0.
> > * mdebugread.c (add_line): Set is_stmt field.
> > * python/py-linetable.c (ltpy_get_all_source_lines): Check for
> > linetable_entry::end_marker not 0, update comments.
> > (ltpy_iternext): Likewise.
> > * record-btrace.c (btrace_find_line_range): Likewise.
> > * symmisc.c (maintenance_print_one_line_table): Likewise.
> > * symtab.c (find_pc_sect_line): Likewise.
> > (find_line_symtab): Likewise.
> > (skip_prologue_using_sal): Likewise.
> > * symtab.h (linetable_entry::end_marker): New const static member
> > variable.  Add a static assert for this field.
> > * xcoffread.c (arrange_linetable): Check for
> > linetable_entry::end_marker not 0.
>
> I think the idea is good.  But we need to make sure to audit all the places
> that use linetable_entry::line to see if they now need to treat the value 0
> specially.  One case that came to mind while reading your patch is
> "disassemble /s".  Here's an example using bug 26243's reproducer, showing
> the end of "disassemble /s tree_insert".  The instruction at 0x40135f maps
> to line 0.
>
> This is with GDB 8.3:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   callq  0x401250 <make_node(int)>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmpq   0x40135f <tree_insert(node*, int)+159>
>    0x000000000040135f <+159>:   jmpq   0x401364 <tree_insert(node*, int)+164>
>
> 41      }
>    0x0000000000401364 <+164>:   add    $0x10,%rsp
>    0x0000000000401368 <+168>:   pop    %rbp
>    0x0000000000401369 <+169>:   retq
> ---
>
> This made 0x40135f appear as if it was part of the previous line.
>
> This is with GDB master:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>
>
> unknown:
> --- no source info for this pc ---
>    0x000000000040135f <+159>:   jmp    0x401364 <_Z11tree_insertP4nodei+164>
>
> test.cpp:
> 26      {
> 27        if (val < root->id)
> 28          {
> 29            if (root->left)
> 30              tree_insert (root->left, val);
> 31            else
> 32              root->left = make_node(val);
> 33          }
> 34        else if (val > root->id)
> 35          {
> 36            if (root->right)
> 37              tree_insert (root->right, val);
> 38            else
> 39              root->right = make_node(val);
> 40          }
> 41      }
>    0x0000000000401364 <+164>:   add    $0x10,%rsp
>    0x0000000000401368 <+168>:   pop    %rbp
>    0x0000000000401369 <+169>:   ret
> End of assembler dump.
> ---
>
> I don't know what happens here, but it doesn't look good.  The "no source info for this pc"
> part is ok, but the "unknown:" and source lines 26-40 look wrong.
>
> This is with GDB master + your patch:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>
>
> Line number 0 out of range; test.cpp has 80 lines.
> ---
>
> The disassembly gets interrupted, because the line number is invalid.
>
> In this case, line 0 should probably get handled specially so that we print
> the "no source info for this pc" message and keep printing the rest correctly.
>
> So, I think we just need to be careful and look at the possible behavior changes
> of all these spots, not just blindly change `line == 0` for `line == end_marker`.

Thanks for pointing this out.  I'll investigate.

I'm confused as to how we could ever have been handling 'line == 0'
as anything other than 'line == end_marker', or how these 'line == 0'
entries were coming from, but I'll start with the example you gave,
and I'm sure it'll make sense.

Thanks,
Andrew
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

Andrew Burgess
In reply to this post by Simon Marchi-4
* Simon Marchi <[hidden email]> [2020-07-20 12:01:21 -0400]:

> On 2020-07-20 8:55 a.m., Andrew Burgess wrote:
> > Currently GDB hard-codes the use of 0 as the end of sequence marker in
> > its line tables.  After this commit GDB now uses a named
> > constant (linetable_entry::end_marker) as the end of sequence marker.
> > The value of this constant is -1, not 0.  This change was made in
> > order to aid fixing bug PR gdb/26243.
> >
> > Bug PR gdb/26243 is about allowing line number 0 to be used to
> > indicate a program address that has no corresponding source line (this
> > use is specified in the DWARF standard).  Currently GDB can't use line
> > number 0 as this line number is hard coded as the end of sequence
> > marker, but, after this commit line number 0 no longer has any special
> > meaning.
> >
> > This commit does not fix PR gdb/26243, but it is step towards allowing
> > that issue to be fixed.
> >
> > gdb/ChangeLog:
> >
> > PR gdb/26243
> > * buildsym.c (buildsym_compunit::record_line): Add an assert for
> > the incoming line number.  Update comments to not mention 0
> > specifically.  Update to check for linetable_entry::end_marker
> > rather than 0.
> > * disasm.c (do_mixed_source_and_assembly_deprecated): Check for
> > linetable_entry::end_marker not 0.
> > (do_mixed_source_and_assembly): Likewise.
> > * dwarf2/read.c (dwarf_finish_line): Pass
> > linetable_entry::end_marker not 0.
> > * mdebugread.c (add_line): Set is_stmt field.
> > * python/py-linetable.c (ltpy_get_all_source_lines): Check for
> > linetable_entry::end_marker not 0, update comments.
> > (ltpy_iternext): Likewise.
> > * record-btrace.c (btrace_find_line_range): Likewise.
> > * symmisc.c (maintenance_print_one_line_table): Likewise.
> > * symtab.c (find_pc_sect_line): Likewise.
> > (find_line_symtab): Likewise.
> > (skip_prologue_using_sal): Likewise.
> > * symtab.h (linetable_entry::end_marker): New const static member
> > variable.  Add a static assert for this field.
> > * xcoffread.c (arrange_linetable): Check for
> > linetable_entry::end_marker not 0.
>
> I think the idea is good.  But we need to make sure to audit all the places
> that use linetable_entry::line to see if they now need to treat the value 0
> specially.  One case that came to mind while reading your patch is
> "disassemble /s".  Here's an example using bug 26243's reproducer, showing
> the end of "disassemble /s tree_insert".  The instruction at 0x40135f maps
> to line 0.
>
> This is with GDB 8.3:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   callq  0x401250 <make_node(int)>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmpq   0x40135f <tree_insert(node*, int)+159>
>    0x000000000040135f <+159>:   jmpq   0x401364 <tree_insert(node*, int)+164>
>
> 41      }
>    0x0000000000401364 <+164>:   add    $0x10,%rsp
>    0x0000000000401368 <+168>:   pop    %rbp
>    0x0000000000401369 <+169>:   retq
> ---
>
> This made 0x40135f appear as if it was part of the previous line.

Address 0x40135f maps to line 0, and is a non-statement line.  GDB 8.3
ignored all non-statement lines, so the address was attached to the
previous line as we see here.

>
> This is with GDB master:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>
>
> unknown:
> --- no source info for this pc ---
>    0x000000000040135f <+159>:   jmp    0x401364 <_Z11tree_insertP4nodei+164>
>
> test.cpp:
> 26      {
> 27        if (val < root->id)
> 28          {
> 29            if (root->left)
> 30              tree_insert (root->left, val);
> 31            else
> 32              root->left = make_node(val);
> 33          }
> 34        else if (val > root->id)
> 35          {
> 36            if (root->right)
> 37              tree_insert (root->right, val);
> 38            else
> 39              root->right = make_node(val);
> 40          }
> 41      }
>    0x0000000000401364 <+164>:   add    $0x10,%rsp
>    0x0000000000401368 <+168>:   pop    %rbp
>    0x0000000000401369 <+169>:   ret
> End of assembler dump.
> ---
>
> I don't know what happens here, but it doesn't look good.  The "no source info for this pc"
> part is ok, but the "unknown:" and source lines 26-40 look wrong.

The line 0 is now interpreted as an end of sequence marker.  As a
result when we look up the sal for address 0x40135f we end up with a
sal.symtab == NULL.

The 'unknown:' is telling use we changed to an unknown file, the 'no
source info for this pc' message is telling us the same thing again
(no sal.symtab).  Then we cache this nullptr symtab as the 'current'
symtab.  When on the next address we switch back to our original
symtab and line the disassembler seems to go a bit nuts and prints all
the lines we see here, clearly this is somehow related to it caching
the nullptr symtab in some way.

>
> This is with GDB master + your patch:
>
> ---
> 38            else
> 39              root->right = make_node(val);
>    0x000000000040134a <+138>:   mov    -0xc(%rbp),%edi
>    0x000000000040134d <+141>:   call   0x401250 <_Z9make_nodei>
>    0x0000000000401352 <+146>:   mov    -0x8(%rbp),%rcx
>    0x0000000000401356 <+150>:   mov    %rax,0x10(%rcx)
>
> 40          }
>    0x000000000040135a <+154>:   jmp    0x40135f <_Z11tree_insertP4nodei+159>
>
> Line number 0 out of range; test.cpp has 80 lines.
> ---
>
> The disassembly gets interrupted, because the line number is
> invalid.

So now it's sort-of, mostly working (ish).  It's understood that the
line 0 is associated with test.cpp, as you said GDB just needs to
understand what to do with this, right now it tries to pull line 0
from the file, which is clearly wrong.

>
> In this case, line 0 should probably get handled specially so that we print
> the "no source info for this pc" message and keep printing the rest correctly.
>
> So, I think we just need to be careful and look at the possible behavior changes
> of all these spots, not just blindly change `line == 0` for `line == end_marker`.

This was never meant to be a "provide full support for line == 0"
patch.  Rather, I just happened to have this patch mostly sitting
around already and when I saw it discussed in bug 26243 I figured I
should get it merged.

However, your feedback has helped me understand the mistake I made
here.  My incorrect thinking went:

  + Currently end-marker == 0 which conflicts with DWARF line == 0,
  therefore DWARF line == 0 debug will not work.

  + If we want to support DWARF line == 0 then we need to stop using
  end-marker == 0, so

  + Make end-marker == -1, the DWARF line == 0 will continue to not
  work just as it didn't work before.

  + In the future we can now make DWARF line == 0 work without the
  end-marker being in the way.

The problem with this logic is that yes, things are broken now, but
actually, they are broken in a way where things actually mostly kind
of work, while, after my change, they are still broken, but now they
mostly don't work.

So, right now we see DWARF line == 0, and create a GDB end-marker,
this is wrong, but, for example, end-markers don't have source code
associated, which, it turns out, is exactly what we want for DWARF
line == 0.  If we remove the conflict then DWARF line == 0 is no
longer special, and we start trying to access line 0 from the source
files.

So, when you say:

> So, I think we just need to be careful and look at the possible behavior changes
> of all these spots, not just blindly change `line == 0` for `line == end_marker`.

I both agree, and disagree with this statement.  I agree that you are
correct, we (I) need to think more about where special handling for
line == 0 needs to be added to GDB.

But, I disagree that there were any mistakes made in the original
patch, GDB doesn't currently support line == 0, so any place we are
currently checking line == 0 (I claim) we do mean 'line ==
end_marker'.

My proposal for now then is that we could (should?) make the change in
the patch below.  Unlike the previous patch, the _value_ for
end_marker is now left as 0, rather than changing it to -1.  Then
means that the "it just sort-of to works" behaviour for DWARF line == 0,
will continue to "just sort-of work" as before, however, I think GDB
is now saying what it means.

I already have a patch that adds in special handling for line == 0
that fixes the disassembler.  The problem is (as you pointed out)
there are likely many places in GDB where line == 0 will cause
problems, if 0 != end_marker.  This close to a release branch I don't
think we should change the end_marker value to anything but 0.

After the release we can change end_marker to -1, fix the
disassembler, and then try to find other cases where line == 0 might
cause issues.

So, how would you feel to merging the patch below?

Thanks,
Andrew

---

commit 5da1cb2497b06371aa4f1a3091d3888db35c3348
Author: Andrew Burgess <[hidden email]>
Date:   Mon Jul 20 13:10:00 2020 +0100

    gdb: Don't hard code 0 as end marker in GDB's line table
   
    Currently GDB hard-codes the use of 0 as the end of sequence marker in
    its line tables.  After this commit GDB now uses a named
    constant (linetable_entry::end_marker) as the end of sequence marker.
   
    The motivation for this change was discussion on PR gdb/26243.  This
    bug discusses the possibility of supporting DWARF's use of line number
    0 to indicate an address in the program with no associated source
    code.
   
    If we want to have line number 0 and end-of-sequence marker within
    GDB's line tables then it seems clear that the end-of-sequence marker
    can't take the value 0.  This change is a first step towards changing
    the value of the end-of-sequence marker.
   
    Initially I did change the value of the marker in this commit from 0
    to -1, however, it was pointed out during review that this causes
    problems in GDB when presented with some DWARF that includes use of
    line number 0.
   
    The problem is that currently and DWARF that uses line number 0 will
    conflict with the end-of-sequence marker, those addresses marked as
    line number 0 will (to GDB) appear as end-of-sequence markers.  It
    just so happens that in many cases treating an address (with line
    number 0) as an end-of-sequence marker is good enough, for example,
    end of sequence markers don't have associated source code.
   
    The minute we change the end-of-sequence marker value to -1 the line
    number 0 stops being special and we start to run into problems, for
    example GDB will try to access line 0 from source files.
   
    It is my proposal that we merge this commit when introduces the
    end_marker constant, and makes use of this throughout GDB, but leave
    the value of this constant as 0 so that GDB will continue to treat any
    DWARF with line number 0 as end-of-sequence markers.
   
    This commit can then be followed up with a series of work to extend
    GDB to correctly handle line number 0 in its own right.  Once we
    believe that all, or most cases needed to support line number 0 are
    being handled then the value of end_marker can be changed from 0 to -1
    in a future commit.
   
    For testing this commit I did change the value of end_marker from 0 to
    -1 and ran the regression testsuite which passed with no regressions.
    This indicates (to me) that I have correctly updated all (or the
    majority of) of the hard coded checks for 0 and replaced these with a
    check for end_marker instead.
   
    This commit does mention PR gdb/26243 in its ChangeLog, but does not
    fix this issue.  I placed this bug reference just so people can see
    the connection between this change and that bug in the future.
   
    gdb/ChangeLog:
   
            PR gdb/26243
            * buildsym.c (buildsym_compunit::record_line): Add an assert for
            the incoming line number.  Update comments to not mention 0
            specifically.  Update to check for linetable_entry::end_marker
            rather than 0.
            (buildsym_compunit::end_symtab_with_blockvector): Check for
            linetable_entry::end_marker, not 0.
            * disasm.c (line_is_less_than): Likewise.
            (do_mixed_source_and_assembly_deprecated): Likewise.
            (do_mixed_source_and_assembly): Likewise.
            * dwarf2/read.c (dwarf_finish_line): Pass
            linetable_entry::end_marker not 0.
            * mdebugread.c (add_line): Set is_stmt field.
            * python/py-linetable.c (ltpy_get_all_source_lines): Check for
            linetable_entry::end_marker not 0, update comments.
            (ltpy_iternext): Likewise.
            * record-btrace.c (btrace_find_line_range): Likewise.
            * symmisc.c (maintenance_print_one_line_table): Likewise.
            * symtab.c (find_pc_sect_line): Likewise.
            (find_line_symtab): Likewise.
            (skip_prologue_using_sal): Likewise.
            * symtab.h (linetable_entry::end_marker): New const static member
            variable.  Add a static assert for this field.
            * xcoffread.c (arrange_linetable): Check for
            linetable_entry::end_marker not 0.

diff --git a/gdb/buildsym.c b/gdb/buildsym.c
index bd0ca491401..e6d4dc117c1 100644
--- a/gdb/buildsym.c
+++ b/gdb/buildsym.c
@@ -671,6 +671,10 @@ buildsym_compunit::record_line (struct subfile *subfile, int line,
 {
   struct linetable_entry *e;
 
+  /* Is this an asset?  Or is this processing user input and so should we
+     be handling, or throwing an error for invalid data?  */
+  gdb_assert (line == linetable_entry::end_marker || line >= 0);
+
   /* Make sure line vector exists and is big enough.  */
   if (!subfile->line_vector)
     {
@@ -692,20 +696,19 @@ buildsym_compunit::record_line (struct subfile *subfile, int line,
       * sizeof (struct linetable_entry))));
     }
 
-  /* Normally, we treat lines as unsorted.  But the end of sequence
-     marker is special.  We sort line markers at the same PC by line
-     number, so end of sequence markers (which have line == 0) appear
-     first.  This is right if the marker ends the previous function,
-     and there is no padding before the next function.  But it is
-     wrong if the previous line was empty and we are now marking a
-     switch to a different subfile.  We must leave the end of sequence
-     marker at the end of this group of lines, not sort the empty line
-     to after the marker.  The easiest way to accomplish this is to
-     delete any empty lines from our table, if they are followed by
-     end of sequence markers.  All we lose is the ability to set
-     breakpoints at some lines which contain no instructions
-     anyway.  */
-  if (line == 0)
+  /* Normally, we treat lines as unsorted.  But the end of sequence marker
+     is special.  We sort line markers at the same PC by line number, so
+     end of sequence markers (which have line ==
+     linetable_entry::end_marker) appear first.  This is right if the
+     marker ends the previous function, and there is no padding before the
+     next function.  But it is wrong if the previous line was empty and we
+     are now marking a switch to a different subfile.  We must leave the
+     end of sequence marker at the end of this group of lines, not sort the
+     empty line to after the marker.  The easiest way to accomplish this is
+     to delete any empty lines from our table, if they are followed by end
+     of sequence markers.  All we lose is the ability to set breakpoints at
+     some lines which contain no instructions anyway.  */
+  if (line == linetable_entry::end_marker)
     {
       while (subfile->line_vector->nitems > 0)
  {
@@ -944,8 +947,9 @@ buildsym_compunit::end_symtab_with_blockvector (struct block *static_block,
   const linetable_entry &ln2) -> bool
       {
  if (ln1.pc == ln2.pc
-    && ((ln1.line == 0) != (ln2.line == 0)))
-  return ln1.line == 0;
+    && ((ln1.line == linetable_entry::end_marker)
+ != (ln2.line == linetable_entry::end_marker)))
+  return ln1.line == linetable_entry::end_marker;
 
  return (ln1.pc < ln2.pc);
       };
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 143ba2f59b9..cd04b170899 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -171,7 +171,8 @@ line_is_less_than (const deprecated_dis_line_entry &mle1,
 
   /* End of sequence markers have a line number of 0 but don't want to
      be sorted to the head of the list, instead sort by PC.  */
-  if (mle1.line == 0 || mle2.line == 0)
+  if (mle1.line == linetable_entry::end_marker
+      || mle2.line == linetable_entry::end_marker)
     {
       if (mle1.start_pc != mle2.start_pc)
  val = mle1.start_pc < mle2.start_pc;
@@ -346,7 +347,7 @@ do_mixed_source_and_assembly_deprecated
   struct symtab_and_line sal;
   int i;
   int out_of_order = 0;
-  int next_line = 0;
+  int next_line = linetable_entry::end_marker;
   int num_displayed = 0;
   print_source_lines_flags psl_flags = 0;
 
@@ -383,7 +384,7 @@ do_mixed_source_and_assembly_deprecated
  continue;
 
       /* Skip any end-of-function markers.  */
-      if (le[i].line == 0)
+      if (le[i].line == linetable_entry::end_marker)
  continue;
 
       mle[newlines].line = le[i].line;
@@ -425,7 +426,7 @@ do_mixed_source_and_assembly_deprecated
       /* Print out everything from next_line to the current line.  */
       if (mle[i].line >= next_line)
  {
-  if (next_line != 0)
+  if (next_line != linetable_entry::end_marker)
     {
       /* Just one line to print.  */
       if (next_line == mle[i].line)
@@ -571,7 +572,7 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
   gdb::optional<ui_out_emit_list> list_emitter;
 
   last_symtab = NULL;
-  last_line = 0;
+  last_line = linetable_entry::end_marker;
   pc = low;
 
   while (pc < high)
@@ -591,7 +592,7 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
 
   /* If this is the first line of output, check for any preceding
      lines.  */
-  if (last_line == 0
+  if (last_line == linetable_entry::end_marker
       && first_le != NULL
       && first_le->line < sal.line)
     {
@@ -604,7 +605,8 @@ do_mixed_source_and_assembly (struct gdbarch *gdbarch,
   /* Same source file as last time.  */
   if (sal.symtab != NULL)
     {
-      if (sal.line > last_line + 1 && last_line != 0)
+      if (last_line != linetable_entry::end_marker
+  && sal.line > last_line + 1)
  {
   int l;
 
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 39ed455def5..bdbecf640ff 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -20390,7 +20390,8 @@ dwarf_finish_line (struct gdbarch *gdbarch, struct subfile *subfile,
   paddress (gdbarch, address));
     }
 
-  dwarf_record_line_1 (gdbarch, subfile, 0, address, true, cu);
+  dwarf_record_line_1 (gdbarch, subfile, linetable_entry::end_marker,
+       address, true, cu);
 }
 
 void
diff --git a/gdb/mdebugread.c b/gdb/mdebugread.c
index d38372041d7..01029fbd971 100644
--- a/gdb/mdebugread.c
+++ b/gdb/mdebugread.c
@@ -4516,6 +4516,7 @@ add_line (struct linetable *lt, int lineno, CORE_ADDR adr, int last)
     return lineno;
 
   lt->item[lt->nitems].line = lineno;
+  lt->item[lt->nitems].is_stmt = 1;
   lt->item[lt->nitems++].pc = adr << 2;
   return lineno;
 }
diff --git a/gdb/python/py-linetable.c b/gdb/python/py-linetable.c
index 858313bb22d..22a44fe5f36 100644
--- a/gdb/python/py-linetable.c
+++ b/gdb/python/py-linetable.c
@@ -238,10 +238,11 @@ ltpy_get_all_source_lines (PyObject *self, PyObject *args)
     {
       item = &(SYMTAB_LINETABLE (symtab)->item[index]);
 
-      /* 0 is used to signify end of line table information.  Do not
- include in the source set. */
-      if (item->line > 0)
+      /* The special value linetable_entry::end_marker is used to signify
+ end of line table information.  Do not include in the source set.  */
+      if (item->line != linetable_entry::end_marker)
  {
+  gdb_assert (item->line >= 0);
   gdbpy_ref<> line = gdb_py_object_from_longest (item->line);
 
   if (line == NULL)
@@ -407,9 +408,9 @@ ltpy_iternext (PyObject *self)
 
   item = &(SYMTAB_LINETABLE (symtab)->item[iter_obj->current_index]);
 
-  /* Skip over internal entries such as 0.  0 signifies the end of
-     line table data and is not useful to the API user.  */
-  while (item->line < 1)
+  /* Skip over internal entries such as the end of sequence marker,
+     linetable_entry::end_marker as this is not useful to the API user.  */
+  while (item->line == linetable_entry::end_marker)
     {
       iter_obj->current_index++;
 
diff --git a/gdb/record-btrace.c b/gdb/record-btrace.c
index 718de62f280..36ae671fa90 100644
--- a/gdb/record-btrace.c
+++ b/gdb/record-btrace.c
@@ -732,7 +732,8 @@ btrace_find_line_range (CORE_ADDR pc)
  possibly adding more line numbers to the range.  At the time this
  change was made I was unsure how to test this so chose to go with
  maintaining the existing experience.  */
-      if ((lines[i].pc == pc) && (lines[i].line != 0)
+      if ((lines[i].pc == pc)
+  && (lines[i].line != linetable_entry::end_marker)
   && (lines[i].is_stmt == 1))
  range = btrace_line_range_add (range, lines[i].line);
     }
diff --git a/gdb/symmisc.c b/gdb/symmisc.c
index fc56cfa9381..9b05ca2f4da 100644
--- a/gdb/symmisc.c
+++ b/gdb/symmisc.c
@@ -1036,7 +1036,7 @@ maintenance_print_one_line_table (struct symtab *symtab, void *data)
   item = &linetable->item [i];
   ui_out_emit_tuple tuple_emitter (uiout, nullptr);
   uiout->field_signed ("index", i);
-  if (item->line > 0)
+  if (item->line != linetable_entry::end_marker)
     uiout->field_signed ("line", item->line);
   else
     uiout->field_string ("line", _("END"));
diff --git a/gdb/symtab.c b/gdb/symtab.c
index f96ad9554d9..02a08481008 100644
--- a/gdb/symtab.c
+++ b/gdb/symtab.c
@@ -3248,7 +3248,8 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
          save prev if it represents the end of a function (i.e. line number
          0) instead of a real line.  */
 
-      if (prev && prev->line && (!best || prev->pc > best->pc))
+      if (prev && prev->line != linetable_entry::end_marker
+  && (!best || prev->pc > best->pc))
  {
   best = prev;
   best_symtab = iter_s;
@@ -3264,7 +3265,8 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
     {
       struct linetable_entry *tmp = best;
       while (tmp > first && (tmp - 1)->pc == tmp->pc
-     && (tmp - 1)->line != 0 && !tmp->is_stmt)
+     && (tmp - 1)->line != linetable_entry::end_marker
+     && !tmp->is_stmt)
  --tmp;
       if (tmp->is_stmt)
  best = tmp;
@@ -3291,7 +3293,7 @@ find_pc_sect_line (CORE_ADDR pc, struct obj_section *section, int notcurrent)
  don't make some up.  */
       val.pc = pc;
     }
-  else if (best->line == 0)
+  else if (best->line == linetable_entry::end_marker)
     {
       /* If our best fit is in a range of PC's for which no line
  number info is available (line number is zero) then we didn't
@@ -3378,14 +3380,14 @@ find_line_symtab (struct symtab *sym_tab, int line,
          the GLOBAL_BLOCK of a symtab has a begin and end address).  */
 
       /* BEST is the smallest linenumber > LINE so far seen,
-         or 0 if none has been seen so far.
+         or the end marker if none has been seen so far.
          BEST_INDEX and BEST_LINETABLE identify the item for it.  */
       int best;
 
       if (best_index >= 0)
  best = best_linetable->item[best_index].line;
       else
- best = 0;
+ best = linetable_entry::end_marker;
 
       for (objfile *objfile : current_program_space->objfiles ())
  {
@@ -3419,7 +3421,8 @@ find_line_symtab (struct symtab *sym_tab, int line,
   best_symtab = s;
   goto done;
  }
-      if (best == 0 || l->item[ind].line < best)
+      if (best == linetable_entry::end_marker
+  || l->item[ind].line < best)
  {
   best = l->item[ind].line;
   best_index = ind;
@@ -3718,7 +3721,8 @@ skip_prologue_using_lineinfo (CORE_ADDR func_addr, struct symtab *symtab)
       /* Don't use line numbers of zero, they mark special entries in
  the table.  See the commentary on symtab.h before the
  definition of struct linetable.  */
-      if (item->line > 0 && func_start <= item->pc && item->pc < func_end)
+      if (item->line != linetable_entry::end_marker
+  && func_start <= item->pc && item->pc < func_end)
  return item->pc;
     }
 
@@ -3946,11 +3950,11 @@ skip_prologue_using_sal (struct gdbarch *gdbarch, CORE_ADDR func_addr)
   /* Skip any earlier lines, and any end-of-sequence marker
      from a previous function.  */
   while (linetable->item[idx].pc != prologue_sal.pc
- || linetable->item[idx].line == 0)
+ || linetable->item[idx].line == linetable_entry::end_marker)
     idx++;
 
   if (idx+1 < linetable->nitems
-      && linetable->item[idx+1].line != 0
+      && linetable->item[idx+1].line != linetable_entry::end_marker
       && linetable->item[idx+1].pc == start_pc)
     return start_pc;
  }
diff --git a/gdb/symtab.h b/gdb/symtab.h
index 0b186554ea1..8372cbd65c7 100644
--- a/gdb/symtab.h
+++ b/gdb/symtab.h
@@ -1306,6 +1306,37 @@ struct rust_vtable_symbol : public symbol
 
 struct linetable_entry
 {
+  /* Special value placed into the LINE field to indicate an end of
+     sequence in the line table.  The current value of zero was selected a
+     long time ago in GDB, and turns out to be special, care must be taken
+     when changing this.
+
+     The problem is that a line number of 0 in DWARF means that an address
+     has no source code associated with it.  What this means for GDB is
+     that any incoming DWARF that makes use of line number 0 will have
+     clashed with our end of line marker.
+
+     You might think we could just change this constant to something else
+     then, like -1.  Though that sounds like a good idea, and indeed is
+     the eventual goal, we can't just rush in and make this change.  The
+     reason is that in most cases end of sequence markers are something
+     that GDB ignores, for example, there's no source code associated with
+     an end of sequence marker.  It turns out that this behaviour is just
+     close enough to how line number 0 should be handled that in most
+     cases the behaviour of GDB is acceptable when using 0 as the end of
+     sequence marker.
+
+     However, if we change this marker value to -1 then suddenly 0 stops
+     being special.  Any DWARF that makes use of line number 0 is suddenly
+     going to end up causing GDB to start using 0 as though it was an
+     actual line number, for example GDB will start trying to access line
+     0 from source files.
+
+     The upshot of all this is that when END_MARKER was introduced, its
+     value has been left as zero for now.  */
+
+  static const int end_marker = 0;
+
   /* The line number for this entry.  */
   int line;
 
@@ -1316,6 +1347,14 @@ struct linetable_entry
   CORE_ADDR pc;
 };
 
+/* Normally line numbers in a program are positive integers greater than
+   zero.  The line number 0 is reserved in DWARF to indicate instructions
+   that don't have associated source code.  Currently we rely on the end
+   marker conflicting with line number 0, but hopefully this will change
+   in the future, at which point this assert should be updated.  */
+
+gdb_static_assert (linetable_entry::end_marker == 0);
+
 /* The order of entries in the linetable is significant.  They should
    be sorted by increasing values of the pc field.  If there is more than
    one entry for a given pc, then I'm not sure what should happen (and
diff --git a/gdb/xcoffread.c b/gdb/xcoffread.c
index a792c0fea2e..4c99c2da3ef 100644
--- a/gdb/xcoffread.c
+++ b/gdb/xcoffread.c
@@ -435,7 +435,7 @@ arrange_linetable (struct linetable *oldLineTb)
       if (oldLineTb->item[ii].is_stmt == 0)
  continue;
 
-      if (oldLineTb->item[ii].line == 0)
+      if (oldLineTb->item[ii].line == linetable_entry::end_marker)
  { /* Function entry found.  */
   if (function_count >= fentry_size)
     { /* Make sure you have room.  */
@@ -477,9 +477,11 @@ arrange_linetable (struct linetable *oldLineTb)
      a function begin.  */
 
   newline = 0;
-  if (oldLineTb->item[0].line != 0)
+  if (oldLineTb->item[0].line != linetable_entry::end_marker)
     for (newline = 0;
-    newline < oldLineTb->nitems && oldLineTb->item[newline].line; ++newline)
+ newline < oldLineTb->nitems
+   && oldLineTb->item[newline].line != linetable_entry::end_marker;
+ ++newline)
       newLineTb->item[newline] = oldLineTb->item[newline];
 
   /* Now copy function lines one by one.  */
@@ -498,7 +500,8 @@ arrange_linetable (struct linetable *oldLineTb)
  }
 
       for (jj = fentry[ii].line + 1;
-   jj < oldLineTb->nitems && oldLineTb->item[jj].line != 0;
+   jj < oldLineTb->nitems
+     && oldLineTb->item[jj].line != linetable_entry::end_marker;
    ++jj, ++newline)
  newLineTb->item[newline] = oldLineTb->item[jj];
     }
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] gdb: Don't hard code 0 as end marker in GDB's line table

Simon Marchi-4
On 2020-07-21 9:06 a.m., Andrew Burgess wrote:

> This was never meant to be a "provide full support for line == 0"
> patch.  Rather, I just happened to have this patch mostly sitting
> around already and when I saw it discussed in bug 26243 I figured I
> should get it merged.
>
> However, your feedback has helped me understand the mistake I made
> here.  My incorrect thinking went:
>
>   + Currently end-marker == 0 which conflicts with DWARF line == 0,
>   therefore DWARF line == 0 debug will not work.
>
>   + If we want to support DWARF line == 0 then we need to stop using
>   end-marker == 0, so
>
>   + Make end-marker == -1, the DWARF line == 0 will continue to not
>   work just as it didn't work before.
>
>   + In the future we can now make DWARF line == 0 work without the
>   end-marker being in the way.
>
> The problem with this logic is that yes, things are broken now, but
> actually, they are broken in a way where things actually mostly kind
> of work, while, after my change, they are still broken, but now they
> mostly don't work.
>
> So, right now we see DWARF line == 0, and create a GDB end-marker,
> this is wrong, but, for example, end-markers don't have source code
> associated, which, it turns out, is exactly what we want for DWARF
> line == 0.  If we remove the conflict then DWARF line == 0 is no
> longer special, and we start trying to access line 0 from the source
> files.

I agree with all of the above.

> So, when you say:
>
>> So, I think we just need to be careful and look at the possible behavior changes
>> of all these spots, not just blindly change `line == 0` for `line == end_marker`.
>
> I both agree, and disagree with this statement.  I agree that you are
> correct, we (I) need to think more about where special handling for
> line == 0 needs to be added to GDB.
>
> But, I disagree that there were any mistakes made in the original
> patch, GDB doesn't currently support line == 0, so any place we are
> currently checking line == 0 (I claim) we do mean 'line ==
> end_marker'.

You are right that all the `line == 0` do become `line == end_marker`.  I
meant that in some places, we also need to handle the new meaning of
`line == 0`.

> My proposal for now then is that we could (should?) make the change in
> the patch below.  Unlike the previous patch, the _value_ for
> end_marker is now left as 0, rather than changing it to -1.  Then
> means that the "it just sort-of to works" behaviour for DWARF line == 0,
> will continue to "just sort-of work" as before, however, I think GDB
> is now saying what it means.
>
> I already have a patch that adds in special handling for line == 0
> that fixes the disassembler.  The problem is (as you pointed out)
> there are likely many places in GDB where line == 0 will cause
> problems, if 0 != end_marker.  This close to a release branch I don't
> think we should change the end_marker value to anything but 0.
>
> After the release we can change end_marker to -1, fix the
> disassembler, and then try to find other cases where line == 0 might
> cause issues.

Agreed.

>
> So, how would you feel to merging the patch below?

I'm ambivalent.  I don't really see the point of merging this one its
own right now, versus when we do the complete work after branching.

In the mean time, I think something like what Tom de Vries proposed [1],
that filters out the line 0 entries, would be good for GDB 10.  It brings
us back to GDB 8.3 behavior, w.r.t. these line 0 entries, while keeping
the "is_stmt == false" entries that you added in 8c95582da858a.

[1] https://sourceware.org/pipermail/gdb-patches/2020-July/170506.html

> Thanks,
> Andrew
>
> ---
>
> commit 5da1cb2497b06371aa4f1a3091d3888db35c3348
> Author: Andrew Burgess <[hidden email]>
> Date:   Mon Jul 20 13:10:00 2020 +0100
>
>     gdb: Don't hard code 0 as end marker in GDB's line table
>    
>     Currently GDB hard-codes the use of 0 as the end of sequence marker in
>     its line tables.  After this commit GDB now uses a named
>     constant (linetable_entry::end_marker) as the end of sequence marker.
>    
>     The motivation for this change was discussion on PR gdb/26243.  This
>     bug discusses the possibility of supporting DWARF's use of line number
>     0 to indicate an address in the program with no associated source
>     code.
>    
>     If we want to have line number 0 and end-of-sequence marker within
>     GDB's line tables then it seems clear that the end-of-sequence marker
>     can't take the value 0.  This change is a first step towards changing
>     the value of the end-of-sequence marker.
>    
>     Initially I did change the value of the marker in this commit from 0
>     to -1, however, it was pointed out during review that this causes
>     problems in GDB when presented with some DWARF that includes use of
>     line number 0.
>    
>     The problem is that currently and DWARF that uses line number 0 will
>     conflict with the end-of-sequence marker, those addresses marked as
>     line number 0 will (to GDB) appear as end-of-sequence markers.  It
>     just so happens that in many cases treating an address (with line
>     number 0) as an end-of-sequence marker is good enough, for example,
>     end of sequence markers don't have associated source code.
>    
>     The minute we change the end-of-sequence marker value to -1 the line
>     number 0 stops being special and we start to run into problems, for
>     example GDB will try to access line 0 from source files.
>    
>     It is my proposal that we merge this commit when introduces the
>     end_marker constant, and makes use of this throughout GDB, but leave
>     the value of this constant as 0 so that GDB will continue to treat any
>     DWARF with line number 0 as end-of-sequence markers.
>    
>     This commit can then be followed up with a series of work to extend
>     GDB to correctly handle line number 0 in its own right.  Once we
>     believe that all, or most cases needed to support line number 0 are
>     being handled then the value of end_marker can be changed from 0 to -1
>     in a future commit.
>    
>     For testing this commit I did change the value of end_marker from 0 to
>     -1 and ran the regression testsuite which passed with no regressions.
>     This indicates (to me) that I have correctly updated all (or the
>     majority of) of the hard coded checks for 0 and replaced these with a
>     check for end_marker instead.

I don't understand this last part.  If you change the end_marker value to -1,
then we expect things to break with programs built with clang, like the
disassembly example I gave.  If you ran the testsuite with gcc, which presumably
never assigns an address to line 0, then you won't see a difference.  And we
probably don't have any DWARF tests right now that specifically check situations
with an address mapped to line 0.

>     This commit does mention PR gdb/26243 in its ChangeLog, but does not
>     fix this issue.  I placed this bug reference just so people can see
>     the connection between this change and that bug in the future.
>    
>     gdb/ChangeLog:
>    
>             PR gdb/26243
>             * buildsym.c (buildsym_compunit::record_line): Add an assert for
>             the incoming line number.  Update comments to not mention 0
>             specifically.  Update to check for linetable_entry::end_marker
>             rather than 0.
>             (buildsym_compunit::end_symtab_with_blockvector): Check for
>             linetable_entry::end_marker, not 0.
>             * disasm.c (line_is_less_than): Likewise.
>             (do_mixed_source_and_assembly_deprecated): Likewise.
>             (do_mixed_source_and_assembly): Likewise.
>             * dwarf2/read.c (dwarf_finish_line): Pass
>             linetable_entry::end_marker not 0.
>             * mdebugread.c (add_line): Set is_stmt field.
>             * python/py-linetable.c (ltpy_get_all_source_lines): Check for
>             linetable_entry::end_marker not 0, update comments.
>             (ltpy_iternext): Likewise.
>             * record-btrace.c (btrace_find_line_range): Likewise.
>             * symmisc.c (maintenance_print_one_line_table): Likewise.
>             * symtab.c (find_pc_sect_line): Likewise.
>             (find_line_symtab): Likewise.
>             (skip_prologue_using_sal): Likewise.
>             * symtab.h (linetable_entry::end_marker): New const static member
>             variable.  Add a static assert for this field.
>             * xcoffread.c (arrange_linetable): Check for
>             linetable_entry::end_marker not 0.
>
> diff --git a/gdb/buildsym.c b/gdb/buildsym.c
> index bd0ca491401..e6d4dc117c1 100644
> --- a/gdb/buildsym.c
> +++ b/gdb/buildsym.c
> @@ -671,6 +671,10 @@ buildsym_compunit::record_line (struct subfile *subfile, int line,
>  {
>    struct linetable_entry *e;
>  
> +  /* Is this an asset?  Or is this processing user input and so should we
> +     be handling, or throwing an error for invalid data?  */
> +  gdb_assert (line == linetable_entry::end_marker || line >= 0);

I think it's fine to assert here.  Any invalid input should be rejected earlier,
in the debug info reader.

Simon