Using the vcs_to_changelog.py script

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Using the vcs_to_changelog.py script

Simon Marchi
Hi binutils and gdb folks!

As you may or may not know, the glibc project has started using a script called
vcs_to_changelog.py to automatically generate their ChangeLogs.  They don't do
hand-written ChangeLog entries with their contributions.  Instead they generate
a ChangeLog file using that script when creating a release, passing it a range
of git commits for which to create ChangeLog entries.

I would very much like if we started using this in GDB, and it was suggested
that we could try to sync with binutils, as you might want to do the same.

Here's how it could work in practice:

1. We update our gnulib import to import the vcs_to_changelog.py script (it is
   distributed as a gnulib module).
2. We update src-release.sh to call the script and generate a single top-level
   ChangeLog that is included in the release tarball.

If we use the script as it is today, the ChangeLog generated for a binutils
release will contain entries for commits that don't concern binutils (for
example, that just touch a file under gdb/) and vice-versa.  I don't think
that's really a problem, since all the required information will be there,
there will just be additional information (which I doubt anybody will care
about).  But if really needed, I'm sure the script could be modified to filter
down the commits that touch only what's included in the binutils or GDB
release (I would prefer to avoid doing this unless absolutely necessary).

For illustrative purposes, here's what the script outputs for the last bunch
of commits in binutils-gdb:

  http://paste.ubuntu.com/p/x38zs82Rmt/

That's it, please let me know what you think.

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Alan Modra-3
On Wed, Feb 12, 2020 at 06:32:51PM -0500, Simon Marchi wrote:
> For illustrative purposes, here's what the script outputs for the last bunch
> of commits in binutils-gdb:
>
>   http://paste.ubuntu.com/p/x38zs82Rmt/

Not fit for the original purpose of change logs as far as developers
are concerned, but I for one don't use them at all nowadays to see
what changed when.  "git log" and "git blame" are far better.

Does this satisfy the FSF legal requirements?  "Who changed what"
won't be accurate unless committers remember to set the author
properly on commits made for other people.

--
Alan Modra
Australia Development Lab, IBM
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Jeff Law
On Thu, 2020-02-13 at 11:33 +1030, Alan Modra wrote:
> On Wed, Feb 12, 2020 at 06:32:51PM -0500, Simon Marchi wrote:
> > For illustrative purposes, here's what the script outputs for the last bunch
> > of commits in binutils-gdb:
> >
> >   http://paste.ubuntu.com/p/x38zs82Rmt/
>
> Not fit for the original purpose of change logs as far as developers
> are concerned, but I for one don't use them at all nowadays to see
> what changed when.  "git log" and "git blame" are far better.
I'm in a similar position.  For years ChangeLogs were my go-to for
initial archaeology, but I find myself using git log and git blame most
of the time now.

>
> Does this satisfy the FSF legal requirements?  "Who changed what"
> won't be accurate unless committers remember to set the author
> properly on commits made for other people.
RMS and/or the FSF blessed it for glibc a while back.  So it'd seem
suitable for other projects under the FSF umbrella.  I'm hoping we'll
make the same change for GCC, but there's some inertia to push through
:(

jeff

Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Simon Marchi
> On Thu, 2020-02-13 at 11:33 +1030, Alan Modra wrote:
>> Does this satisfy the FSF legal requirements?  "Who changed what"
>> won't be accurate unless committers remember to set the author
>> properly on commits made for other people.

This naturally happens when using the typical git workflow.  If someone sends
a patch using git-send-email and somebody else applies it using git-am, the
authorship information reflects the original author's identity.

We should always encourage people to send patches using git-send-email (or
worst case, git-format-patch) and not plain diffs, for this reason and many
others.

On 2020-02-12 9:29 p.m., Jeff Law wrote:
> RMS and/or the FSF blessed it for glibc a while back.  So it'd seem
> suitable for other projects under the FSF umbrella.  I'm hoping we'll
> make the same change for GCC, but there's some inertia to push through
> :(

That is exactly what I presumed and what I am hoping for.

But the proposed changes to standards.texi do specifically mention that using
this script is fine (the script name changed, but I am fairly sure it's
referring to that script):

https://lists.gnu.org/archive/html/bug-standards/2020-01/msg00000.html

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Eli Zaretskii
In reply to this post by Simon Marchi
> From: Simon Marchi <[hidden email]>
> Date: Wed, 12 Feb 2020 18:32:51 -0500
>
> As you may or may not know, the glibc project has started using a script called
> vcs_to_changelog.py to automatically generate their ChangeLogs.  They don't do
> hand-written ChangeLog entries with their contributions.  Instead they generate
> a ChangeLog file using that script when creating a release, passing it a range
> of git commits for which to create ChangeLog entries.
>
> I would very much like if we started using this in GDB, and it was suggested
> that we could try to sync with binutils, as you might want to do the same.
>
> Here's how it could work in practice:
>
> 1. We update our gnulib import to import the vcs_to_changelog.py script (it is
>    distributed as a gnulib module).
> 2. We update src-release.sh to call the script and generate a single top-level
>    ChangeLog that is included in the release tarball.

Several things we'd need to consider if we go this way:

 . AFAIK, the script you mention currently supports only C sources;
   we'd need to see how well it supports C++
 . Some files in our tree are neither C++ nor C: there are Python
   files, Guile files, shell scripts, and Texinfo files, to mention
   just a few: what to do about them?
 . Last, but not least: we'd need detailed instructions for how to
   produce the commit log messages under this regime, because the old
   conventions will not be valid anymore.
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Simon Marchi <[hidden email]>
> Date: Thu, 13 Feb 2020 09:19:28 -0500
>
> >  . Some files in our tree are neither C++ nor C: there are Python
> >    files, Guile files, shell scripts, and Texinfo files, to mention
> >    just a few: what to do about them?
>
> I presume glibc is in the same situation... The script just says that
> the file is modified.

So for those other languages, the author of the changeset will have to
write the entries by hand?  Or are tyou suggesting omitting them
altogether and staying with file names alone?  (The latter would be
somewhat boring for the GDB manual, since almost all of it is in a
single file...)

> If we wanted to get more details for them, we would have to
> implement new parsers to figure out what changed between two
> revisions.

Is that practical?

> >  . Last, but not least: we'd need detailed instructions for how to
> >    produce the commit log messages under this regime, because the old
> >    conventions will not be valid anymore.
>
> What are you specifically referring to?  For GDB, I don't really see what
> would need to change.  We already enforce having detailed commit logs that
> explain:
>
> - the rationale for the change
> - technical details about the fix, if not trivial
>
> So the only difference is that we would not provide a manual ChangeLog entry.

And replace that part with nothing?  I thought something will have to
be done instead, to have the log messages be as informative as they
are now.
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Simon Marchi
On 2020-02-13 10:42 a.m., Eli Zaretskii wrote:

>> Cc: [hidden email], [hidden email]
>> From: Simon Marchi <[hidden email]>
>> Date: Thu, 13 Feb 2020 09:19:28 -0500
>>
>>>  . Some files in our tree are neither C++ nor C: there are Python
>>>    files, Guile files, shell scripts, and Texinfo files, to mention
>>>    just a few: what to do about them?
>>
>> I presume glibc is in the same situation... The script just says that
>> the file is modified.
>
> So for those other languages, the author of the changeset will have to
> write the entries by hand?  Or are tyou suggesting omitting them
> altogether and staying with file names alone?  (The latter would be
> somewhat boring for the GDB manual, since almost all of it is in a
> single file...)

I am not a user of ChangeLog files myself (other than writing entries for
them), so I would be fine if the entry only mentioned the file name.  But
perhaps others will not, I can't tell.

I don't think it would be practical if some entries were written by hand
and others were generated.  First, I think it would make the contribution
process confusing, but also I don't know how it would work process-wise.

>> If we wanted to get more details for them, we would have to
>> implement new parsers to figure out what changed between two
>> revisions.
>
> Is that practical?

I presume that it's much harder to make an ad-hoc parser for C and for a
texinfo file.  So if it has been done for C, it should be do-able to texinfo.

>> So the only difference is that we would not provide a manual ChangeLog entry.
>
> And replace that part with nothing?  I thought something will have to
> be done instead, to have the log messages be as informative as they
> are now.

My understanding from:

  - past dicussions (the big one on bug-
  - the proposed change to standards.texi that you sent

is that the ChangeLog entry is redundant enough with the combination of:

  - a good commit message that explains why the change is done
  - the diff of the change

which are both kept by the VCS.  And for that reason, it's fine not to keep
a ChangeLog file in the VCS.  However, for the benefit of people just using
tarballs, and not the VCS, we generate a ChangeLog file from the diff.
Naturally, the generated ChangeLog will be less informative than one written
by humans (it won't say what changed in a function, it will just say that
the function has been modified), but since that procedure was adopted by glibc,
and is mentioned in the proposed standards.texi change, then it must have been
considered an acceptable compromise.

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Simon Marchi <[hidden email]>
> Date: Thu, 13 Feb 2020 11:26:11 -0500
>
> My understanding from:
>
>   - past dicussions (the big one on bug-
>   - the proposed change to standards.texi that you sent
>
> is that the ChangeLog entry is redundant enough with the combination of:
>
>   - a good commit message that explains why the change is done
>   - the diff of the change
>
> which are both kept by the VCS.  And for that reason, it's fine not to keep
> a ChangeLog file in the VCS.

Yes, but:

  1) the proposed change was not yet approved as part of the official
     policy

  2) we need some guidelines for "good commit messages", otherwise
     patch review will need to pay a lot of attention to discussing
     that and making sure the log messages are fine

> However, for the benefit of people just using
> tarballs, and not the VCS, we generate a ChangeLog file from the diff.
> Naturally, the generated ChangeLog will be less informative than one written
> by humans (it won't say what changed in a function, it will just say that
> the function has been modified), but since that procedure was adopted by glibc,
> and is mentioned in the proposed standards.texi change, then it must have been
> considered an acceptable compromise.

I have yet to see this accepted as GNU policy.  And at least
personally, having a ChangeLog in a tarball that just says which
function was changed on what date is almost useless to me (and I do
sometimes need to work without access to the VCS repositories).
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Philippe Waroquiers
On Thu, 2020-02-13 at 20:58 +0200, Eli Zaretskii wrote:
> I have yet to see this accepted as GNU policy.  And at least
> personally, having a ChangeLog in a tarball that just says which
> function was changed on what date is almost useless to me (and I do
> sometimes need to work without access to the VCS repositories).
It looks easy to have a script to generate a file that contains all
the git log messages + the diff for each commit
in case detailed changes have to be analysed off-line, without a git repository.

Philippe
 

Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Simon Marchi
In reply to this post by Eli Zaretskii
On 2020-02-13 1:58 p.m., Eli Zaretskii wrote:
>   2) we need some guidelines for "good commit messages", otherwise
>      patch review will need to pay a lot of attention to discussing
>      that and making sure the log messages are fine

We can write some guidelines for sure, it wouldn't hurt.  But I think that as a
project, we have already some quite good standards in terms of commit messages.
These discussions already happen during reviews.  And even with those guidelines
written, we'll still need to pay attention to it, because I can assure you that
we will still receive patches with bad or non-existent commit messages.

>> However, for the benefit of people just using
>> tarballs, and not the VCS, we generate a ChangeLog file from the diff.
>> Naturally, the generated ChangeLog will be less informative than one written
>> by humans (it won't say what changed in a function, it will just say that
>> the function has been modified), but since that procedure was adopted by glibc,
>> and is mentioned in the proposed standards.texi change, then it must have been
>> considered an acceptable compromise.
>
> I have yet to see this accepted as GNU policy.

Sure, we can wait for this to become official.

>
And at least
> personally, having a ChangeLog in a tarball that just says which
> function was changed on what date is almost useless to me (and I do
> sometimes need to work without access to the VCS repositories).

Indeed.

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Simon Marchi <[hidden email]>
> Date: Thu, 13 Feb 2020 16:07:14 -0500
>
> On 2020-02-13 1:58 p.m., Eli Zaretskii wrote:
> >   2) we need some guidelines for "good commit messages", otherwise
> >      patch review will need to pay a lot of attention to discussing
> >      that and making sure the log messages are fine
>
> We can write some guidelines for sure, it wouldn't hurt.  But I think that as a
> project, we have already some quite good standards in terms of commit messages.

AFAIU, our current standards assume the ChangeLog-formatted entry is
part of the log message which describes the individual changes.  If
that is removed, we may wish to modify our standards to make up for
the loss.

E.g., compare the 2 sample log messages below.  The first one will
probably be quite incomplete if the ChangeLog part is removed, while
the second will probably not suffer too much.  So we may wish to make
sure log messages like the first one are augmented by additional
information.

  commit 66182876b46d40163e81504f7fa4f206268cb83c
  Author:     Eli Zaretskii <[hidden email]>
  AuthorDate: Mon Jan 6 21:54:21 2020 +0200
  Commit:     Eli Zaretskii <[hidden email]>
  CommitDate: Mon Jan 6 21:54:21 2020 +0200

      Fix MinGW native compilation of gdb/gdbsupport/gdb_wait.c

      gdb/ChangeLog
      2020-01-06  Eli Zaretskii  <[hidden email]>

              * gdbsupport/gdb_wait.c: Include <signal.h> instead of
              gdb/signals.h, as we are now using native signal symbols.

  commit cbfa85811792ca8e96ace40bef0aaaf507e54bcc
  Author:     Shahab Vahedi <[hidden email]>
  AuthorDate: Mon Jan 6 15:27:32 2020 +0100
  Commit:     Pedro Alves <[hidden email]>
  CommitDate: Mon Jan 6 19:47:20 2020 +0000

      GDB: Fix the overflow in addr/line_is_displayed()

      In tui_disasm_window::addr_is_displayed(), there can be situations
      where "content" is empty. For instance, it can happen when the
      "content" was not filled in tui_disasm_window::set_contents(),
      because tui_disassemble() threw an exception. Usually this exception
      is the result of fetching invalid PC addresses like the ones beyond
      the end of the program.

      Having "content.size ()" zero leads to an overflow in this condition
      check inside tui_disasm_window::addr_is_displayed():

        int i = 0;
        while (i < content.size () - threshold ...) {
          ... content[i] ...
        }

      "threshold" is 2 and there are times that "content.size ()" is 0.
      This results into an overflow and the loop is entered whereas it
      should have been skipped. Finally, "content[i]" access leads to
      a segmentation fault.

      Same problem applies to tui_source_window::line_is_displayed().

      The issue has been discussed at length in bug 25345:
        https://sourceware.org/bugzilla/show_bug.cgi?id=25345

      This commit avoids the segmentation faults with an early check:

        if (content.size () < SCROLL_THRESHOLD)
          return false;

      Moreover, those functions have been overhauled to a leaner code.

      gdb/ChangeLog:
      2020-01-06  Shahab Vahedi  <[hidden email]>

              * tui/tui-disasm.c (tui_disasm_window::addr_is_displayed): Avoid
              overflow by an early check of content vs threshold.
              * tui/tui-source.c (tui_source_window::line_is_displayed):
              Likewise.

Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Simon Marchi
On 2020-02-14 4:45 a.m., Eli Zaretskii wrote:

>> Cc: [hidden email], [hidden email]
>> From: Simon Marchi <[hidden email]>
>> Date: Thu, 13 Feb 2020 16:07:14 -0500
>>
>> On 2020-02-13 1:58 p.m., Eli Zaretskii wrote:
>>>   2) we need some guidelines for "good commit messages", otherwise
>>>      patch review will need to pay a lot of attention to discussing
>>>      that and making sure the log messages are fine
>>
>> We can write some guidelines for sure, it wouldn't hurt.  But I think that as a
>> project, we have already some quite good standards in terms of commit messages.
>
> AFAIU, our current standards assume the ChangeLog-formatted entry is
> part of the log message which describes the individual changes.  If
> that is removed, we may wish to modify our standards to make up for
> the loss.

Do you know where that is written?

>
> E.g., compare the 2 sample log messages below.  The first one will
> probably be quite incomplete if the ChangeLog part is removed, while
> the second will probably not suffer too much.  So we may wish to make
> sure log messages like the first one are augmented by additional
> information.
>
>   commit 66182876b46d40163e81504f7fa4f206268cb83c
>   Author:     Eli Zaretskii <[hidden email]>
>   AuthorDate: Mon Jan 6 21:54:21 2020 +0200
>   Commit:     Eli Zaretskii <[hidden email]>
>   CommitDate: Mon Jan 6 21:54:21 2020 +0200
>
>       Fix MinGW native compilation of gdb/gdbsupport/gdb_wait.c
>
>       gdb/ChangeLog
>       2020-01-06  Eli Zaretskii  <[hidden email]>
>
>      * gdbsupport/gdb_wait.c: Include <signal.h> instead of
>      gdb/signals.h, as we are now using native signal symbols.

Well, you would essentially just say the same thing, just not in "ChangeLog
entry" format.  I'm not sure what's the problem here.

Note that if I were to review this patch, I would probably ask for a bit more
context in the commit log though (on top of what you already say in the
ChangeLog entry).  I'm sure there was a big discussion that lead to this change,
so from your point of view, this change probably seemed obvious.  But as
somebody lacking the relevant context, I can't really tell why including
gdb/signals.h was wrong and why including signal.h is better.

I would therefore suggest adding:

- What's the problem you're trying to fix (compilation error?  if so please
  paste it in the commit log?)
- Why is this the right way to fix it?

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Simon Marchi
On 2020-02-14 3:15 p.m., Eli Zaretskii wrote:

> So I think it would be good to have some guidelines about that.
>
>> I would therefore suggest adding:
>>
>> - What's the problem you're trying to fix (compilation error?  if so please
>>   paste it in the commit log?)
>> - Why is this the right way to fix it?
>
> Something like that, yes.  I'm saying that we should have this on the
> wiki.

I think it's already quite well addressed by this section in the contribution
checklist:

https://sourceware.org/gdb/wiki/ContributionChecklist#Detailed_Explanation_of_the_Patch

Simon
Reply | Threaded
Open this post in threaded view
|

Re: Using the vcs_to_changelog.py script

Eli Zaretskii
> Cc: [hidden email], [hidden email]
> From: Simon Marchi <[hidden email]>
> Date: Fri, 14 Feb 2020 16:07:59 -0500
>
> >> - What's the problem you're trying to fix (compilation error?  if so please
> >>   paste it in the commit log?)
> >> - Why is this the right way to fix it?
> >
> > Something like that, yes.  I'm saying that we should have this on the
> > wiki.
>
> I think it's already quite well addressed by this section in the contribution
> checklist:
>
> https://sourceware.org/gdb/wiki/ContributionChecklist#Detailed_Explanation_of_the_Patch

That says the ChangeLog must be part of the patch, and the text
doesn't specifically address the lack of detailed information about
what was changed where.  So I think the text may need to be amended to
address the issue at hand, especially since most veteran contributors
came to rely on the ChangeLog-formatted part.