GitHub: Finding changes since last release


Time for release!

It’s an exciting time. Or, it should be. I want to do a release of fpm! However, there’s some preparation I need to do before I can publish.

I need to make sure the changelog file is updated and accurate, and that means I need to know all of the changes made since the previous release. I don’t mean the code changes, though. I mean what behaviors were changed? Who contributed to the change? Is a given change related to one or more open issues?

Doing it by hand

For years, I’ve managed the fpm changelog by hand. I would use git log to show me all git commits from the previous release until now. For today, I would use git log v1.14.2..main to show me all git commits between the v1.14.2 release and main.

Now, a “git commit” isn’t necessarily a meaningful unit of change. A pull request on GitHub may contain many commits! Further, a single git commit often has no reference to the issue it may have been solving, nor do they have a reference to the pull request that they are a part of.

So, I slowly travel the git history and take notes. After that’s done, I will cross-reference with GitHub issues and PRs (pull requests) to improve those notes. In any given change, I will try to document a few things:

The “who” is important to me, and my definition of “contribute” is different than what GitHub uses. This will become important in the next section.

To me, a contributor is anyone who participated in the issue or PR.

Let GitHub generate it?

In 2021, GitHub announced a way to automatically generate release notes. From the GitHub blog:

Throughout the year, we spoke with maintainers about what they wanted to see improved in the release process. Without hesitation, every maintainer said the same thing, “Release notes.”

Also? Without hesistation, I would say “release notes” is high on my list of things I’d love help with.

GitHub has documentation about automating release notes if you’d like to learn more.

The defaults seem pretty reasonable with two major sections, “What’s changed” and “New contributors”. The “What’s changed” seems to be all PRs merged into the release branch (“main” in my case) since the previous release which, by default, GitHub uses the “previous tag” which in most cases is likely the correct choice. “New contributors” is a nice way to welcome and highlight first-timer contributors.

An example “What’s changed” looks like this:

* Call ERB.new correctly depending on the RUBY_VERSION by @jordansissel in https://github.com/jordansissel/fpm/pull/1897

This format seems to be, roughly: The PR title, the person who opened the PR, and a link to the PR. In cases of single-commit PRs, the PR title defaults to the first line of the git commit message.

Unfortunately, this doesn’t make for a good changelog entry. For tools like fpm, the changelog should focus on the users, not developers. Developer-centric language is quite common in places developers might visit – like git commit messages.

A significant part of fpm’s release notes process is to represent every change in a language familiar and useful to the target audience, fpm’s users. In many cases, individual changelog entries are quite long. Far longer than the you’d see in a git commit message’s first line, in many cases, because Git and its ecosystem put strong pressure on having the first line of the commit be treated similar to an email subject line – it should be very short.

This means that if we use GitHub’s default release notes, I would have to edit every PR to replace the title with a user-centric language describing the change, impact, etc, which probably wouldn’t even fit in a PR title.

Futher, GitHub’s automatic release notes only credits the PR author, and, in a way, erases the contributions of everyone else who helped make that change happen.

Lastly, I try to include links to all related issues, not just the PR, and GitHub doesn’t link to any of these in the generated notes.

I did like that automatic release notes can be configured to generate sections (like breaking changes) depending on your PR labelling

GitHub’s close to something that works really well, but where it misses, for me:

What next?

With the idea that I might have to roll my own, or at least, I’d like to explore what rolling my own release notes tooling might cost me (in time and energy)… I’ve written a script that uses the GitHub API to fetch all pull requests merged since the previous release.

It’s in ruby and uses the octokit library. To start, I wanted to know all pull requests merged since the last release.

I wasn’t able to figure out how to query a tag in GitHub’s api. I was specifically looking for the date related to the tag (that is, the commit pointed to by the tag, what date it was committed). GitHub has a “get tags” API but you have to provide a git sha which isn’t a predictable thing like a version number.

  1. First step is local to my git repo. I need the timestamp of the git tag’s commit. I do this with the git command line:
% git show -s --pretty=%cI v1.14.2
2022-03-31T00:09:25-07:00

In ruby, we can use the Time class to parse this format and get us a Time object.

after_date = Time.parse("2022-03-31T00:09:25-07:00")
  1. First, we need to setup the octokit client.
client = Octokit::Client.new
client.access_token = ... # your github auth token
client.login

Note: Octokit has a feature called “autopaginate” where it will fetch all records for a given query. In this scenario, I do not enable auto-pagination because I don’t need all pull requests, just the most recent ones matching my criteria – ones merged since the last release. Running with auto-pagination is much slower especially when I don’t need all that data.

  1. Now we can list pull requests

List all pull requests which are closed; sorted by “updated” date, descending.

result = client.pulls(repository, state: "closed", sort: "updated", direc    tion: "desc")

The result is an array of a weird class called “Sawyer::Resource”. We can treat it as a hash. The actual content of this result seems undocumented by octokit, but we can read the GitHub API docs for the API call for listing pull requests which GitHub calls, slightly confusingly, the “pulls” api

  1. Only show merged pull requests

GitHub “state” for a pull request is only “closed” or “open”, there is no “merged” state, unfortunately. However, we’re in luck that the pull request object has a “merged_at” field which contains an ISO8601 timestamp for when it was merged.


done = false
while !done
  result.each do |pr|
    # Only look at merged PRs
    next unless pr.merged_at.to_i > 0

    if pr.merged_at > after_date
      puts "#{pr.html_url}    # #{pr.merged_at}"
    else
      done = true
      break
    end
  end

  break if done

  result = client.get(client.last_response.rels[:next].href)
end

It’s not fancy, but it’ll do. The rough steps were:

  1. Run it!
https://github.com/jordansissel/fpm/pull/1909    # 2022-11-10 06:43:35 UTC
https://github.com/jordansissel/fpm/pull/1946    # 2022-10-26 21:48:13 UTC
https://github.com/jordansissel/fpm/pull/1950    # 2022-11-03 04:49:17 UTC
https://github.com/jordansissel/fpm/pull/1948    # 2022-10-31 22:17:48 UTC
... [ output shortened for brevity ] ...

This printed out a url and timestamp for each PR. Nice.

Next steps

At a minimum, this exploration helped me bypass the git log ... process which I very much did not enjoy. Using either GitHub’s generator or my script, I now have a list of all relevant PRs for the project release.

I’ll probably continue to do the changelog by hand, but at least it’ll be less work! In the meantime, I’ll look around for more tools which could help me automate some of this.