pristine-tar considered harmful

published 2018-01-28, last modified 2019-02-04 in tag debian

If you want to follow along at home, clone this repository:

% GBP_CONF_FILES=:debian/gbp.conf gbp clone https://anonscm.debian.org/git/pkg-go/packages/golang-github-go-macaron-inject.git

Now, in the golang-github-go-macaron-inject directory, I’m aware of three ways to obtain an orig tarball (please correct me if there are more):

  1. Run gbp buildpackage, creating an orig tarball from git (upstream/0.0_git20160627.0.d8a0b86)
    The resulting sha1sum is d085a04b7b35856be24f8cc4a9a6d9799cdb59b4.
  2. Run pristine-tar checkout
    The resulting sha1sum is d51575c0b00db5fe2bbf8eea65bc7c4f767ee954.
  3. Run origtargz
    The resulting sha1sum is d51575c0b00db5fe2bbf8eea65bc7c4f767ee954.
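
To reproduce these checksums, the three methods can be compared with a small helper. This is a minimal sketch: orig_sha1 is a hypothetical name (not part of gbp, pristine-tar, or devscripts), and it assumes each method leaves its orig tarball in the parent directory.

```shell
#!/bin/sh
# orig_sha1 [DIR]: print the SHA-1 of every orig tarball in DIR (default: ..).
# Hypothetical helper for illustration only.
orig_sha1() {
    dir=${1:-..}
    for f in "$dir"/*.orig.tar.*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        sha1sum "$f"
    done
}

# Intended use, one method at a time, deleting the tarball in between
# (assumption: each command writes its tarball to ..):
#   gbp buildpackage; orig_sha1        # method 1
#   pristine-tar checkout ../golang-github-go-macaron-inject_0.0~git20160627.0.d8a0b86.orig.tar.xz; orig_sha1   # method 2
#   origtargz; orig_sha1               # method 3
```

Deleting the tarball between runs matters, because a tarball left behind by one method could otherwise be mistaken for the output of the next.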

However, have a look at the archive’s golang-github-go-macaron-inject_0.0~git20160627.0.d8a0b86-2.dsc. Its entry for the orig tarball reads:

f5d5941c7b77e8941498910b64542f3db6daa3c2 7688 golang-github-go-macaron-inject_0.0~git20160627.0.d8a0b86.orig.tar.xz
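
This comparison against the .dsc can be scripted. The following is a sketch under assumptions: the function names dsc_sha1 and verify_orig are hypothetical, and the .dsc is assumed to list the tarball in a Checksums-Sha1 field using the "checksum size filename" layout shown above.

```shell
#!/bin/sh
# dsc_sha1 DSC FILENAME: print the SHA-1 that the .dsc records for FILENAME.
# Hypothetical helper; assumes a Checksums-Sha1 field whose continuation
# lines look like " <sha1> <size> <filename>".
dsc_sha1() {
    awk -v name="$2" '
        /^Checksums-Sha1:/ { in_field = 1; next }   # start of the field
        /^[^ ]/            { in_field = 0 }         # any other field ends it
        in_field && $3 == name { print $1 }
    ' "$1"
}

# verify_orig DSC TARBALL: compare a local tarball with the .dsc entry.
# Returns non-zero if the checksums differ or the .dsc has no such entry.
verify_orig() {
    expected=$(dsc_sha1 "$1" "$(basename "$2")")
    actual=$(sha1sum "$2" | cut -d' ' -f1)
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "match: $actual"
    else
        echo "MISMATCH: local=$actual archive=${expected:-none}"
        return 1
    fi
}
```

Called as verify_orig with the .dsc and a locally produced tarball, it reports whether that tarball is byte-identical to the one in the archive.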

So why did none of the three methods reproduce the archive’s tarball? Let’s go through them:

  1. The uploader must not have used gbp buildpackage to create their tarball. Perhaps they imported from a tarball created by dh-make-golang, or created manually, and then left that tarball in place (which is a perfectly fine, normal workflow).
  2. I’m not entirely sure why pristine-tar resulted in a different tarball than what’s in the archive. I think the most likely theory is that the uploader had to go back and modify the tarball, but forgot to update (or made a mistake while updating) the pristine-tar branch.
  3. origtargz, when it detects pristine-tar data, uses pristine-tar, hence the same tarball as ②.

Had we not used pristine-tar for this repository at all, origtargz would have pulled the correct tarball from the archive.

The above anecdote illustrates the fragility of the pristine-tar approach. In my experience from the pkg-go team, when the pristine-tar branch doesn’t contain outright incorrect data, it is often outdated. Even when everything is working correctly, a number of packagers are disgruntled about the extra work and mental complexity.

In the pkg-go team, we have (independently of this specific anecdote) collectively decided to have the upstream branch track the upstream remote’s master (or similar) branch directly, and get rid of pristine-tar in our repositories. This should result in methods ① and ③ working correctly.

In conclusion, my recommendation for any repository is: don’t bother with pristine-tar. Instead, configure origtargz as a git-buildpackage postclone hook in your ~/.gbp.conf to always work with archive orig tarballs:

[clone]
# Ensure the correct orig tarball is present.
postclone = origtargz

[buildpackage]
# Pick up the orig tarballs created by the origtargz postclone hook.
tarball-dir = ..