How many packages does Debian Code Search contain?

published 2013-07-14, last modified 2019-02-04 in tag debian

The German computer magazine c't has covered Debsources in its most recent edition (c't 16/2013). In that article, they also state:

Debsources integriert auch eine Code-Suche, allerdings werden lediglich die
Quellen des Unstable-Zweigs durchsucht, der zirka ein Drittel des Quellcodes
von Debsources ausmacht.

This loosely translates to:

Debsources also integrates a code search engine, but it only searches the
sources of the unstable branch, which makes up roughly one third of the
source code in Debsources.

I suspect the author of the article arrived at this conclusion because Debsources talks about 400 GiB of source code, whereas Debian Code Search talks about 130 GiB of source code.

However, the statement still struck me as odd, and I think it gives the wrong impression: I expected that packages don’t differ that much between stable, testing and unstable. So I fired up psql and pounded UDD (the Ultimate Debian Database) until it revealed how many packages differ between stable, testing and unstable:

  • 3581 of wheezy’s packages have a newer upstream version in testing. The total number of packages in wheezy is 17576, so this makes about 20.4%.
  • 1131 of these packages (6% of all wheezy packages) are even newer in unstable.
  • 13092 packages (74%) have identical upstream versions in wheezy, testing and unstable.
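The percentages in the list above can be re-derived from the raw counts. The following is only a minimal sketch of that arithmetic: the counts are the ones quoted above, while the names `WHEEZY_TOTAL`, `COUNTS` and `share` are made up for this illustration and do not come from UDD or Debian Code Search.

```python
# Package counts quoted in the list above; every share is relative to
# the total number of packages in wheezy.
WHEEZY_TOTAL = 17576

COUNTS = {
    "newer upstream version in testing": 3581,
    "even newer in unstable": 1131,
    "identical in wheezy, testing and unstable": 13092,
}


def share(count: int, total: int = WHEEZY_TOTAL) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count} of {WHEEZY_TOTAL} = {share(count):.1f}%")
```

Note that all three shares use the wheezy total as the denominator, including the second one, which is a subset of the first.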

Conclusion

So, in conclusion, for about 74% of the packages in wheezy, the upstream version that Debian Code Search indexes from unstable is identical to the one in wheezy and testing. Debsources also covers contrib and non-free, which explains part of the higher disk usage. In particular, there might be big blobs in non-free that account for a lot of disk space. Also, Debsources keeps sources around for a few weeks, whereas Debian Code Search only keeps the most recent snapshot.

Query

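-- Count the wheezy packages whose upstream version in testing is the
-- same as in wheezy.
SELECT
  COUNT(*)
FROM
  (SELECT
     migrations.source,
     sources.version AS stable_version,
     testing_version,
     unstable_version
   FROM sources
   LEFT JOIN migrations
   USING (source)
   WHERE sources.release = 'wheezy') AS x
WHERE
  -- Strip a trailing "-<digits and dots>" Debian revision from both
  -- versions so that only the upstream parts are compared.
  regexp_replace(stable_version::text, E'-([0-9.]+)$', '') = regexp_replace(testing_version::text, E'-([0-9.]+)$', '');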
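
To make the comparison in the WHERE clause easier to follow, here is a minimal sketch of the same idea in Python. It reuses the regular expression from the query; the function names `upstream_version` and `same_upstream` and the version strings in the usage note are hypothetical and only serve as illustration.

```python
import re

# Same pattern as in the SQL query: a trailing "-<digits and dots>" suffix,
# which is how the query recognizes the Debian revision.
DEBIAN_REVISION = re.compile(r"-([0-9.]+)$")


def upstream_version(version: str) -> str:
    """Strip a trailing numeric Debian revision, if there is one."""
    return DEBIAN_REVISION.sub("", version)


def same_upstream(a: str, b: str) -> bool:
    """Compare two version strings while ignoring the Debian revision."""
    return upstream_version(a) == upstream_version(b)
```

With this, `same_upstream("1.2.3-4", "1.2.3-5")` is true, while `same_upstream("1.2.3-4", "1.3.0-1")` is false. The pattern only matches revisions made up of digits and dots, so a version such as `2.0-1+b1` is left untouched, and the same applies to the query.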