How to get repositories by creation date?

There are several ways to get the creation date of a repository, depending on the criteria used to define "creation date".

Method 1: first git commit

You can run a query on gitbase (possibly through the SQL Lab), to get the initial commit of each repository and extract its date. This is often the most accurate method, although the following should be considered:

  • The first commit in the repository could have been created with an incorrect date. For example, a problem with the system clock of the original author could have led to a commit stamped with the date 1 January 1970.
  • Some git repositories are created by merging several repositories that were originally independent. This method will show the oldest date across all the original repositories.
  • This method considers the oldest commit across all branches.

SELECT repository_id, MIN(commit_author_when) AS created_at
FROM commits
WHERE ARRAY_LENGTH(commit_parents) = 0
GROUP BY repository_id
ORDER BY 2;
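
If incorrect commit dates are a concern, the result rows can be screened after the query runs. The following is a minimal sketch, not part of gitbase: it assumes the rows have already been fetched as (repository_id, created_at) pairs with timezone-aware datetimes, and the `split_suspicious` helper, the cutoff value, and the sample rows are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical threshold: dates within a day of the Unix epoch are treated
# as the product of an unset or broken system clock.
EPOCH_CUTOFF = datetime(1970, 1, 2, tzinfo=timezone.utc)


def split_suspicious(rows, now=None):
    """Split (repository_id, created_at) rows into plausible and suspicious.

    A row is suspicious when its date falls before EPOCH_CUTOFF or after `now`.
    """
    now = now or datetime.now(timezone.utc)
    plausible, suspicious = [], []
    for repository_id, created_at in rows:
        if created_at < EPOCH_CUTOFF or created_at > now:
            suspicious.append((repository_id, created_at))
        else:
            plausible.append((repository_id, created_at))
    return plausible, suspicious


# Made-up rows for illustration only; these are not real query results.
rows = [
    ("example/broken-clock", datetime(1970, 1, 1, tzinfo=timezone.utc)),
    ("example/normal", datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)),
]
plausible, suspicious = split_suspicious(rows)
```

Suspicious rows are returned rather than dropped, so they can be reviewed by hand or resolved with Method 2 instead.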

Method 2: GitHub repository creation date

You can also run the following query against the metadata database. It returns the date on which the repository was created on GitHub.

SELECT full_name, created_at
FROM repositories
ORDER BY created_at;

An example

Let’s take the moby/qemu repository as an example, which is a fork of qemu/qemu:

  • Method 1 returns 2003-02-18T22:55:36, the original creation date of qemu, which predates both the Moby and Docker projects.
  • Method 2 returns 2019-02-21T10:00:23+00:00, the date on which the Moby fork of qemu was created.
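
To put a number on the gap between the two results, the timestamps can be subtracted directly. This is only a sketch using the two values quoted above; it assumes the first-commit timestamp, which carries no offset, is in UTC.

```python
from datetime import datetime, timezone

# The two values quoted above for moby/qemu.
first_commit = datetime.fromisoformat("2003-02-18T22:55:36")
github_created = datetime.fromisoformat("2019-02-21T10:00:23+00:00")

# Assumption: the first-commit timestamp has no offset, so treat it as UTC.
if first_commit.tzinfo is None:
    first_commit = first_commit.replace(tzinfo=timezone.utc)

gap = github_created - first_commit
print(f"{gap.days} days (about {gap.days / 365.25:.1f} years)")
# prints "5846 days (about 16.0 years)"
```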

Now let’s create charts from both queries for the Moby organization. For method 1:

For method 2:

Using the first commit will typically give earlier dates. Even when the difference is not as large as in the qemu fork example, the first commit of a repository was quite often created locally by the developers when they started the project, likely before the repository was created on GitHub.