There are various ways to get the creation date of repositories, depending on the criteria we use to define the creation date.
Method 1: first git commit
You can run a query on gitbase (possibly through the SQL Lab) to get the initial commit of each repository and extract its date. This is often the most accurate method, although the following should be considered:
- The first commit in the repository could have been created with an incorrect date. For example, a problem with the system clock of the original author could have led to a commit stamped with the date 1 January 1970.
- Some git repositories are created as the result of a merge of various repositories that were originally independent. This method will show the oldest date across all the original repositories.
- This method considers the oldest commit across all branches.
SELECT repository_id, MIN(commit_author_when) AS created_at FROM commits WHERE ARRAY_LENGTH(commit_parents) = 0 GROUP BY repository_id ORDER BY 2;
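The first caveat above can be handled after the query runs. The following is a minimal Python sketch, not part of gitbase: it assumes the query results have already been loaded as `(repository_id, created_at)` tuples with timezone-aware datetimes, and the function name and the one-day cutoff after the Unix epoch are hypothetical choices for illustration.

```python
from datetime import datetime, timezone

# Hypothetical threshold: dates on or before this are treated as clock errors.
EPOCH_CUTOFF = datetime(1970, 1, 2, tzinfo=timezone.utc)


def split_suspicious_dates(rows, now=None):
    """Split (repository_id, created_at) rows into plausible and suspicious.

    A date is flagged as suspicious when it falls at the Unix epoch
    (a typical symptom of an unset system clock) or lies in the future.
    """
    now = now or datetime.now(timezone.utc)
    plausible, suspicious = [], []
    for repository_id, created_at in rows:
        if created_at <= EPOCH_CUTOFF or created_at > now:
            suspicious.append((repository_id, created_at))
        else:
            plausible.append((repository_id, created_at))
    return plausible, suspicious


# Made-up sample rows, shaped like the output of the query above.
rows = [
    ("github.com/example/a", datetime(1970, 1, 1, tzinfo=timezone.utc)),
    ("github.com/example/b", datetime(2003, 2, 18, 22, 55, 36, tzinfo=timezone.utc)),
]
plausible, suspicious = split_suspicious_dates(rows)
```

Because the query takes the minimum over all root commits, a single bad timestamp hides the real creation date of that repository, so flagged repositories are better reviewed by hand than silently dropped.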
Method 2: GitHub repository creation date
You can also run the following query against the metadata database. It returns the date on which the repository was created on GitHub.
SELECT full_name, created_at FROM repositories ORDER BY created_at;
The two methods can give very different results. Taking the Moby fork of qemu as an example:
- Method 1 will result in 2003-02-18T22:55:36, the original creation date of qemu, which predates the Moby and Docker projects.
- Method 2 will result in 2019-02-21T10:00:23+00:00, the date on which the Moby fork of qemu was created.
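To put a number on the gap between the two results, the timestamps can be subtracted directly. This is a small sketch in Python with a hypothetical helper name; it assumes that a timestamp without a UTC offset, such as the first-commit date above, is expressed in UTC.

```python
from datetime import datetime, timezone


def creation_gap(first_commit_iso, github_created_iso):
    """Return the time between the first commit and the GitHub creation date."""

    def parse(value):
        parsed = datetime.fromisoformat(value)
        # Assumption: timestamps without an offset are in UTC.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed

    return parse(github_created_iso) - parse(first_commit_iso)


gap = creation_gap("2003-02-18T22:55:36", "2019-02-21T10:00:23+00:00")
print(f"{gap.days} days (about {gap.days / 365.25:.1f} years)")
```

For the qemu fork the difference is roughly sixteen years, which is why the two methods should not be mixed within the same chart.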
Now let’s create charts from both queries for the Moby organization. For method 1:
For method 2:
Using the first commit will typically give earlier dates. Even when the difference is not as big as in the qemu fork example, quite often the first commit of a repository was created locally by the developers when they started the project, likely before the repository was created on GitHub.