source{d} datasets, a blog series

At source{d}, we have always given back to the community. Our data engineers have done an incredible job of fetching repositories from GitHub and packaging them into something portable and easy to use, so that MLonCode researchers and other interested folks can avoid the nightmare of running a custom Git retrieval pipeline. I am talking about the Public Git Archive (PGA), which we have already mentioned in several of our posts: 1, 2, 3. PGA was the main driver behind launching src-d/datasets, a dedicated GitHub repository that tracks our emerging datasets that may be of interest to people outside the company.


This is a companion discussion topic for the original entry at https://blog.sourced.tech/post/source-d-datasets-a-blog-series/