How to get source code from an archive?

Hello,

I succeeded in installing bblfsh and running it on a short source file.
I also installed pga and go-siva in order to run bblfsh on a large set of code.

I can select a URL and a language to fetch the archive files of a project with a command like this:

pga get --lang=Java --url=https://github.com/facebookarchive/tsdash --output tsdashPrjSivaFiles

but I cannot work out how to extract files from the archives with siva. I tried a command like this one:

siva unpack tsdashPrjSivaFiles/siva/latest/e1/.siva

I got the files config and HEAD and the directories objects and refs, but I don’t know where to find the source files to run bblfsh on.

Thank you for your help

Hello @mesnardo ! Thanks for your question. @michael @dennwc should be able to answer that one. Out of curiosity, have you tried source{d} Community Edition instead? https://go.sourced.tech/community-edition-download

I think @jfontan should explain the process of extracting Siva files a bit better than I would :wink:

But as far as I know, Siva files contain a (rooted) bare Git repository. So you may need to use go-git to access individual commits and files inside the archive.
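
The config, HEAD, objects and refs entries mentioned in the question are the usual layout of a bare Git repository, which is why no source files are visible after unpacking. A small Python sketch to check an unpacked directory for that layout (the function name `looks_like_bare_repo` is made up for illustration, and this is only a filesystem heuristic, not a full Git validity check):

```python
from pathlib import Path


def looks_like_bare_repo(path):
    """Return True if `path` contains the entries Git expects in a repository
    directory: a HEAD file plus objects/ and refs/ directories."""
    root = Path(path)
    return (
        (root / "HEAD").is_file()
        and (root / "objects").is_dir()
        and (root / "refs").is_dir()
    )
```

If this returns True for the directory produced by siva unpack, the sources are stored as Git objects and have to be checked out before bblfsh can read them.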

@mesnardo, as @dennwc says sivas contain bare repositories. If you only want to extract files from a couple of siva files you can use the standard git command to convert it to a normal git repo:

$ siva unpack file.siva bare
$ git clone bare worktree # ignore warnings
$ cd worktree
# find the reference you are interested in, most probably a HEAD
$ git branch -a
   remotes/origin/HEAD/0169e931-7107-0dd5-b783-f4e91b6ccdc5
   remotes/origin/master/0169e931-7107-0dd5-b783-f4e91b6ccdc5
$ git checkout remotes/origin/HEAD/0169e931-7107-0dd5-b783-f4e91b6ccdc5 # change to the interesting reference

Now you’ll have the files for that reference. Note that a single siva file can contain several repositories. The UUID at the end of each reference identifies the repository it belongs to. To get a mapping of IDs and repositories you can query the original bare repo:

$ cd bare
$ git remote -v
0169e931-7107-0dd5-b783-f4e91b6ccdc5	git://github.com/003random/003Recon.git (fetch)
0169e931-7107-0dd5-b783-f4e91b6ccdc5	git://github.com/003random/003Recon.git (push)
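
To tie the two listings together, the output of git branch -a and git remote -v can be joined on the UUID. A minimal Python sketch, assuming the output looks like the samples above; the function names `parse_remotes` and `parse_references` are hypothetical helpers, not part of any siva or git tooling:

```python
def parse_remotes(remote_output):
    """Map repository UUID -> URL, using the (fetch) lines of `git remote -v`."""
    mapping = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == "(fetch)":
            mapping[parts[0]] = parts[1]
    return mapping


def parse_references(branch_output):
    """Return (reference name, UUID) pairs from `git branch -a` output.

    Only names shaped like remotes/<remote>/<reference>/<uuid> are kept.
    """
    refs = []
    for line in branch_output.splitlines():
        name = line.strip()
        if name.startswith("* "):
            name = name[2:].strip()
        if not name.startswith("remotes/") or " -> " in name:
            continue
        prefix, _, uuid = name.rpartition("/")
        parts = prefix.split("/", 2)
        if len(parts) < 3:
            continue  # no <reference>/<uuid> part in this name
        refs.append((parts[2], uuid))
    return refs


# Sample data copied from the command output above.
branches = (
    "   remotes/origin/HEAD/0169e931-7107-0dd5-b783-f4e91b6ccdc5\n"
    "   remotes/origin/master/0169e931-7107-0dd5-b783-f4e91b6ccdc5\n"
)
remotes = (
    "0169e931-7107-0dd5-b783-f4e91b6ccdc5\tgit://github.com/003random/003Recon.git (fetch)\n"
    "0169e931-7107-0dd5-b783-f4e91b6ccdc5\tgit://github.com/003random/003Recon.git (push)\n"
)

urls = parse_remotes(remotes)
for ref, uuid in parse_references(branches):
    # Print each reference next to the repository URL it belongs to.
    print(ref, uuid, urls.get(uuid))
```

In practice the two strings would come from running the commands in the worktree and in the bare directory respectively.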

If you want to extract more than a handful of repositories then it is better to automate it in some way. The fastest route is a Go program using the go-borges library. Make sure that you use the legacysiva package. Here is an example of how to open all siva files in a directory and find the HEAD references (most probably what you want). You’ll have to modify it to set Bucket to 2, iterate over all files from each reference and call bblfsh with their contents:
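
For the simpler git-command route described earlier, the automation could also be scripted without Go. The following is only a rough Python sketch of that route and is not the go-borges approach: it assumes the siva and git binaries are on the PATH and shells out to the same commands shown above. All function names are hypothetical, and the call to bblfsh is left to the caller.

```python
import subprocess
from pathlib import Path


def find_siva_files(root):
    """Return every *.siva file under `root`, sorted for a stable order."""
    return sorted(Path(root).rglob("*.siva"))


def head_references(branch_output):
    """Pick the remotes/origin/HEAD/<uuid> names out of `git branch -a` output."""
    refs = []
    for line in branch_output.splitlines():
        name = line.strip()
        if name.startswith("remotes/origin/HEAD/"):
            refs.append(name)
    return refs


def extract_heads(siva_file, out_dir):
    """Unpack one siva file, clone it, and check out each HEAD reference in turn.

    Yields (reference, worktree path) right after each checkout, so the caller
    can read the files for that reference before the next one is checked out.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bare = out_dir / "bare"
    worktree = out_dir / "worktree"
    subprocess.run(["siva", "unpack", str(siva_file), str(bare)], check=True)
    subprocess.run(["git", "clone", str(bare), str(worktree)], check=True)
    listing = subprocess.run(
        ["git", "-C", str(worktree), "branch", "-a"],
        check=True, capture_output=True, text=True,
    ).stdout
    for ref in head_references(listing):
        subprocess.run(["git", "-C", str(worktree), "checkout", ref], check=True)
        yield ref, worktree


def run(siva_root, out_root):
    """Process every siva file under `siva_root`, one output directory per file."""
    for siva_file in find_siva_files(siva_root):
        work = Path(out_root) / siva_file.stem
        for ref, worktree in extract_heads(siva_file, work):
            # Placeholder: walk `worktree` here and hand the files to bblfsh.
            print(siva_file, ref, worktree)
```

Calling `run("tsdashPrjSivaFiles", "extracted")` would process the directory from the question; the output directory name is just an example. Spawning git for every archive is slow, so this only makes sense for a modest number of siva files.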

Thanks a lot @jfontan
I got it running on a set of siva files, both with the git commands and with go-borges and legacysiva.
Olivier Mesnard