DataLad and Gitolite

acnewton · March 14, 2019, 9:54am

Hello experts,

I am very interested in DataLad for supporting research data management at our HPC center in the Netherlands, for instance on our supercomputer (where jobs are queued with Slurm) or in our cloud services.

As a showcase, and purely for testing, I am trying it out with a small Gitolite server that works with git-annex. From my local machine I can create a bare repository (git clone git@gitoliteserver:repo), initialize it with ‘git-annex init’, add data to the annex, sync the data to this server, and so on.
How would I initialize an empty repository in this example directly with DataLad? It is not clear to me whether to do this with ‘datalad install git@myserver:repo’, ‘datalad create-sibling git@myserver’, or something else.

I would be grateful for some more guidance on how to do this. Thanks.

Arthur Newton

yarikoptic · March 15, 2019, 8:42pm

‘create-sibling’ should be the one to use. How does it fail for you? It seems to work for me:

$> datalad create ds
[INFO   ] Creating a new annex repo at /tmp/ds 
create(ok): /tmp/ds (dataset)

$> cd ds

$> datalad create-sibling $USER@localhost:/tmp/sibling 
[INFO   ] Connecting ...                              
[INFO   ] Considering to create a target dataset /tmp/ds at /tmp/sibling of localhost 
[INFO   ] Fetching updates for <Dataset path=/tmp/ds> 
.: localhost(+) [yoh@localhost:/tmp/sibling (git)]                                                          
[INFO   ] Adjusting remote git configuration 
[INFO   ] Enabling git post-update hook ... 
[INFO   ] Running post-update hooks in all created siblings 
create_sibling(ok): . (dataset)

But what DataLad doesn’t work with is bare repositories (see, and comment on, https://github.com/datalad/datalad/issues/2820): DataLad commands will not work inside a bare repository. If a remote (sibling) is a bare annex repository, however, it should work (although I don’t think I have ever tested that).
I would say you might not actually want bare repositories anyway: with non-bare ones you could give your users a view of the file tree, so they could use those datasets directly without cloning them (as long as they don’t need a specific version).
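For readers less familiar with the distinction, here is a minimal plain-git sketch (not DataLad-specific; the temporary paths, the file name and the branch name ‘main’ are arbitrary choices for illustration) of why a bare repository offers no file tree to browse: the committed file appears in the non-bare repository's working tree, while the bare one only holds it inside Git's object database.

```shell
#!/bin/sh
# Plain-git illustration (no DataLad, git-annex or Gitolite required):
# a bare repository has no working tree, a non-bare one does.
set -e

tmp=$(mktemp -d)

# Bare repository, as typically created on a hosting server
git init -q --bare "$tmp/server.git"

# Regular (non-bare) repository with one committed file
git init -q "$tmp/work"
echo "some data" > "$tmp/work/data.txt"
git -C "$tmp/work" add data.txt
git -C "$tmp/work" -c user.name=demo -c user.email=demo@example.org \
    commit -q -m "Add data"

# Copy the history into the bare repository ("main" is an arbitrary branch name)
git -C "$tmp/work" push -q "$tmp/server.git" HEAD:refs/heads/main

# Git reports which of the two repositories is bare
git -C "$tmp/server.git" rev-parse --is-bare-repository   # true
git -C "$tmp/work" rev-parse --is-bare-repository         # false

# The file can be browsed directly in the working tree...
ls "$tmp/work"
# ...but in the bare repository it only exists inside Git's object store,
# reachable through Git commands rather than through the file system
git -C "$tmp/server.git" ls-tree -r --name-only main
```

git-annex adds its own layer on top of this for annexed content, which the sketch does not cover.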

acnewton · March 20, 2019, 11:55am

Thanks for your reply. Following the “A typical collaborative data management workflow” section of the docs, I have now tried this workflow with a simple SSH remote on a Unix system, and it works fine!

However, the same workflow does not work for me with a Gitolite server. I was curious how DataLad would work with such an access-control layer, but maybe it is not that simple.

I tried the following: I create a dataset, add some data, and then try to create a sibling on a ‘remote’ repository host (e.g. Gitolite, Gogs, GIN).

$ datalad create testdl     
[INFO   ] Creating a new annex repo at /dataladtest/testdl 
create(ok): /dataladtest/testdl (dataset)

$ cd testdl/

$ datalad run cp ~/Downloads/picture.jpeg .
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 
add(ok): picture.jpeg (file)
save(ok): /dataladtest/testdl (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

$ datalad create-sibling -s gin git@localhost:/tesuser/testdl    
[INFO   ] Connecting ... 
[ERROR  ] git-annex is missing. on the remote system [create_sibling.py:__call__:580] (MissingExternalDependency)

git-annex-shell is installed, so I am probably missing something…
Of course, I could use the plain SSH remote or ‘create-sibling-github’, which both work fine. Thanks again.

effigies · March 21, 2019, 7:42pm

Gitolite doesn’t make it trivial to create new repositories. You need to create them using the gitolite-admin repository.
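An untested sketch of that route, using hypothetical names (server ‘localhost’, Gitolite user ‘testuser’, repository ‘testuser/testdl’) and assuming git-annex-shell is already enabled in the server's Gitolite setup, as it apparently was for the earlier git-annex test. Gitolite only runs a restricted set of commands over SSH, which is probably why ‘datalad create-sibling’, which needs to run commands on the remote host, cannot set the repository up. Instead, the repository is declared in ‘conf/gitolite.conf’ in a clone of gitolite-admin:

```
# conf/gitolite.conf in a local clone of the gitolite-admin repository
# (git clone git@localhost:gitolite-admin). All names here are hypothetical.
repo testuser/testdl
    RW+     =   testuser
```

Committing and pushing this change to gitolite-admin makes Gitolite create the empty repository on the server. It can then be added to the dataset as an ordinary Git remote, which DataLad treats as a sibling (for example ‘git remote add gin git@localhost:testuser/testdl’), and the dataset published to it with ‘datalad publish --to gin’ (‘datalad push --to gin’ in newer DataLad releases).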