Datalad throwback: doing as I'm told

datalad

#1

Still here paying early-adopter tax for datalad.

I tried creating a bare repository with datalad today and got this error:

datalad create -D "Central Repo (origin) Project" -d . --shared-access group --git-opts bare project
[WARNING] `git_opts` argument is presently ignored, please complain!
[ERROR  ] 'str' object does not support item assignment [create.py:__call__:291] (TypeError)

I am doing as I am told and am here complaining about this. It would be very useful to be able to create bare repositories, and to use datalad in general as a single wrapper for git-annex and git.

I can potentially contribute to this project if pointed in the right direction…


#2

Wow, what a shame we missed this report! Sorry about that.
Overall summary: do not try to create or manipulate bare git(-annex) repositories with datalad; it is not really used or supported ATM. Having said that, it should be perfectly fine to use bare repositories as remotes!

I personally

  • rarely (if ever) use bare annex repos. If I need a central exchange repo, I just create a datalad sibling repo (non-bare), which is configured so that pushes move its checked-out branch and files forward. This way I can see and use the files in that repository if needed.
  • stopped using the group sharing option on clones that users access directly, since it removes a level of protection for data files: annex then makes them writable. See https://git-annex.branchable.com/bugs/--shared_setting_of_git_causes_annex__39__ed_files_to_be_writeable__33__/ . I do realize that if I were using a bare repo, I would have a lower chance of screwing those files up.
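Outside of datalad, the non-bare exchange repo from the first point can be approximated in plain git with receive.denyCurrentBranch=updateInstead (git 2.3 or later), which lets a push update the checked-out branch and its files. A minimal sketch; make_exchange_repo is a hypothetical helper, and whether datalad configures its siblings exactly this way is an assumption:

```shell
# Hypothetical helper: initialize DIR as a non-bare repository that
# accepts pushes to its checked-out branch and updates its working tree.
make_exchange_repo() (
    set -euo pipefail
    git init -q "$1"
    # By default git refuses pushes into the checked-out branch of a
    # non-bare repo; updateInstead (git >= 2.3) updates the worktree
    # instead, provided that worktree has no uncommitted changes.
    git -C "$1" config receive.denyCurrentBranch updateInstead
)
```

Usage, with hypothetical paths: make_exchange_repo /srv/exchange/project1, then clone it, commit, and git push origin HEAD; the pushed files show up in /srv/exchange/project1 as long as its working tree is clean.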

Now to particulars:

Back to your workflow: as a workaround, you could initialize a shared bare git/annex repository manually and use it as a submodule within your “project” (or whatever) dataset just fine:

/tmp > mkdir -p barerepos/repo1.git; cd barerepos/repo1.git; chown yoh.adm .; git init --bare --shared=group; git annex init "Central Repo (origin) Project"
Initialized empty shared Git repository in /tmp/barerepos/repo1.git/
init Central Repo (origin) Project ok
(recording state in git...)
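The one-liner above can also be wrapped in a small reusable function. A sketch, assuming bash and a group you belong to (chgrp is used instead of the chown yoh.adm from the transcript, so no particular user is assumed); make_shared_bare_repo is a hypothetical name, and the git annex init step is skipped with a warning when git-annex is not installed:

```shell
# Hypothetical helper mirroring the one-liner: create a group-shared bare
# repository owned by GROUP and, if git-annex is available, annex-init it.
make_shared_bare_repo() (
    set -euo pipefail
    local dir=$1 group=$2 description=$3
    mkdir -p "$dir"
    # Set the group first so what git creates inside inherits it.
    chgrp "$group" "$dir"
    # --shared=group keeps files group-writable; git records this as
    # core.sharedRepository=1 in the repository config.
    git init -q --bare --shared=group "$dir"
    if command -v git-annex >/dev/null 2>&1; then
        git -C "$dir" annex init "$description"
    else
        echo "git-annex not found; skipping 'git annex init'" >&2
    fi
)
```

For example: make_shared_bare_repo /tmp/barerepos/repo1.git adm "Central Repo (origin) Project".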

/tmp/barerepos/repo1.git > cd /tmp/; datalad create projrepos/project1; cd projrepos/project1; datalad install -d . -s /tmp/barerepos/repo1.git;   cd repo1; datalad create --force .; 
[INFO   ] Creating a new annex repo at /tmp/projrepos/project1        
create(ok): /tmp/projrepos/project1 (dataset)
[INFO   ] Cloning /tmp/barerepos/repo1.git into '/tmp/projrepos/project1/repo1' 
install(ok): repo1 (dataset)
action summary:
  add (notneeded: 1)
  install (ok: 1)
[INFO   ] Creating a new annex repo at /tmp/projrepos/project1/repo1 
create(ok): /tmp/projrepos/project1/repo1 (dataset)

/tmp/projrepos/project1/repo1 > echo 123 > 123; datalad add -m "Added precious data file" 123; datalad publish --to origin 123                                                                                                  
add(ok): /tmp/projrepos/project1/repo1/123 (file)
save(ok): /tmp/projrepos/project1/repo1 (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
[INFO   ] Publishing <Dataset path=/tmp/projrepos/project1/repo1> data to origin 
publish(ok): /tmp/projrepos/project1/repo1/123 (file)
[INFO   ] Publishing <Dataset path=/tmp/projrepos/project1/repo1> to origin 
publish(ok): /tmp/projrepos/project1/repo1 (dataset) [pushed to origin: ['026535c..d9ddf4b', '[new branch]']]
action summary:
  publish (ok: 2)

/tmp/projrepos/project1/repo1 > git lg
* 7aded21 - (HEAD -> master, origin/master, synced/master) Added precious data file (38 seconds ago) [Yaroslav Halchenko]
* 45c868a - [DATALAD] new dataset (47 seconds ago) [Yaroslav Halchenko]
* 3012e81 - [DATALAD] Set default backend for all files to be MD5E (48 seconds ago) [Yaroslav Halchenko]

/tmp/projrepos/project1/repo1 > find /tmp/barerepos/repo1.git/annex/objects -ls
  1082908      4 drwxrws---   3 yoh      adm          4096 Sep 13 10:46 /tmp/barerepos/repo1.git/annex/objects
  1082909      4 drwxrws---   3 yoh      adm          4096 Sep 13 10:46 /tmp/barerepos/repo1.git/annex/objects/ff4
  1082910      4 drwxrws---   3 yoh      adm          4096 Sep 13 10:46 /tmp/barerepos/repo1.git/annex/objects/ff4/c57
  1082911      4 drwxrws---   2 yoh      adm          4096 Sep 13 10:46 /tmp/barerepos/repo1.git/annex/objects/ff4/c57/MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f
  1082907      4 -rw-rw----   1 yoh      adm             4 Sep 13 10:46 /tmp/barerepos/repo1.git/annex/objects/ff4/c57/MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f/MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f
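To double-check that everything under annex/objects stays usable by the group, as in the listing above (setgid directories with group rwx, group-readable files), here is a small audit sketch using find with octal permission masks. find_unshared_paths is a hypothetical helper, and the exact modes you want may differ with your umask and git-annex version:

```shell
# Hypothetical helper: print paths under DIR that a fellow group member
# could not use: directories missing group rwx or the setgid bit (2070),
# and regular files missing group read permission (0040).
find_unshared_paths() {
    find "$1" \( -type d ! -perm -2070 \) -print \
        -o \( -type f ! -perm -0040 \) -print
}
```

Running find_unshared_paths /tmp/barerepos/repo1.git/annex/objects prints nothing when every path passes.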

Note: one little “side-effect” of the above is that repo1 is not added as a submodule to project1, because it is still empty when we datalad install -d . it, and git cannot record a repository without any commits as a submodule. So you need to run datalad add -d . repo1 after you have added some files to it, and adjust the URL manually if you want it to point to the bare repo by default:
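The constraint behind this note is that a submodule entry (a gitlink) records a commit id, and a repository without commits has none to record. Below is a plain-git sketch of registering the nested repo once it has a commit, with the URL pointed at the bare repo. register_subrepo is a hypothetical helper, not a datalad command, and datalad's own bookkeeping may differ:

```shell
# Hypothetical helper: register the existing nested repo SUB (a path
# relative to SUPER) in superproject SUPER, recording URL in .gitmodules.
register_subrepo() (
    set -euo pipefail
    local super=$1 sub=$2 url=$3
    cd "$super"
    # A gitlink stores a commit id; an empty repo has no HEAD to store.
    if ! git -C "$sub" rev-parse --verify -q HEAD >/dev/null; then
        echo "$sub has no commits yet; commit something first" >&2
        exit 1
    fi
    git config -f .gitmodules "submodule.$sub.path" "$sub"
    git config -f .gitmodules "submodule.$sub.url" "$url"
    # Adding the nested repo's directory stages a gitlink (mode 160000).
    git add .gitmodules "$sub"
)
```

With the paths from this thread, that would be register_subrepo /tmp/projrepos/project1 repo1 /tmp/barerepos/repo1.git, followed by a commit in project1.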

/tmp/projrepos/project1/repo1 > cd ..

/tmp/projrepos/project1 > git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	repo1/

nothing added to commit but untracked files present (use "git add" to track)

/tmp/projrepos/project1 > datalad add -d . -m "adding now non empty" repo1
add(ok): repo1 (dataset) [added new subdataset]                           
add(notneeded): .gitmodules (file) [already included in the dataset]
add(notneeded): repo1 (dataset) [nothing to add from /tmp/projrepos/project1/repo1]
save(ok): /tmp/projrepos/project1 (dataset)
action summary:
  add (notneeded: 2, ok: 1)
  save (ok: 1)

/tmp/projrepos/project1 > cat .gitmodules 
[submodule "repo1"]
	path = repo1
	url = ./repo1

/tmp/projrepos/project1 > git config -f .gitmodules submodule.repo1.url /tmp/barerepos/repo1.git 

/tmp/projrepos/project1 > datalad save -m "adjusted url to point to the bare repo" 
save(ok): /tmp/projrepos/project1 (dataset)

/tmp/projrepos/project1 > git show
commit b69ee06dc512861de5b2d95fa8dd77f88c1decb0 (HEAD -> master)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Thu Sep 13 10:49:17 2018 -0400

    adjusted url to point to the bare repo

diff --git a/.gitmodules b/.gitmodules
index cdf4af7..f3888dc 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,3 @@
 [submodule "repo1"]
 	path = repo1
-	url = ./repo1
+	url = /tmp/barerepos/repo1.git

This last step shouldn’t be necessary if the bare repo is already populated:

/tmp > cd /tmp/; datalad create projrepos/project2; cd projrepos/project2; datalad install -d . -s /tmp/barerepos/repo1.git; cd repo1; ls -l                                                                                    
[INFO   ] Creating a new annex repo at /tmp/projrepos/project2 
create(ok): /tmp/projrepos/project2 (dataset)
[INFO   ] Cloning /tmp/barerepos/repo1.git into '/tmp/projrepos/project2/repo1' 
install(ok): repo1 (dataset)
action summary:
  add (notneeded: 2, ok: 1)
  install (ok: 1)
  save (ok: 1)
total 4
lrwxrwxrwx 1 yoh yoh 108 Sep 13 10:52 123 -> .git/annex/objects/pF/Zf/MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f/MD5E-s4--ba1f2511fc30423bdbb183fe33f3dd0f

/tmp/projrepos/project2/repo1 > cd ..

/tmp/projrepos/project2 > git status
On branch master
nothing to commit, working tree clean

/tmp/projrepos/project2 > cat .gitmodules 
[submodule "repo1"]
	path = repo1
	url = /tmp/barerepos/repo1.git

I hope it clears things up :wink: