Datalad dataset cleaning questions

Dear Community,

My lab is trying to set up DataLad on our HPC cluster. We are facing some issues and I hope to get some guidance on them.

When cleaning up datasets after (failed) test computations and deleting result branches, we run git annex unused; git annex dropunused. Users who do not own the subdataset (but are group members with full permissions, rwx) run into permission errors. Is there a way to collaboratively clean a dataset?
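
A minimal diagnostic sketch, assuming the failures come from file ownership rather than from the group bits: changing a file's mode (the setFileMode step in the error output further down) is normally only permitted for the file's owner or root, even when the group has rwx. The helper name list_foreign_annex_objects is made up for illustration; it lists annexed objects that the calling user does not own.

```shell
# Hypothetical helper (not part of git-annex or DataLad):
# list annexed object files under a repository that the current user
# does not own. Such files cannot be chmod'ed by this user.
list_foreign_annex_objects() {
    repo=${1:-.}
    objdir="$repo/.git/annex/objects"
    # Nothing to report if the repository has no annex object store.
    [ -d "$objdir" ] || return 0
    # POSIX find accepts a numeric user ID for -user.
    find "$objdir" -type f ! -user "$(id -u)"
}
```

Run as the user who hits the error, for example list_foreign_annex_objects /path/to/dataset, this shows which keys that user would be unable to drop under the ownership assumption.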

Furthermore, when a job fetches from a not-so-well-cleaned dataset (one with a lot of unused files in its annex) into an ephemeral clone with datalad get -n, seemingly the whole (large) annex is transferred to the clone. Our workflow creates the clone on a node's scratch partition for optimized I/O, and that partition has limited storage space, so this quickly fills up the node and leads to premature job termination. Even when origin has a cleaned-up annex, it hinders parallelization across subjects on a single node. I wonder whether there is a way to get only the storage location information and the symlinks without transferring all the data to the clone, as that is what I understood datalad get -n to do primarily.
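
To check what actually ended up in a clone, it can help to measure the plain git object store (.git/objects, which any git clone copies) separately from the annex object store (.git/annex/objects, which holds annexed file content). A rough sketch, where repo_store_sizes is a hypothetical helper name and the repository is assumed to have a regular .git directory:

```shell
# Hypothetical helper (not a DataLad or git-annex command):
# print the on-disk size in KiB of the git object store and of the
# annex object store of one repository, one per line, tab-separated.
repo_store_sizes() {
    repo=${1:-.}
    for store in objects annex/objects; do
        dir="$repo/.git/$store"
        if [ -d "$dir" ]; then
            # du -sk prints "<KiB><TAB><path>"; keep only the number.
            size=$(du -sk "$dir" | cut -f1)
        else
            size=0
        fi
        printf '%s\t%s\n' "$store" "$size"
    done
}
```

If the annex line stays small while the objects line is large, the space was taken by regular git history rather than by annexed content.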

Cheers,
Marvin

At least I personally would get a better grasp of the situation if some commands, “dumps” of output/logs, permissions on sample files, and the datalad and git annex versions were shown/disclosed. Otherwise it is hard to guess what is going on, in particular for datalad get -n, which should not get any annexed data.

Hi Yaroslav,

Happy to provide further information. It was a little late yesterday :smile:

I cloned a test dataset (origin). Within origin there is a BIDS subdataset, raw_bids/, which contains the subject subdatasets.

In the clone I ran datalad get -n raw_bids. After get had finished, the .git in raw_bids/ required 4.5 GB of storage, nearly all of it due to files in .git/objects. By the way, the size of .git in origin is 8.9 GB after cleaning it with git gc and git annex unused; git annex dropunused, and there are no branches besides master and git-annex. I am not sure where the large files in .git come from, because besides some text files and the subject subdatasets there is nothing (no bulk files) in raw_bids/. The subject subdatasets (e.g. raw_bids/sub-ewgenia001) have, after cleaning them with the same procedure as above, a .git size of 882M. However, the NIfTIs should make up only 294M of that.
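
One way to look for the source of a large .git/objects, sketched under the assumption that big files were at some point committed directly to git rather than to the annex: list the largest blobs reachable from any ref. The function name largest_blobs is invented for this example; the git subcommands and options it uses are standard.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref
# of a repository as "<size-in-bytes> <path>", largest first.
largest_blobs() {
    repo=${1:-.}
    count=${2:-10}
    # rev-list emits "<sha> <path>"; cat-file resolves type and size and
    # passes the path through via %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file \
            --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        cut -d' ' -f2- |
        sort -k1,1nr |
        head -n "$count"
}
```

For example, largest_blobs raw_bids 20 would show the twenty largest files stored in the git history of raw_bids/, including files that were deleted in later commits.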


$ datalad -l debug get -n raw_bids/

output

[DEBUG  ] Command line args 1st pass for DataLad 0.14.4. Parsed: Namespace() Unparsed: ['get', '-n', 'raw_bids/'] 
[DEBUG  ] Discovering plugins 
[DEBUG  ] Building doc for <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Building doc for <class 'datalad.core.distributed.clone.Clone'> 
[DEBUG  ] Building doc for <class 'datalad.distribution.get.Get'> 
[DEBUG  ] Parsing known args among ['/work/fatx405/miniconda3/bin/datalad', '-l', 'debug', 'get', '-n', 'raw_bids/'] 
[DEBUG  ] Async run:
|  cwd=None
|  cmd=['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', '--git-dir=', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 32490 started 
[DEBUG  ] Waiting for process 32490 to complete 
[DEBUG  ] Process 32490 exited with return code 0 
[DEBUG  ] Determined class of decorated function: <class 'datalad.distribution.get.Get'> 
[DEBUG  ] Resolved dataset for get content: /work/fatx405/projects/CSI_TEST_scaling 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 32520 started 
[DEBUG  ] Waiting for process 32520 to complete 
[DEBUG  ] Process 32520 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/.datalad/config'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/.datalad/config'] 
[DEBUG  ] Process 32550 started 
[DEBUG  ] Waiting for process 32550 to complete 
[DEBUG  ] Process 32550 exited with return code 0 
[DEBUG  ] Determined class of decorated function: <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Resolved dataset for subdataset reporting/modification: /work/fatx405/projects/CSI_TEST_scaling 
[DEBUG  ] Query subdatasets of None 
[DEBUG  ] Query subdatasets underneath paths: [PosixPath('/work/fatx405/projects/CSI_TEST_scaling/data')] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Process 32586 started 
[DEBUG  ] Waiting for process 32586 to complete 
[DEBUG  ] Process 32586 exited with return code 0 
[DEBUG  ] AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Process 32616 started 
[DEBUG  ] Waiting for process 32616 to complete 
[DEBUG  ] Process 32616 exited with return code 0 
[DEBUG  ] AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Process 32647 started 
[DEBUG  ] Waiting for process 32647 to complete 
[DEBUG  ] Process 32647 exited with return code 0 
[DEBUG  ] Done query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Done AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z', '--', 'data'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z', '--', 'data'] 
[DEBUG  ] Process 32682 started 
[DEBUG  ] Waiting for process 32682 to complete 
[DEBUG  ] Process 32682 exited with return code 0 
[DEBUG  ] Done query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Done AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Determined class of decorated function: <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Resolved dataset for subdataset reporting/modification: /work/fatx405/projects/CSI_TEST_scaling 
[DEBUG  ] Query subdatasets of Dataset(/work/fatx405/projects/CSI_TEST_scaling) 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Process 32716 started 
[DEBUG  ] Waiting for process 32716 to complete 
[DEBUG  ] Process 32716 exited with return code 0 
[DEBUG  ] AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Process 32750 started 
[DEBUG  ] Waiting for process 32750 to complete 
[DEBUG  ] Process 32750 exited with return code 0 
[DEBUG  ] Done query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Done AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling).get_content_info(...) 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] 
[DEBUG  ] Process 312 started 
[DEBUG  ] Waiting for process 312 to complete 
[DEBUG  ] Process 312 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'rev-list', '-n1', 'HEAD', '--', 'data/raw_bids'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'rev-list', '-n1', 'HEAD', '--', 'data/raw_bids'] 
[DEBUG  ] Process 342 started 
[DEBUG  ] Waiting for process 342 to complete 
[DEBUG  ] Process 342 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', '--contains=652448a398fdcd59bdcecf940b436230130a5b5f', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', '--contains=652448a398fdcd59bdcecf940b436230130a5b5f', 'refs/remotes'] 
[DEBUG  ] Process 373 started 
[DEBUG  ] Waiting for process 373 to complete 
[DEBUG  ] Process 373 exited with return code 0 
[DEBUG  ] Git clone from /work/fatx405/projects/CSI_TEST/data/raw_bids to /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids 
[DEBUG  ] Async run:                              
|  cwd=None
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'clone', '--progress', '/work/fatx405/projects/CSI_TEST/data/raw_bids', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'clone', '--progress', '/work/fatx405/projects/CSI_TEST/data/raw_bids', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids'] 
[DEBUG  ] Process 457 started                     
[DEBUG  ] Waiting for process 457 to complete     
[DEBUG  ] Non-progress stderr: b"Cloning into '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids'...\n" 
[DEBUG  ] Non-progress stderr: b'done.\n'         
[DEBUG  ] Process 457 exited with return code 0   
[DEBUG  ] Git clone completed                     
[DEBUG  ] Async run:                              
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1466 started 
[DEBUG  ] Waiting for process 1466 to complete 
[DEBUG  ] Process 1466 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1500 started 
[DEBUG  ] Waiting for process 1500 to complete 
[DEBUG  ] Process 1500 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 1534 started 
[DEBUG  ] Waiting for process 1534 to complete 
[DEBUG  ] Process 1534 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids/.datalad/config'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids/.datalad/config'] 
[DEBUG  ] Process 1568 started 
[DEBUG  ] Waiting for process 1568 to complete 
[DEBUG  ] Process 1568 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1599 started 
[DEBUG  ] Waiting for process 1599 to complete 
[DEBUG  ] Process 1599 exited with return code 0 
[DEBUG  ] Determined origin to be remote of Dataset(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids) 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1629 started 
[DEBUG  ] Waiting for process 1629 to complete 
[DEBUG  ] Process 1629 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'rev-parse', '--verify', 'HEAD^{commit}'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'rev-parse', '--verify', 'HEAD^{commit}'] 
[DEBUG  ] Process 1659 started 
[DEBUG  ] Waiting for process 1659 to complete 
[DEBUG  ] Process 1659 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1689 started 
[DEBUG  ] Waiting for process 1689 to complete 
[DEBUG  ] Process 1689 exited with return code 0 
[DEBUG  ] Initializing annex repo at /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] 
[DEBUG  ] Process 1720 started 
[DEBUG  ] Waiting for process 1720 to complete 
[DEBUG  ] Process 1720 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads'] 
[DEBUG  ] Process 1758 started 
[DEBUG  ] Waiting for process 1758 to complete 
[DEBUG  ] Process 1758 exited with return code 0 
[DEBUG  ] Launching process ['/work/fatx405/miniconda3/bin/python', '--version'] 
[DEBUG  ] Process 1788 started 
[DEBUG  ] Waiting for process 1788 to complete 
[DEBUG  ] Process 1788 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=None
|  cmd=['git', 'annex', 'version', '--raw'] 
[DEBUG  ] Launching process ['git', 'annex', 'version', '--raw'] 
[DEBUG  ] Process 1789 started 
[DEBUG  ] Waiting for process 1789 to complete 
[DEBUG  ] Process 1789 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'init', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'annex', 'init', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] 
[DEBUG  ] Process 1826 started 
[DEBUG  ] Waiting for process 1826 to complete 
[INFO   ] Scanning for unlocked files (this may take some time) 
[DEBUG  ] Process 1826 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 1945 started 
[DEBUG  ] Waiting for process 1945 to complete 
[DEBUG  ] Process 1945 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'cat-file', 'blob', 'git-annex:remote.log'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'cat-file', 'blob', 'git-annex:remote.log'] 
[DEBUG  ] Process 1976 started 
[DEBUG  ] Waiting for process 1976 to complete 
[DEBUG  ] Process 1976 exited with return code 128 
[Level 11] CommandError: 'git -c diff.ignoreSubmodules=none cat-file blob git-annex:remote.log' failed with exitcode 128 under /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids [err: 'fatal: Not a valid object name git-annex:remote.log'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 2006 started 
[DEBUG  ] Waiting for process 2006 to complete 
[DEBUG  ] Process 2006 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST/data/raw_bids/.datalad/config'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST/data/raw_bids/.datalad/config'] 
[DEBUG  ] Process 2037 started 
[DEBUG  ] Waiting for process 2037 to complete 
[DEBUG  ] Process 2037 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'symbolic-ref', 'HEAD'] 
[DEBUG  ] Process 2067 started 
[DEBUG  ] Waiting for process 2067 to complete 
[DEBUG  ] Process 2067 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'rev-parse', '--quiet', '--verify', 'HEAD^{commit}'] 
[DEBUG  ] Process 2097 started 
[DEBUG  ] Waiting for process 2097 to complete 
[DEBUG  ] Process 2097 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', 'config', '--local', '--replace-all', 'submodule.data/raw_bids.active', 'true'] 
[DEBUG  ] Launching process ['git', 'config', '--local', '--replace-all', 'submodule.data/raw_bids.active', 'true'] 
[DEBUG  ] Process 2127 started 
[DEBUG  ] Waiting for process 2127 to complete 
[DEBUG  ] Process 2127 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', 'config', '--local', '--replace-all', 'submodule.data/raw_bids.url', '/work/fatx405/projects/CSI_TEST/data/raw_bids'] 
[DEBUG  ] Launching process ['git', 'config', '--local', '--replace-all', 'submodule.data/raw_bids.url', '/work/fatx405/projects/CSI_TEST/data/raw_bids'] 
[DEBUG  ] Process 2157 started 
[DEBUG  ] Waiting for process 2157 to complete 
[DEBUG  ] Process 2157 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 2187 started 
[DEBUG  ] Waiting for process 2187 to complete 
[DEBUG  ] Process 2187 exited with return code 0 
[DEBUG  ] Installed subdataset in order to get /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids [install(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids)] 
install(ok): /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids (dataset) [Installed subdataset in order to get /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids]
[DEBUG  ] Determined class of decorated function: <class 'datalad.local.subdatasets.Subdatasets'> 
[DEBUG  ] Resolved dataset for subdataset reporting/modification: /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids 
[DEBUG  ] Query subdatasets of Dataset(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids) 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Process 2218 started 
[DEBUG  ] Waiting for process 2218 to complete 
[DEBUG  ] Process 2218 exited with return code 0 
[DEBUG  ] AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids).get_content_info(...) 
[DEBUG  ] Query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Process 2256 started 
[DEBUG  ] Waiting for process 2256 to complete 
[DEBUG  ] Process 2256 exited with return code 0 
[DEBUG  ] Done query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Done AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids).get_content_info(...) 
[DEBUG  ] Not reporting result (excluded by filter <function get_result_filter.<locals>._fx at 0x1471282ebd40> with arguments {'path': None, 'dataset': Dataset('/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids'), 'fulfilled': None, 'recursive': False, 'recursion_limit': None, 'contains': PosixPath('/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids'), 'bottomup': False, 'set_property': None, 'delete_property': None, 'on_failure': 'ignore', 'result_filter': <function is_ok_dataset at 0x147128e8f9e0>} [utils.py:keep_result:678]): {'action': 'subdataset', 'path': '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids', 'status': 'impossible', 'message': 'path not contained in any matching subdataset'} 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin'] 
[DEBUG  ] Process 2286 started 
[DEBUG  ] Waiting for process 2286 to complete 
[DEBUG  ] Process 2286 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids/.datalad/config'] 
[DEBUG  ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids/.datalad/config'] 
[DEBUG  ] Process 2324 started 
[DEBUG  ] Waiting for process 2324 to complete 
[DEBUG  ] Process 2324 exited with return code 0 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'config', '-z', '-l', '--file', '.gitmodules'] 
[DEBUG  ] Process 2358 started 
[DEBUG  ] Waiting for process 2358 to complete 
[DEBUG  ] Process 2358 exited with return code 0 
[DEBUG  ] AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids).get_content_info(...) 
[DEBUG  ] Query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Async run:
|  cwd=/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
|  cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Launching process ['git', '-c', 'diff.ignoreSubmodules=none', 'ls-files', '--stage', '-z'] 
[DEBUG  ] Process 2395 started 
[DEBUG  ] Waiting for process 2395 to complete 
[DEBUG  ] Process 2395 exited with return code 0 
[DEBUG  ] Done query repo: ['ls-files', '--stage', '-z'] 
[DEBUG  ] Done AnnexRepo(/work/fatx405/projects/CSI_TEST_scaling/data/raw_bids).get_content_info(...) 

Size of .git in raw_bids/ of the clone

output
du -sch .git/*
8.5K    .git/annex
512     .git/config
512     .git/description
512     .git/HEAD
26K     .git/hooks
2.0K    .git/index
1.5K    .git/info
5.0K    .git/logs
4.5G    .git/objects
512     .git/packed-refs
3.5K    .git/refs
4.5G    total

Size of .git in raw_bids/ of the original dataset

output
du -sch .git/*
4.5G    .git/annex
512     .git/COMMIT_EDITMSG
1.5K    .git/config
0       .git/config.dataladlock
512     .git/description
512     .git/HEAD
26K     .git/hooks
2.0K    .git/index
1.5K    .git/info
40K     .git/logs
4.5G    .git/objects
512     .git/ORIG_HEAD
2.0K    .git/refs
8.9G    total

raw_bids/sub-ewgenia001$ du -sch .git/*

output

881M .git/annex
512 .git/COMMIT_EDITMSG
512 .git/config
0 .git/config.dataladlock
512 .git/description
512 .git/HEAD
26K .git/hooks
2.0K .git/index
2.0K .git/info
5.0K .git/logs
424K .git/objects
512 .git/ORIG_HEAD
512 .git/packed-refs
1.5K .git/refs
882M total


raw_bids/sub-ewgenia001$ du -schL */*/*

output

2.0K ses-1/anat/sub-ewgenia001_ses-1_FLAIR.json
17M ses-1/anat/sub-ewgenia001_ses-1_FLAIR.nii.gz
2.0K ses-1/anat/sub-ewgenia001_ses-1_T1w.json
17M ses-1/anat/sub-ewgenia001_ses-1_T1w.nii.gz
512 ses-1/dwi/sub-ewgenia001_ses-1_dir-AP_dwi.bval
3.0K ses-1/dwi/sub-ewgenia001_ses-1_dir-AP_dwi.bvec
2.0K ses-1/dwi/sub-ewgenia001_ses-1_dir-AP_dwi.json
220M ses-1/dwi/sub-ewgenia001_ses-1_dir-AP_dwi.nii.gz
2.5K ses-1/func/sub-ewgenia001_ses-1_task-rest_bold.json
42M ses-1/func/sub-ewgenia001_ses-1_task-rest_bold.nii.gz
294M total




We reproduced the permission error in an unrelated dataset (qsiprep). This is the output of a colleague.

git annex unused

output
unused . (checking for unused data...) (checking master...) (checking job-1451292-qsiprep-sub-ewgenia008-11082021...) (checking job-1451292-qsiprep-sub-ewgenia007-11082021...) (checking job-1451292-qsiprep-sub-ewgenia006-11082021...) (checking job-1451292-qsiprep-sub-ewgenia005-11082021...) (checking job-1451291-qsiprep-sub-ewgenia004-11082021...) (checking job-1451291-qsiprep-sub-ewgenia003-11082021...) (checking job-1451291-qsiprep-sub-ewgenia002-11082021...) (checking job-1451291-qsiprep-sub-ewgenia001-11082021...) (checking job-1451258-qsiprep-sub-ewgenia010-11082021...) (checking job-1451258-qsiprep-sub-ewgenia009-11082021...) (checking job-1450319-qsiprep-sub-ewgenia001-09082021...) (checking job-1441518-qsiprep-sub-ewgenia010-26072021...) (checking job-1436590-qsiprep-sub-ewgenia004-14072021...) (checking job-1436589-qsiprep-sub-ewgenia009-14072021...) (checking job-1436206-qsiprep-sub-ewgenia009-14072021...) (checking job-1436205-qsiprep-sub-ewgenia008-14072021...) (checking job-1436204-qsiprep-sub-ewgenia007-14072021...) (checking job-1436202-qsiprep-sub-ewgenia005-14072021...) (checking job-1436200-qsiprep-sub-ewgenia003-14072021...) (checking job-1436199-qsiprep-sub-ewgenia002-14072021...) (checking job-1436198-qsiprep-sub-ewgenia001-14072021...) (checking job-1436197-qsiprep-sub-ewgenia010-14072021...) (checking job-1436018-qsiprep-sub-ewgenia007-14072021...) (checking job-1436018-qsiprep-sub-ewgenia006-14072021...) (checking job-1435923-qsiprep-sub-ewgenia010-14072021...) (checking job-1435603-qsiprep-sub-ewgenia005-13072021...) (checking job-1435601-qsiprep-sub-ewgenia002-13072021...) (checking job-1435601-qsiprep-sub-ewgenia001-13072021...) (checking job-1433449-qsirecon-sub-ewgenia001...) 
  Some annexed data is no longer used by any files:
    NUMBER  KEY
    1       MD5E-s3456724--3e282f5f6b7b34ddb9db27f54cda85eb.mif.gz
    2       MD5E-s93998--e48e09668a34d299e85c421734598ced.mif.gz
    3       MD5E-s91838--de119e1ae84111150d639689b07734d5.mif.gz
    4       MD5E-s61013--8f542c7dffb440679134c7de78ec3d67.nii.gz
    5       MD5E-s3476347--d04409241b77121be7361e602626236d.mif.gz
    6       MD5E-s91855--01b626dfbe94f321a9561b62c2c9c41a.mif.gz
    7       MD5E-s102726--8f737b752aed2f7cbaa6058926c9a225.mif.gz
    8       MD5E-s166542--8890611035dce931dcac2735110a6225.nii.gz
    9       MD5E-s14500549--d00fc69e9add87193480c611400d8fe6.mat
    10      MD5E-s144729--8261ee60217139191a1fd9881f6c8f1e.nii.gz
    11      MD5E-s118409--c0bd9c2a4a5518511f4a06e41667edcf.mif.gz
    12      MD5E-s144861--19bd122d89d4f79ea0605abeb2577c95.nii.gz
    13      MD5E-s42777--b3598fba50c6587e16b6b48da7421120.nii.gz
    14      MD5E-s131189--c97d3073ba88641fc73ed1e1ef111bd6.nii.gz
    15      MD5E-s132947--95947f3ea284a1375513cb1bb4624ae5.nii.gz
    16      MD5E-s118312--7e5784d5b43e4d665f6f03c9bfb7324c.mif.gz
    17      MD5E-s37308--08237bab959e1f4eb29d538692800ca2.mif.gz
    18      MD5E-s133058--e8ef5c82f58ff26b3cda05b3c5db2e7d.nii.gz
    19      MD5E-s1381--59931915edf042a9c85387fb2c5a276f.txt
    20      MD5E-s152070--c154efbb93eeca2c432176707d080049.nii.gz
    21      MD5E-s160967--da6173dd364b659727a1c58a15a470d7.nii.gz
    22      MD5E-s119669--d0006b7fdc951defb93df1545cd6b596.mif.gz
    23      MD5E-s160606--fa58a9b4f13e32071ddb812b8e580d10.nii.gz
    24      MD5E-s31661558--e42f36aff2910897cc925964c6b0a6de.mif.gz
    25      MD5E-s14032580--6c05da058e3cddbd4bfba82fb13977e5.nii.gz
    26      MD5E-s12330650760--2dff34dde8a38b179eb5659e9da71a22.tck
    27      MD5E-s102739--9f8d1cfd1fc975fd83a849c9456c2527.mif.gz
    28      MD5E-s123862--dec1d65b280c49552c922b74e1b70696.mif.gz
    29      MD5E-s124595534--da2041e5cc68334ed48b81623e0983d5.csv
    30      MD5E-s164914--2ee129dff2d0131db770b880fbc3c6ff.nii.gz
    31      MD5E-s1107--31cc2b0ac3c4cf313024ce6cdf262c11.txt
    32      MD5E-s105323--7ba513e64c8907521e657bf85bad5d29.mif.gz
    33      MD5E-s1107--f294b38eb0c026ea3d8f11621afb61a3.txt
  (To see where data was previously used, try: git log --stat --no-textconv -S'KEY')
  
  To remove unwanted data: git-annex dropunused NUMBER
  
ok


git annex dropunused all --force

output
git-annex: failed to lock content: .git/annex/objects/Qm/jM/MD5E-s3456724--3e282f5f6b7b34ddb9db27f54cda85eb.mif.gz/MD5E-s3456724--3e282f5f6b7b34ddb9db27f54cda85eb.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 2 
git-annex: failed to lock content: .git/annex/objects/3M/Qk/MD5E-s93998--e48e09668a34d299e85c421734598ced.mif.gz/MD5E-s93998--e48e09668a34d299e85c421734598ced.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 3 
git-annex: failed to lock content: .git/annex/objects/1v/0x/MD5E-s91838--de119e1ae84111150d639689b07734d5.mif.gz/MD5E-s91838--de119e1ae84111150d639689b07734d5.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 4 
git-annex: failed to lock content: .git/annex/objects/6X/PP/MD5E-s61013--8f542c7dffb440679134c7de78ec3d67.nii.gz/MD5E-s61013--8f542c7dffb440679134c7de78ec3d67.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 5 
git-annex: failed to lock content: .git/annex/objects/pW/9x/MD5E-s3476347--d04409241b77121be7361e602626236d.mif.gz/MD5E-s3476347--d04409241b77121be7361e602626236d.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 6 
git-annex: failed to lock content: .git/annex/objects/0g/Gv/MD5E-s91855--01b626dfbe94f321a9561b62c2c9c41a.mif.gz/MD5E-s91855--01b626dfbe94f321a9561b62c2c9c41a.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 7 
git-annex: failed to lock content: .git/annex/objects/JX/Xg/MD5E-s102726--8f737b752aed2f7cbaa6058926c9a225.mif.gz/MD5E-s102726--8f737b752aed2f7cbaa6058926c9a225.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 8 
git-annex: failed to lock content: .git/annex/objects/w3/kJ/MD5E-s166542--8890611035dce931dcac2735110a6225.nii.gz/MD5E-s166542--8890611035dce931dcac2735110a6225.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 9 
git-annex: failed to lock content: .git/annex/objects/7W/5K/MD5E-s14500549--d00fc69e9add87193480c611400d8fe6.mat/MD5E-s14500549--d00fc69e9add87193480c611400d8fe6.mat: setFileMode: permission denied (Operation not permitted)
failed
dropunused 10 
git-annex: failed to lock content: .git/annex/objects/zK/58/MD5E-s144729--8261ee60217139191a1fd9881f6c8f1e.nii.gz/MD5E-s144729--8261ee60217139191a1fd9881f6c8f1e.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 11 
git-annex: failed to lock content: .git/annex/objects/9J/F6/MD5E-s118409--c0bd9c2a4a5518511f4a06e41667edcf.mif.gz/MD5E-s118409--c0bd9c2a4a5518511f4a06e41667edcf.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 12 
git-annex: failed to lock content: .git/annex/objects/7V/w3/MD5E-s144861--19bd122d89d4f79ea0605abeb2577c95.nii.gz/MD5E-s144861--19bd122d89d4f79ea0605abeb2577c95.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 13 
git-annex: failed to lock content: .git/annex/objects/4K/Fv/MD5E-s42777--b3598fba50c6587e16b6b48da7421120.nii.gz/MD5E-s42777--b3598fba50c6587e16b6b48da7421120.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 14 
git-annex: failed to lock content: .git/annex/objects/mk/X7/MD5E-s131189--c97d3073ba88641fc73ed1e1ef111bd6.nii.gz/MD5E-s131189--c97d3073ba88641fc73ed1e1ef111bd6.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 15 
git-annex: failed to lock content: .git/annex/objects/XP/2g/MD5E-s132947--95947f3ea284a1375513cb1bb4624ae5.nii.gz/MD5E-s132947--95947f3ea284a1375513cb1bb4624ae5.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 16 
git-annex: failed to lock content: .git/annex/objects/J1/gF/MD5E-s118312--7e5784d5b43e4d665f6f03c9bfb7324c.mif.gz/MD5E-s118312--7e5784d5b43e4d665f6f03c9bfb7324c.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 17 
git-annex: failed to lock content: .git/annex/objects/Wg/z5/MD5E-s37308--08237bab959e1f4eb29d538692800ca2.mif.gz/MD5E-s37308--08237bab959e1f4eb29d538692800ca2.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 18 
git-annex: failed to lock content: .git/annex/objects/k5/53/MD5E-s133058--e8ef5c82f58ff26b3cda05b3c5db2e7d.nii.gz/MD5E-s133058--e8ef5c82f58ff26b3cda05b3c5db2e7d.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 19 
git-annex: failed to lock content: .git/annex/objects/xk/v4/MD5E-s1381--59931915edf042a9c85387fb2c5a276f.txt/MD5E-s1381--59931915edf042a9c85387fb2c5a276f.txt: setFileMode: permission denied (Operation not permitted)
failed
dropunused 20 
git-annex: failed to lock content: .git/annex/objects/0k/wG/MD5E-s152070--c154efbb93eeca2c432176707d080049.nii.gz/MD5E-s152070--c154efbb93eeca2c432176707d080049.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 21 
git-annex: failed to lock content: .git/annex/objects/vm/g2/MD5E-s160967--da6173dd364b659727a1c58a15a470d7.nii.gz/MD5E-s160967--da6173dd364b659727a1c58a15a470d7.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 22 
git-annex: failed to lock content: .git/annex/objects/84/V3/MD5E-s119669--d0006b7fdc951defb93df1545cd6b596.mif.gz/MD5E-s119669--d0006b7fdc951defb93df1545cd6b596.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 23 
git-annex: failed to lock content: .git/annex/objects/5F/4W/MD5E-s160606--fa58a9b4f13e32071ddb812b8e580d10.nii.gz/MD5E-s160606--fa58a9b4f13e32071ddb812b8e580d10.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 24 
git-annex: failed to lock content: .git/annex/objects/xM/f2/MD5E-s31661558--e42f36aff2910897cc925964c6b0a6de.mif.gz/MD5E-s31661558--e42f36aff2910897cc925964c6b0a6de.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 25 
git-annex: failed to lock content: .git/annex/objects/mP/PG/MD5E-s14032580--6c05da058e3cddbd4bfba82fb13977e5.nii.gz/MD5E-s14032580--6c05da058e3cddbd4bfba82fb13977e5.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 26 
git-annex: failed to lock content: .git/annex/objects/53/97/MD5E-s12330650760--2dff34dde8a38b179eb5659e9da71a22.tck/MD5E-s12330650760--2dff34dde8a38b179eb5659e9da71a22.tck: setFileMode: permission denied (Operation not permitted)
failed
dropunused 27 
git-annex: failed to lock content: .git/annex/objects/Z0/9F/MD5E-s102739--9f8d1cfd1fc975fd83a849c9456c2527.mif.gz/MD5E-s102739--9f8d1cfd1fc975fd83a849c9456c2527.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 28 
git-annex: failed to lock content: .git/annex/objects/zw/7F/MD5E-s123862--dec1d65b280c49552c922b74e1b70696.mif.gz/MD5E-s123862--dec1d65b280c49552c922b74e1b70696.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 29 
git-annex: failed to lock content: .git/annex/objects/x0/wP/MD5E-s124595534--da2041e5cc68334ed48b81623e0983d5.csv/MD5E-s124595534--da2041e5cc68334ed48b81623e0983d5.csv: setFileMode: permission denied (Operation not permitted)
failed
dropunused 30 
git-annex: failed to lock content: .git/annex/objects/0Q/Km/MD5E-s164914--2ee129dff2d0131db770b880fbc3c6ff.nii.gz/MD5E-s164914--2ee129dff2d0131db770b880fbc3c6ff.nii.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 31 
git-annex: failed to lock content: .git/annex/objects/9W/V9/MD5E-s1107--31cc2b0ac3c4cf313024ce6cdf262c11.txt/MD5E-s1107--31cc2b0ac3c4cf313024ce6cdf262c11.txt: setFileMode: permission denied (Operation not permitted)
failed
dropunused 32 
git-annex: failed to lock content: .git/annex/objects/vQ/7k/MD5E-s105323--7ba513e64c8907521e657bf85bad5d29.mif.gz/MD5E-s105323--7ba513e64c8907521e657bf85bad5d29.mif.gz: setFileMode: permission denied (Operation not permitted)
failed
dropunused 33 
git-annex: failed to lock content: .git/annex/objects/8J/g5/MD5E-s1107--f294b38eb0c026ea3d8f11621afb61a3.txt/MD5E-s1107--f294b38eb0c026ea3d8f11621afb61a3.txt: setFileMode: permission denied (Operation not permitted)
failed```

An example permission string follows. fatx405 is my user name, and hpc_ag_thomalla is the group that my colleague and I belong to.

-rwxrwx--- 1 fatx405 hpc_ag_thomalla 103K Jul  8 07:56 .git/annex/objects/vQ/7k/MD5E-s105323--7ba513e64c8907521e657bf85bad5d29.mif.gz/MD5E-s105323--7ba513e64c8907521e657bf85bad5d29.mif.gz
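The `setFileMode: permission denied` errors above are consistent with the POSIX rule that only a file's owner (or root) may change its mode, no matter how permissive the group bits are. As a minimal sketch, assuming that this is what blocks the drop, the hypothetical helper below (`list_foreign_annex_objects` is not part of git-annex or DataLad) lists annex objects that the current user does not own:

```shell
# Hypothetical helper: print annexed object files NOT owned by the current user.
# Changing a file's mode (what git-annex reports as setFileMode) requires
# ownership, so these are the files a group member cannot drop.
list_foreign_annex_objects() {
    repo="${1:-.}"
    find "$repo/.git/annex/objects" -type f ! -user "$(id -u)" -print
}

# Usage (run inside the dataset):
#   list_foreign_annex_objects . | head
```

An empty result means every annexed object belongs to the calling user, so the cause would have to be something else.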

$ datalad wtf

output
datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - PlaintextKeyring with no encyption v.1.0 at /home/fatx405/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/fatx405/.config/python_keyring/keyringrc.cfg
    - data_root: /home/fatx405/.local/share/python_keyring
## datalad 
  - full_version: 0.14.4
  - version: 0.14.4
## dataset 
  - id: 8a07960e-d233-4818-9942-8e67247bbedd
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /work/fatx405/projects/CSI_TEST_scaling/data/raw_bids
  - repo: AnnexRepo
## dependencies 
  - annexremote: 1.5.0
  - appdirs: 1.4.4
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 8.20201104-g13bab4f2c
  - cmd:bundled-git: 2.29.2
  - cmd:git: 2.29.2
  - cmd:system-git: 2.29.2
  - cmd:system-ssh: 7.4p1
  - exifread: 2.1.2
  - humanize: 3.2.0
  - iso8601: 0.1.14
  - keyring: 22.0.1
  - keyrings.alt: 4.0.2
  - msgpack: 1.0.2
  - mutagen: 1.41.1
  - requests: 2.25.1
  - wrapt: 1.12.1
## environment 
  - LANG: en_US.UTF-8
  - PATH: /work/fatx405/miniconda3/bin:/sw/link/git/2.32.0/bin:/sw/env/system-gcc/singularity/3.5.2-overlayfix/bin:/sw/batch/slurm/19.05.6/bin:/sw/rrz/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 1.1.4
  - hirni: 
    - description: HIRNI workflows
    - entrypoints: 
      - datalad_hirni.commands.dicom2spec.Dicom2Spec: 
        - class: Dicom2Spec
        - load_error: None
        - module: datalad_hirni.commands.dicom2spec
        - names: 
          - hirni-dicom2spec
          - hirni_dicom2spec
      - datalad_hirni.commands.import_dicoms.ImportDicoms: 
        - class: ImportDicoms
        - load_error: None
        - module: datalad_hirni.commands.import_dicoms
        - names: 
          - hirni-import-dcm
          - hirni_import_dcm
      - datalad_hirni.commands.spec2bids.Spec2Bids: 
        - class: Spec2Bids
        - load_error: None
        - module: datalad_hirni.commands.spec2bids
        - names: 
          - hirni-spec2bids
          - hirni_spec2bids
      - datalad_hirni.commands.spec4anything.Spec4Anything: 
        - class: Spec4Anything
        - load_error: None
        - module: datalad_hirni.commands.spec4anything
        - names: 
          - hirni-spec4anything
          - hirni_spec4anything
    - load_error: None
    - module: datalad_hirni
    - version: 0.0.8
  - metalad: 
    - description: DataLad semantic metadata command suite
    - entrypoints: 
      - datalad_metalad.aggregate.Aggregate: 
        - class: Aggregate
        - load_error: None
        - module: datalad_metalad.aggregate
        - names: 
          - meta-aggregate
          - meta_aggregate
      - datalad_metalad.dump.Dump: 
        - class: Dump
        - load_error: None
        - module: datalad_metalad.dump
        - names: 
          - meta-dump
          - meta_dump
      - datalad_metalad.extract.Extract: 
        - class: Extract
        - load_error: None
        - module: datalad_metalad.extract
        - names: 
          - meta-extract
          - meta_extract
    - load_error: None
    - module: datalad_metalad
    - version: 0.2.1
  - neuroimaging: 
    - description: Neuroimaging tools
    - entrypoints: 
      - datalad_neuroimaging.bids2scidata.BIDS2Scidata: 
        - class: BIDS2Scidata
        - load_error: None
        - module: datalad_neuroimaging.bids2scidata
        - names: 
          - bids2scidata
    - load_error: None
    - module: datalad_neuroimaging
    - version: 0.3.1
  - ukbiobank: 
    - description: UKBiobank dataset support
    - entrypoints: 
      - datalad_ukbiobank.init.Init: 
        - class: Init________________

Cheers,
Marvin

Is it possible that this issue is related to me using datalad save -d^. -r raw_bids/ from within origin a lot? I found niftis in the .git/annex/ of origin's root that I expected to be in the .git/annex of a subject subdataset (for example raw_bids/sub-xyz). Besides recording history, does the save command above also store content in the annex of origin's root?

I use the procedure above because I noticed that if I clone origin and recent updates in origin were only saved at the subject subdataset level (datalad save within raw_bids/sub-xyz), a datalad get of a subject subdataset in the clone won’t get me the updated files, and sometimes a detached HEAD warning occurs. I can address that by running datalad save -d^. -r raw_bids/sub-xyz from within origin's root.

What is the recommended saving procedure in this case to make sure updates in subdatasets are transferred when cloning a superdataset?

So some data files were not annexed and were committed directly into git, leading to such a huge .git/objects?

Maybe someone first committed them to git, realized that, and then, without going back to the previous state (git reset --hard, git reflog expire, git gc) or redoing from scratch, just managed to move them under git-annex, in effect needing twice the storage (a copy in git history and then the objects in the annex)?

There are helpers to figure out what those big objects are (I can find them later), but even looking at git log --stat could give you an idea if you see some lines with lots of pluses for data files.
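One such helper can be sketched with plain git plumbing. This is not a DataLad command; the function name `largest_git_blobs` is made up for illustration. It lists the largest blobs reachable from any ref, including files that were deleted in later commits:

```shell
# Hypothetical helper: show the N largest blobs stored in git history.
# Sizes are uncompressed object sizes in bytes, followed by the path
# under which the blob was committed.
largest_git_blobs() {
    repo="${1:-.}"
    count="${2:-10}"
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        grep '^blob ' |
        cut -d' ' -f2- |
        sort -rn |
        head -n "$count"
}

# Usage: largest_git_blobs raw_bids 5
```

Annexed files do not show up with their real size here, because git only stores a small pointer for them, so anything large in this list was committed directly to git.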

-d^. is to point to the current dataset if you are in some subdirectory of it, so it is the same as -d. if you are in the root of the (sub)dataset. Maybe you meant to use -d^ to save the updated states of subdatasets in/from the topmost superdataset?

Apparently this happened somehow. However, I only use datalad save for committing.

The .gitattributes in raw_bids/ is

* annex.backend=MD5E
**/.git* annex.largefiles=nothing
CHANGES annex.largefiles=nothing
README annex.largefiles=nothing
code/** annex.largefiles=nothing
participants.tsv merge=union
.heudiconv/ annex.largefiles=nothing

Regarding .heudiconv/ annex.largefiles=nothing: you must then never share such a dataset publicly, since those files could contain sensitive information. Better to have “anything” there. Depending on the number of subjects etc., it might also contribute notably to the .git/objects size.

Thanks for the clarification. Yes, that’s what I actually was aiming for. So if I understand correctly datalad save -d^ does save to the superdataset regardless of how deep I am in nested subdatasets.

Inspecting git log --stat might reveal more details on the sequence of events and which large files got into git.

Yes. It saves the current subdataset first, and then percolates up the hierarchy, saving the updated states of the subdatasets along the path from that original subdataset to the very top superdataset.

I delete .heudiconv during jobs. That it still is listed in .gitattributes is a code relic. It is changed now and I put it into .gitignore.

I cannot say for certain that it isn’t the culprit, but it has only been produced for 10 subjects in a handful of test runs. git log --stat backs that up as far as I can tell.

Could be me, but I do not see anything large going into git when going through git log --stat in raw_bids/. The only big things mentioned are two containers that I temporarily stored in raw_bids/code and that were installed with datalad containers-add.

That is the log entry:

commit 1b28da370d53ab10a4fd1699e7b4d352adcc6947
Author: m-petersen <>
Date:   Mon Jul 5 12:36:32 2021 +0200

    Add heudiconv and pydeface container

 code/heudiconv-0.9.0.sif | Bin 0 -> 384999424 bytes
 code/pydeface-2.0.0.sif  | Bin 0 -> 4362514432 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

commit 44b53b32281cb337bc1e2cfd438247d7a0d2e1e0
Author: m-petersen <>
Date:   Mon Jul 5 12:31:52 2021 +0200

    [DATALAD] Configure containerized environment 'heudiconv'

 .datalad/environments/heudiconv/image | 1 +
 1 file changed, 1 insertion(+)

commit fbb8a54db7bf4869b5c3fd4ff0f2f6bb7c085263
Author: m-petersen <>
Date:   Mon Jul 5 11:58:22 2021 +0200

    [DATALAD] Configure containerized environment 'pydeface'

 .datalad/config                      | 6 ++++++
 .datalad/environments/pydeface/image | 1 +
 2 files changed, 7 insertions(+)

So here you go - large container images committed directly to git, since

code/** annex.largefiles=nothing
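To see which rule applies to a given path before saving, git itself can be asked. The sketch below uses the real `git check-attr` command. The helper names and the extra `*.sif` rule are assumptions for illustration, not something from this thread's dataset. The idea is that a later line in .gitattributes overrides an earlier one for the same attribute, so container images under code/ could be sent to the annex while scripts stay in git:

```shell
# Hypothetical helper: report the annex.largefiles setting that
# .gitattributes assigns to a path (the file need not exist yet).
annex_rule_for() {
    repo="$1"
    path="$2"
    git -C "$repo" check-attr annex.largefiles -- "$path"
}

# Hypothetical fix: append a rule so that *.sif images are annexed.
# Because it comes after "code/** annex.largefiles=nothing", it takes
# precedence for matching files.
add_sif_rule() {
    printf '%s\n' '*.sif annex.largefiles=anything' >> "$1/.gitattributes"
}

# Usage:
#   add_sif_rule raw_bids
#   annex_rule_for raw_bids code/pydeface-2.0.0.sif
```

A result ending in `anything` means git-annex is told to annex the file, and `nothing` means it will be committed directly to git.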

FWIW: filed “Hint the user if lots of content was committed directly to git” (Issue #5919 in datalad/datalad on GitHub), which, if we address it, could help to avoid/mitigate this in the future by raising user awareness.

BTW, consider using ReproNim/containers (GitHub - ReproNim/containers: Containers "distribution" for reproducible neuroimaging) as a subdataset to provide you with singularity containers for many (if not all) neuroimaging containerized environments. It comes with extra benefits (see there).

Thanks a lot for your guidance. I really appreciate your time, Yaroslav.

I see. However, I forgot to report

commit 407a255ed634b7d7bfbe9558f34ba68546a469cc
Author: m-petersen <>
Date:   Sun Aug 1 00:08:12 2021 +0200

    remove sifs from code

 code/heudiconv-0.9.0.sif | Bin 384999424 -> 0 bytes
 code/pydeface-2.0.0.sif  | Bin 4362514432 -> 0 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

Is it still possible that both containers are in .git? And is there a way to remove them from .git even though they were removed (most probably with rm), or do I have to set up a new raw_bids/? git-filter-repo does not seem to affect the large files.

I’ll investigate my other datasets with git log --stat and whether it’s always the problem that I moved files to git and not to annex.


Thanks for the recommendation. I am actually aware of the dataset.

Of course they are. That is partially why git-annex was invented: to detach (large) content from .git/objects, which contains the full history of changes.

There is always a way, but it will be “ugly” in this case, since you need to rewrite the history using git filter-branch, so you would end up needing to “force push” the new state of that repo, and if someone already has a local clone, their git pull might not work out “neatly” :wink:
I don’t think I have used git-filter-repo (GitHub - newren/git-filter-repo: Quickly rewrite git repository history (filter-branch replacement)) before, but as long as you parametrize it properly (to remove those code/*.sif) and it does all the needed dance with cleaning the reflog etc. (see the git-filter-branch documentation), you should get a shrunk .git/objects.
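For reference, the filter-branch route could look roughly like the sketch below. It only uses stock git commands, but the wrapper function `purge_path_from_history` is hypothetical and has not been tried on a dataset with annexed content or registered subdatasets. Since every rewritten commit gets a new hash, a superdataset would afterwards still point at the old subdataset commit and would need to be saved again. Try it on a throwaway clone first.

```shell
# Hypothetical wrapper: remove all files matching a pathspec from every
# commit on every ref, then drop the backups that keep old objects alive.
# Assumes a clean working tree and a pattern without single quotes.
purge_path_from_history() {
    repo="$1"
    pattern="$2"   # e.g. 'code/*.sif'
    (
        cd "$repo" || exit 1
        # Rewrite history; commits that become empty are dropped.
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
            --index-filter "git rm -r -q --cached --ignore-unmatch -- '$pattern'" \
            -- --all || exit 1
        # filter-branch keeps the old refs under refs/original/; delete them.
        git for-each-ref --format='%(refname)' refs/original/ |
            while read -r ref; do git update-ref -d "$ref"; done
        # Expire reflog entries and prune the now unreachable objects.
        git reflog expire --expire=now --all
        git gc --quiet --prune=now
    )
}

# Usage: purge_path_from_history raw_bids 'code/*.sif'
```

Comparing du -sh .git/objects before and after shows whether the large blobs are really gone.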

Of course they are.

Without rereading the handbook, at that moment I thought that rm <file>; datalad save would also remove a file from git, not only from the annex. That’s why I asked.

In my case I deleted the dataset to set it up from scratch. Next time I accidentally save stuff to .git I will try that.

It makes a commit in which the file is removed, but as git keeps the full history, the file lives in git history “forever”.