Datalad, containers, data organization: multiple questions

Thanks, looks like a reasonably quick status can be obtained from all submodules with:
git submodule foreach --recursive 'git status -s'

$ time git submodule foreach --recursive 'git status -s'
Entering 'study1'
 ? data/processed
Entering 'study1/data/bids'
Entering 'study1/data/processed'
 ? freesurfer
Entering 'study1/data/processed/fmriprep'
Entering 'study1/data/processed/freesurfer'
?? testme.test
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
...
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'study1/data/processed/freesurfer/sub-XXXXXXXXXXX'
Entering 'containers'

real    0m39.430s
user    0m23.770s
sys     0m38.837s

Hey,

I have not been following this closely, so I might be wrong. But if you have lots of files in these datasets, a default datalad status will traverse the filesystem more intensively than a git status, because its type report distinguishes symlinks that point to annex keys from plain symlinks. If you don’t need that accuracy, use the -t raw option. With it, a datalad status should come rather close to the runtime of an uncached git status.
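
To make the difference concrete, here is a rough shell sketch of the extra per-file work involved. It is only an illustration: classify_raw and classify_eval are hypothetical helpers I made up, not DataLad functions, and the real implementation surely differs. The assumption is simply that annexed files are symlinks whose target lies under .git/annex/objects.

```shell
#!/bin/sh
# Illustrative only: classify_raw and classify_eval are hypothetical helpers,
# not part of DataLad or git-annex.

# "raw"-style report: look at the entry itself and stop there.
classify_raw() {
    if [ -L "$1" ]; then echo symlink
    elif [ -d "$1" ]; then echo directory
    else echo file
    fi
}

# "eval"-style report: for every symlink, additionally read its target and
# report it as a file if the target points into the annex object store.
classify_eval() {
    if [ -L "$1" ]; then
        case "$(readlink "$1")" in
            *.git/annex/objects/*) echo file ;;
            *) echo symlink ;;
        esac
    elif [ -d "$1" ]; then echo directory
    else echo file
    fi
}

# Demo in a throwaway directory: one annex-style link and one ordinary link.
demo=$(mktemp -d)
mkdir -p "$demo/.git/annex/objects"
ln -s ".git/annex/objects/KEY" "$demo/annexed.nii.gz"
ln -s "somewhere/else" "$demo/plain_link"

for p in "$demo/annexed.nii.gz" "$demo/plain_link"; do
    printf '%s raw=%s eval=%s\n' "${p##*/}" "$(classify_raw "$p")" "$(classify_eval "$p")"
done

rm -rf "$demo"
```

Both links look identical in the raw report, and only the extra readlink per symlink tells them apart. On a dataset with a very large number of annexed files, that additional lookup is paid once per file.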

HTH

For more info on the behavior of the system on large N datasets, check out https://github.com/datalad/datalad/issues/3869