Hi @yarikoptic (likely),
I was curious whether you've thought about datalad fetching just the header of each NIfTI file instead of the entire file. The use case is building statistical models that require metadata about the NIfTI file but not the NIfTI file itself (to save space/time).
Or maybe a better workaround is to create new git repos with only header data, but it would be cool to be able to fetch/sniff only a certain number of bytes from a file in git-annex/datalad.
Thoughts/ideas?
James
datalad-fuse (GitHub: datalad/datalad-fuse, a DataLad extension to provide FUSE file system access) is the WiP toward this. It relies on fsspec for the actual "sparse cached access" and uses http* URLs for the annexed files. If you want to use it programmatically (e.g. to populate that repo of headers, if so very much desired), you could use FsspecAdapter, as done in datalad-fuse/fsspec_head.py at commit 0063f7b0310151ca868bc64489ee6452a10753bc of datalad/datalad-fuse, to get an open file instance you can read from, etc. For a turnkey solution, use datalad fusefs, e.g.:
/tmp > datalad install ///openneuro/ds000001
[INFO ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
| datalad siblings -d "/tmp/ds000001" enable -s s3-PRIVATE
install(ok): /tmp/ds000001 (dataset)
/tmp > mkdir ds000001-mounted
exit:1 /tmp > datalad fusefs -d ds000001 --foreground ds000001-mounted &
[1] 104174
/tmp > du -scm ds000001
2 ds000001
2 total
/tmp > nib-ls ds000001-mounted/sub-01/func/sub-01_task-balloonanalogrisktask_run-0*nii.gz
ds000001-mounted/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz int16 [ 64, 64, 33, 300] 3.12x3.12x4.00x2.00
ds000001-mounted/sub-01/func/sub-01_task-balloonanalogrisktask_run-02_bold.nii.gz int16 [ 64, 64, 33, 300] 3.12x3.12x4.00x2.00
ds000001-mounted/sub-01/func/sub-01_task-balloonanalogrisktask_run-03_bold.nii.gz int16 [ 64, 64, 33, 300] 3.12x3.12x4.00x2.00
/tmp > du -scm ds000001
17 ds000001
17 total
/tmp > du -scm --apparent-size ds000001
137 ds000001
137 total
So there are now some "sparse" files that would total over 130 MB if downloaded in full, but only about 15 MB was actually fetched (IIRC the default block size is about 5 MB, and there are 3 files) to get those headers for nib-ls. Data is cached under ds000001/.git/datalad/cache/fsspec.
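To illustrate why a small prefix of each file is enough for this, here is a minimal standard-library sketch that reads NIfTI-1 header fields from only the first few KB of a .nii.gz byte stream. It is not how datalad-fuse, fsspec, or nibabel do it internally; the helper names (decompress_prefix, parse_nifti1_header) are made up for this example, the input is a synthetic file built in memory rather than a real OpenNeuro file, and it assumes a plain NIfTI-1 header (348 bytes) without looking at extensions or NIfTI-2.

```python
import gzip
import os
import struct
import zlib

NIFTI1_HEADER_SIZE = 348  # fixed size of a NIfTI-1 header, in bytes


def decompress_prefix(compressed_prefix: bytes, n_bytes: int) -> bytes:
    """Decompress at most n_bytes from the start of a (possibly truncated) gzip stream."""
    decomp = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    return decomp.decompress(compressed_prefix, n_bytes)


def parse_nifti1_header(raw: bytes) -> dict:
    """Extract shape, datatype code, bitpix and zooms from raw NIfTI-1 header bytes."""
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError("need at least 348 bytes of uncompressed data")
    # sizeof_hdr (int32 at offset 0) must be 348; use it to detect byte order.
    for endian in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(endian + "i", raw, 0)
        if sizeof_hdr == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack_from(endian + "8h", raw, 40)      # dim[0] is the number of dimensions
    datatype, bitpix = struct.unpack_from(endian + "2h", raw, 70)
    pixdim = struct.unpack_from(endian + "8f", raw, 76)   # pixdim[1..ndim] are the zooms
    ndim = dim[0]
    return {
        "shape": dim[1:1 + ndim],
        "datatype": datatype,
        "bitpix": bitpix,
        "zooms": pixdim[1:1 + ndim],
    }


# Build a synthetic .nii.gz in memory (stand-in for a remote annexed file).
hdr = bytearray(NIFTI1_HEADER_SIZE)
struct.pack_into("<i", hdr, 0, NIFTI1_HEADER_SIZE)
struct.pack_into("<8h", hdr, 40, 4, 64, 64, 33, 300, 1, 1, 1)
struct.pack_into("<2h", hdr, 70, 4, 16)  # datatype code 4 = int16, 16 bits per voxel
struct.pack_into("<8f", hdr, 76, 1.0, 3.125, 3.125, 4.0, 2.0, 0.0, 0.0, 0.0)
hdr[344:348] = b"n+1\x00"  # NIfTI-1 single-file magic
blob = gzip.compress(bytes(hdr) + os.urandom(200_000))  # random bytes as fake voxel data

# Pretend only the first 4 KiB were fetched, e.g. via a ranged read.
prefix = blob[:4096]
info = parse_nifti1_header(decompress_prefix(prefix, NIFTI1_HEADER_SIZE))
print(info)
```

With a real remote file, the prefix would come from a ranged read (which is roughly what the block-wise cache above provides), and nibabel on the mounted path remains the more robust way to get headers.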
Hope this helps