Dear Datalad team,
Since GIN is imposing quite strict limits on repo size (50 GB, typically too small for large neuroimaging datasets), I am looking for a convenient alternative to push my large DataLad datasets with multiple subdatasets to.
Can you recommend any alternatives?
Thanks very much in advance!
Best wishes,
Lukas
What kind of dataset is it (which modalities, is it BIDS, has the raw BIDS data been published, ...)?
Hi @yarikoptic ,
Thanks for the prompt response!
Typically, we have large multi-modal brain imaging datasets (any combination of fMRI, structural MRI, PET, diffusion, MRS, ASL).
Structure is as follows
Superdataset
- sourcedata (not pushed/published)
- BIDS (only contains BIDS-converted .nii.gz data)
- derivatives (fmriprep preproc for fMRI, etc)
- code
- pipeline (containers with fmriprep & mriqc versions typically)
- firstlevel
- secondlevel
Total size is typically a few TB. Pushing to GIN is very handy: it allows us to use datalad drop locally, so we do not need tons of storage on our local server, and of course it lets us share whatever super/subdataset we like with our papers. Therefore, I would like to keep this kind of functionality.
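For readers unfamiliar with this push-then-drop cycle, here is a minimal sketch of it. It assumes a sibling named gin has already been configured and that derivatives is the subdataset to free up locally; both names are only illustrative. The helper just builds the DataLad command lines and prints them unless dry_run is switched off.

```python
import subprocess


def push_and_drop_commands(sibling="gin", drop_path="derivatives"):
    """Build the DataLad CLI calls for the push-then-drop cycle.

    `sibling` and `drop_path` are hypothetical defaults for illustration.
    """
    return [
        # Publish the superdataset and its subdatasets to the sibling.
        ["datalad", "push", "--to", sibling, "--recursive"],
        # Drop local file content to free disk space.
        ["datalad", "drop", drop_path, "--recursive"],
    ]


def run(commands, cwd=".", dry_run=True):
    """Print the commands, or execute them inside `cwd` if dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run(push_and_drop_commands())
```

Keeping the default as a dry run means the commands can be inspected before anything is actually pushed or dropped.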
Thanks!
Lukas
As a heads up, there are some tricks to getting multi-TB datasets uploaded to OpenNeuro. See "Difficulties uploading large dataset" (Issue #3683, OpenNeuroOrg/openneuro on GitHub).
Thanks @yarikoptic @psadil
Can OpenNeuro be configured within DataLad as a remote for a superdataset with subdatasets, like GIN (for which there is a nice workflow in the handbook that I am currently using)?
If not, it would be great to develop one maybe, and I would be happy to help testing!
Best wishes,
Lukas
Are you talking about a “study” or a “derivative” type of dataset? A study dataset is not “allowed” on OpenNeuro, so you can just push it to GitHub, GIN, or whatnot. A “derivative” dataset like fmriprep, with subdataset(s) for raw: yes, that is what the demo one I pointed to is (there are some URL gotchas there, but that is a separate question).
Hi Yarik,
Here is an example of what we currently have on GIN: labgas/proj_erythritol_4b: Superdataset for work package 4 of the FWO-SNSF erythritol research project - G-Node GIN
It would be nice to be able to keep such a structure, but it seems like there is no alternative, so we could push derivatives (which is typically the largest subdataset) to OpenNeuro instead?
GIN will impose a limit of 50 GB per repo and 1 TB in total, severely limiting its use cases…
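To make the effect of these limits concrete, here is a small sketch that checks a set of subdataset sizes against them. It assumes each subdataset is published as its own GIN repository and that 1 TB is counted as 1000 GB; the function name and the sizes in the usage line are hypothetical placeholders, not measurements from our project.

```python
PER_REPO_LIMIT_GB = 50    # per-repository limit mentioned above
TOTAL_LIMIT_GB = 1000     # total limit, assuming 1 TB = 1000 GB


def check_gin_limits(sizes_gb):
    """Return (names of repos over the per-repo limit, whether the total is exceeded).

    `sizes_gb` maps a subdataset name to its size in GB.
    """
    too_large = sorted(
        name for name, size in sizes_gb.items() if size > PER_REPO_LIMIT_GB
    )
    total_exceeded = sum(sizes_gb.values()) > TOTAL_LIMIT_GB
    return too_large, total_exceeded


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    print(check_gin_limits({"BIDS": 40, "derivatives": 900, "code": 1}))
```

Any subdataset listed in the first return value would need a different hosting target, even if the overall total still fits.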
Thanks,
Lukas