Hi Yarik,
I found a workflow, admittedly a bit convoluted, with a repo on GIN for every subdataset and for the superdataset. It results in browsable repos on GIN with downloadable files, but not on Github.
Moreover, cloning from either one and then running datalad get does not work (except for non-annexed content), since the GIN remotes seem to be set to annex-ignore after cloning (this is not the case in my original dataset, which was pushed to GIN).
Could you please have a look at the errors and the workflow below?
I feel like I am close to making it work fully, but not there yet with the cloning!
L
u0027997@gbw-s-labgas01:/data/datalad_test$ datalad clone https://gin.g-node.org/labgas/proj_discoverie
Clone attempt: 0%| | 0.00/2.00 [00:00<?, ? Candidate locations/s]Username for 'https://gin.g-node.org': lukasvo76
Password for 'https://lukasvo76@gin.g-node.org':
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://gin.g-node.org/labgas/proj_discoverie/config download failed: Not Found
install(ok): /data/datalad_test/proj_discoverie (dataset)
Here are the siblings of the cloned dataset, before and after unsetting annex-ignore:
u0027997@gbw-s-labgas01:/data/datalad_test/proj_discoverie$ datalad siblings
.: here(+) [git]
.: origin(-) [https://gin.g-node.org/labgas/proj_discoverie (git)]
.: osf-annex-storage(+) [osf]
.: osf-export-storage(+) [osf]
u0027997@gbw-s-labgas01:/data/datalad_test/proj_discoverie$ git config --unset-all remote.gin.annex-ignore
u0027997@gbw-s-labgas01:/data/datalad_test/proj_discoverie$ git config --unset-all remote.origin.annex-ignore
u0027997@gbw-s-labgas01:/data/datalad_test/proj_discoverie$ datalad siblings
.: here(+) [git]
.: osf-annex-storage(+) [osf]
[WARNING] Could not detect whether origin carries an annex. If origin is a pure Git remote, this is expected. Remote was marked by annex as annex-ignore. Edit .git/config to reset if you think that was done by mistake due to absent connection etc
.: origin(-) [https://gin.g-node.org/labgas/proj_discoverie (git)]
.: osf-export-storage(+) [osf]
Compare this to the siblings of the original dataset:
u0027997@gbw-s-labgas01:/data/proj_discoverie$ datalad siblings -r
.: here(+) [git]
.: github(-) [https://github.com/labgas/proj_discoverie.git (git)]
.: gin(+) [https://gin.g-node.org/labgas/proj_discoverie (git)]
BIDS: here(+) [git]
BIDS: gin(+) [https://gin.g-node.org/labgas/proj_discoverie_BIDS (git)]
code: here(+) [git]
code: github(-) [https://github.com/labgas/proj_discoverie_code.git (git)]
code: gin(+) [https://gin.g-node.org/labgas/proj_discoverie_code (git)]
derivatives: here(+) [git]
derivatives: gin(+) [https://gin.g-node.org/labgas/proj_discoverie_derivatives (git)]
mriqc: here(+) [git]
mriqc: gin(+) [https://gin.g-node.org/labgas/proj_discoverie_mriqc (git)]
pipeline: here(+) [git]
pipeline: gin(+) [https://gin.g-node.org/labgas/proj_discoverie_pipeline (git)]
pipeline: datalad(+) [datalad]
sourcedata: here(+) [git]
Hence datalad get fails:
u0027997@gbw-s-labgas01:/data/datalad_test/proj_discoverie$ datalad get BIDS
Clone attempt: 0%| | 0.00/4.00 [00:00<?, ? Candidate locations/s]Username for 'https://gin.g-node.org': lukasvo76
Password for 'https://lukasvo76@gin.g-node.org':
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://gin.g-node.org/labgas/proj_discoverie_BIDS/config download failed: Not Found
Username for 'https://gin.g-node.org': lukasvo76
Password for 'https://lukasvo76@gin.g-node.org':
install(ok): /data/datalad_test/proj_discoverie/BIDS (dataset) [Installed subdataset in order to get /data/datalad_test/proj_discoverie/BIDS]
get(error): BIDS/sub-KUL004/anat/sub-KUL004_T1w.nii.gz (file) [Remote gin-common-bids not usable by git-annex; setting annex-ignore
https://gin.g-node.org/labgas/proj_discoverie_BIDS/config download failed: Not Found]
get(error): BIDS/sub-KUL004/fmap/sub-KUL004_run-01_magnitude.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/fmap/sub-KUL004_run-02_magnitude.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/func/sub-KUL004_task-MIST_run-01_bold.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/func/sub-KUL004_task-MIST_run-02_bold.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/func/sub-KUL004_task-MIST_run-03_bold.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/func/sub-KUL004_task-MIST_run-04_bold.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL004/func/sub-KUL004_task-rest_bold.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL005/anat/sub-KUL005_T1w.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
get(error): BIDS/sub-KUL005/fmap/sub-KUL005_fieldmap.nii.gz (file) [not available; (Note that these git remotes have annex-ignore set: gin-common-bids origin)]
[31 similar messages have been suppressed; disable with datalad.ui.suppress-similar-results=off]
action summary:
get (error: 41)
install (ok: 1)
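To see at a glance which remotes git-annex has been told to skip, the annex-ignore flags can be listed straight from the Git config. The sketch below is only an illustration: it builds a throwaway repository in a temporary directory and sets the flag by hand on a remote called origin, so the repository and the hand-set flag are assumptions standing in for the real clone.

```shell
set -eu
# Throwaway repo standing in for the fresh clone (assumption, not the real dataset)
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git remote add origin https://gin.g-node.org/labgas/proj_discoverie
# Set by hand here to mimic what the clone log reports for origin
git config remote.origin.annex-ignore true
# List every remote that currently carries an annex-ignore setting
ignored=$(git config --get-regexp '^remote\..*\.annex-ignore$')
echo "$ignored"
```

Run inside a real clone, only the last two commands are needed; an empty result would mean no remote is flagged.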
L
Publish your dataset on GIN and/or Github
NOTE: this is work in progress; see this issue I opened on Neurostars and this rapidly evolving section of the Datalad handbook, particularly this walkthrough.
GIN
For now, this somewhat convoluted workflow works best:
- Add a GIN “superrepo” as a sibling (and common data source) to your superdataset
For now, only the manual workflow works for me: I ran into authentication problems with the datalad create-sibling-gin command used in the automated workflow (working on this!).
After creating your empty superrepo on GIN, you can run
datalad siblings add -d . --name gin --pushurl git@gin.g-node.org:/labgas/proj_discoverie.git --url https://gin.g-node.org/labgas/proj_discoverie --as-common-datasrc gin-common
Then make sure that annex is supported for this sibling by running (probably not needed, but does no harm)
git config --unset-all remote.gin.annex-ignore
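As far as I understand, a sibling added with separate --url and --pushurl ends up as an ordinary Git remote with distinct fetch and push URLs. The sketch below reproduces just that part with plain git in a throwaway repository (an assumption for illustration; it does not set up the gin-common data source that --as-common-datasrc adds) and reads both URLs back from the config.

```shell
set -eu
# Throwaway repo standing in for the superdataset (assumption)
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
# Fetch over https, push over ssh, as in the datalad siblings add call
git remote add gin https://gin.g-node.org/labgas/proj_discoverie
git remote set-url --push gin git@gin.g-node.org:/labgas/proj_discoverie.git
# Read back what ended up in .git/config
fetch_url=$(git config --get remote.gin.url)
push_url=$(git config --get remote.gin.pushurl)
echo "fetch: $fetch_url"
echo "push:  $push_url"
```

In a real dataset the two git config --get calls alone are a quick way to confirm that the sibling points where you expect.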
- Add a GIN “subrepo” as a sibling (and common data source) for each subdataset
NOTE: we do NOT do this for the sourcedata subdataset, since we do not want it to be “datalad gettable”, not even for people with access to the private GIN repo!
Essentially, repeat the process above in a slightly simplified way for each subdataset; this repetition is what I mean by convoluted above.
After creating your empty subrepo on GIN, you can run from your subdataset
datalad siblings add -d . --name gin --pushurl git@gin.g-node.org:/labgas/proj_discoverie_code.git --url https://gin.g-node.org/labgas/proj_discoverie_code --as-common-datasrc gin-common-code
Then make sure that annex is supported for this sibling by running (probably not needed, but does no harm)
git config --unset-all remote.gin.annex-ignore
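Since the same command is repeated for every subdataset, it can help to generate the calls in a loop. The following is a dry-run sketch that only prints the commands, so nothing is changed until you copy them. The subdataset names and repo URLs are taken from the siblings listing of the original dataset; the gin-common-<name> naming for the data sources other than code and BIDS is an assumption.

```shell
set -eu
base_https=https://gin.g-node.org/labgas
base_ssh=git@gin.g-node.org:/labgas
# Subdatasets that get a GIN sibling; sourcedata is left out on purpose
cmds=$(for sub in BIDS code derivatives mriqc pipeline; do
    # Lower-cased suffix for the common data source name (naming is an assumption)
    src_name="gin-common-$(printf '%s' "$sub" | tr '[:upper:]' '[:lower:]')"
    # Print the command instead of running it (dry run)
    echo "(cd ${sub} && datalad siblings add -d . --name gin --pushurl ${base_ssh}/proj_discoverie_${sub}.git --url ${base_https}/proj_discoverie_${sub} --as-common-datasrc ${src_name})"
done)
echo "$cmds"
```

The printed lines are meant to be run from the superdataset after the empty subrepos have been created on GIN.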
- Add the url of the subrepo for each of the corresponding subdatasets in your superdataset
Run the following command from your superdataset, once per subdataset (shown here for code)
datalad subdatasets --contains code --set-property url https://gin.g-node.org/labgas/proj_discoverie_code
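The url property set this way is recorded in the superdataset's .gitmodules file, so it can be checked with plain git. The sketch below is a minimal illustration on a hand-written .gitmodules in a temporary directory (an assumption, not the real superdataset); in the real dataset only the final read-back command is needed.

```shell
set -eu
demo=$(mktemp -d)
cd "$demo"
# Hand-written stand-in for the superdataset's .gitmodules (assumption)
git config -f .gitmodules submodule.code.path code
git config -f .gitmodules submodule.code.url https://gin.g-node.org/labgas/proj_discoverie_code
# Read the recorded URL back; this is the address used when the subdataset is installed
recorded=$(git config -f .gitmodules --get submodule.code.url)
echo "$recorded"
```

If the value printed in the real superdataset is not the GIN https URL of the subrepo, installing that subdataset from a fresh clone will look in the wrong place.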
- Push recursively from your superdataset to GIN
datalad push --to gin -r
NOTE: do not worry about the error for the sourcedata subdataset; we deliberately did not create a GIN sibling for it!
- Clone the entire superdataset wherever you like
datalad clone https://gin.g-node.org/labgas/proj_discoverie
If you want a subdataset with its annexed files downloaded to your computer, you should run, from inside the clone
datalad get BIDS
Github
NOTE: Github supports neither large files nor annexed content, so it is less convenient than GIN. It is more widely known, though, so we want our dataset, and particularly the code subdataset, available on Github as well, preferably in a clonable way (through a link to the common data source on GIN).
- Add a Github “superrepo” as a sibling to your superdataset
Like for GIN, currently only the manual approach works in my hands, so create an empty repo on Github first, and then run the following command from your superdataset
datalad siblings add -d . --name github --url https://github.com/labgas/proj_discoverie.git
- Add a Github “subrepo” as a sibling to your code subdataset
Create an empty repo on Github first, and then run the following command from your code subdataset
datalad siblings add -d . --name github --url https://github.com/labgas/proj_discoverie_code.git
- Push recursively from your superdataset to Github
datalad push --to github -r
NOTE: do not worry about the errors for most subdatasets. We deliberately did not create a Github sibling for them, since they are all on GIN anyway and we do not want them to be public prior to publication. Private repos on Github are not free, contrary to GIN (pocket money for Bill Gates).