Maintain the superdataset structure with subdatasets when publishing to GitHub

Hoda_RJ · October 26, 2020, 11:13am

Hi,

I have a superdataset that contains two subdatasets, and I want to publish it to a GitHub repo with a Google Drive dependency.
How can I create a single GitHub repo with the same structure as my superdataset (one main folder and two subfolders) so that the subdatasets are published as well?
If I create one GitHub repo for the superdataset, I get an error for the subdatasets when I try to publish. If I use datalad create-sibling-github with the -r flag, I get more than one repo, which is not what I want.

adina · November 9, 2020, 6:50am

Hi @Hoda_RJ,
If I’m understanding your problem correctly, I’m afraid what you want to do isn’t possible: every dataset (i.e., your superdataset and each of the two subdatasets) is a single repository, so you can’t publish three datasets into only one GitHub repo. If I may ask, why is getting more than one repo not preferred?
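
To illustrate what the recursive route looks like, here is a minimal sketch rather than a definitive recipe. It assumes it is run from the root of the superdataset, that GitHub credentials are already configured, that the default sibling name `github` is used, and that a storage sibling (hypothetically called `gdrive` here) already exists in every dataset of the hierarchy. The function name `publish_hierarchy` and its two arguments are made up for illustration.

```shell
# Sketch: publish a superdataset and its subdatasets, one GitHub repo per dataset.
publish_hierarchy() {
    reponame="$1"   # repo name for the superdataset; subdataset repos get a suffix appended
    storage="$2"    # assumed: name of an existing storage sibling, e.g. "gdrive"

    # -r creates one repository per dataset in the hierarchy
    datalad create-sibling-github -d . -r --publish-depends "$storage" "$reponame" || return 1

    # push the superdataset and all subdatasets to their new siblings
    datalad push -r --to github
}

# Example call (hypothetical names):
# publish_hierarchy my-study gdrive
```

With two subdatasets, this ends up as three repositories, one for each dataset.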

Hoda_RJ · November 10, 2020, 3:07pm

Hi @adina,
I get the concept now. So every dataset is published separately, since it is independent regardless of being or having a subdataset. Therefore, in my case, I will turn the subdatasets into normal content to keep the structure on GitHub.
Thanks for the explanation.

adina · November 10, 2020, 3:33pm

Hey @Hoda_RJ, great that it helped. Just for the sake of completeness, in case you are not aware: when publishing a hierarchy of datasets, the structure stays identical (it is just split between repositories). Check out https://github.com/psychoinformatics-de/paper-remodnav/ as an example. In there, you’ll find a subdataset (“remodnav”). Clicking on that directory takes you to the respective repository, and if you clone the superdataset, you can datalad get the subdatasets just like any other file.
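
On the consuming side, that could look roughly like the sketch below. It is only an illustration: the function name `clone_hierarchy` and its arguments are made up, and it installs the subdatasets without downloading any annexed file content.

```shell
# Sketch: clone a superdataset and install its subdatasets in place.
clone_hierarchy() {
    url="$1"      # URL of the superdataset repository
    target="$2"   # local directory to clone into

    datalad clone "$url" "$target" || return 1

    # -n: install subdatasets without fetching annexed file content
    # -r: recurse into nested subdatasets
    datalad get -n -r -d "$target" "$target"
}

# Example call, using the repository mentioned above:
# clone_hierarchy https://github.com/psychoinformatics-de/paper-remodnav.git paper-remodnav
```

Afterwards the local directory tree has the same layout as the original hierarchy, and individual files can be retrieved with a further datalad get on their paths.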

Hoda_RJ · November 12, 2020, 9:15pm

Thanks for the additional info.