Dear DataLad community,
I am building a workflow around DataLad and a RIA store.
Because my dataset has many subdatasets, I reasoned that setting up a RIA store on the lab NAS would be a viable option.
To clone the dataset efficiently and use it across my personal laptop and workstation,
I also published the superdataset to GitHub, hoping that this would make cloning easier.
The intended workflow is:
- clone the superdataset from GitHub
- download data from the RIA store through its ORA special remote
- push updates to the RIA store (and GitHub)
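For concreteness, this is roughly what I have in mind on each machine. It is my own untested sketch, not a recommended recipe: the function names and arguments are hypothetical, `ria-store` is the sibling name I pass to `create-sibling-ria`, and I am assuming that `datalad get` retrieves the content from the RIA store's ORA special remote whenever the NAS is reachable.

```shell
#!/usr/bin/env bash
# Untested sketch of the workflow above. Function names and arguments are
# hypothetical; "ria-store" is the sibling name I pass to create-sibling-ria.
# Sourcing this file only defines the functions; nothing is executed.

# Clone the superdataset from GitHub, then recursively install all
# subdatasets and fetch their file content (-r). Assumption: the content
# is served by the RIA store's ORA special remote when the NAS is reachable.
setup_clone() {
  local url="$1" dest="$2"
  datalad clone "$url" "$dest" || return 1
  (
    cd "$dest" || exit 1
    datalad get -r .
  )
}

# Save changes in the superdataset and all subdatasets, push them
# recursively to the RIA store, then push only the superdataset to GitHub
# (only the superdataset was published there).
publish_updates() {
  local msg="$1"
  datalad save -r -m "$msg" || return 1
  datalad push -r --to ria-store || return 1
  datalad push --to origin
}
```

I would call these as `setup_clone https://github.com/username/repository.git my-project` on a new machine and `publish_updates "Add new sessions"` after working on the data (the directory name and message are just examples). The recursive push to `ria-store` only works if every subdataset has that sibling, which is exactly the part I am unsure about.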
However, I am not sure how to reliably add the RIA store as a sibling to all the subdatasets.
My guess is that I need to manually add the RIA store as a new sibling to the GitHub-cloned dataset, with a command like this:
```
datalad create-sibling-ria -s ria-store -r --existing reconfigure ria+ssh://internal-nas/path/to/ria-store
```
Without the `--existing reconfigure` option, the following error occurs:
```
a sibling 'ria-store-storage' is already configured in dataset
```
So my first question is whether it is safe to override this error with `--existing reconfigure` and add the RIA store as a sibling to a dataset cloned from GitHub, when the data is already in the RIA store.
Second, if I clone the GitHub repository from a place without access to our NAS (e.g., outside the institution), I get the following output:
```
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] https://github.com/username/repository.git/config download failed: Not Found
[INFO ] ssh: connect to host internal-nas: Operation timed out
[INFO ] RIA store unavailable. -caused by- Failed to access ssh://internal-nas/archive/ria-store/ria-layout-version -caused by- ConnectionOpenFailedError: 'ssh -fN -o ControlMaster=auto -o ControlPersist=15m -o ControlPath=/home/mshin/.cache/datalad/sockets/28b3b1f3 internal-nas' failed with exitcode 255 [Failed to open SSH connection (could not start ControlMaster process)]
[INFO ] Reset branch 'main' to 42b1e562 (from ccfea0e6) to avoid a detached HEAD
install(ok): /archive/project-internal/fmri (dataset) [Installed subdataset in order to get /archive/project-internal/fmri]
```
Once I have access to the NAS again, what should I do to download the data?
Sorry for the somewhat messy questions.
I am a bit confused right now, but I hope to build a robust workflow soon!
Minho