Adding data from S3 into my dataset (and making it accessible to others)

Hi all,

I’m trying to add fmriprep data to an existing dataset in a fork here: jdkent/ds000001 on GitHub, branch add/fmriprep

The fmriprep bucket is publicly available: s3://fmriprep-openneuro/ds000001/fmriprep-20.0.0rc1/fmriprep

and I was able to add the data to my dataset with the command

`datalad addurls --fast fmriprep_files2.txt '{link}' 'derivatives/{directory}'`

However, if you clone the repo linked above and try to get files, they cannot be retrieved:


```
git clone
cd ds000001/
git checkout add/fmriprep
datalad get derivatives/fmriprep/sub-16.html
```

```
get(error): derivatives/fmriprep/sub-16.html (file) [not available; (Note that these git remotes have annex-ignore set: origin)]
```

How can I make the data available to others that clone the repository?


TL;DR: I think you forgot to push your local git-annex branch to your fork on GitHub, so you didn’t share the git-annex availability information you added using `datalad addurls`. You can `git push origin git-annex`, or just `datalad push -s origin` (which would push the git-annex branch as well).
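One way to confirm this diagnosis before pushing is to count the commits on the local git-annex branch that the fork does not have yet. A minimal sketch, assuming the fork is the `origin` remote and that `git fetch origin` has been run so `origin/git-annex` is current; the function name `unpushed_annex_commits` is hypothetical:

```shell
# Print how many commits on the local git-annex branch are missing from
# the remote-tracking branch <remote>/git-annex (default remote: origin).
# Assumes <remote>/git-annex is up to date, i.e. "git fetch <remote>" was run.
unpushed_annex_commits() {
  remote="${1:-origin}"
  git rev-list --count "refs/remotes/$remote/git-annex..refs/heads/git-annex"
}

# Typical use inside the dataset:
#   git fetch origin
#   unpushed_annex_commits origin   # non-zero: availability info not on the fork yet
#   git push origin git-annex
```

A non-zero count means the URLs registered by `datalad addurls` live only in the local git-annex branch, which is exactly what a fresh clone of the fork cannot see.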

Longer version of why I think this is the case:

```
$> git shortlog -sn origin/git-annex
    70  Git Worker
     2  OpenNeuro Importer
     1  DataLad Importer
```

shows that the commits on your fork’s git-annex branch come only from OpenNeuro’s Git Worker and importers, and I don’t think your computer is listed among the repositories git-annex knows about:

```
$> git annex info
trusted repositories: 0
semitrusted repositories: 6
	00000000-0000-0000-0000-000000000001 -- web
	00000000-0000-0000-0000-000000000002 -- bittorrent
	1b4b718e-91d9-4da9-9b80-02a2d1bb9363 -- s3-PRIVATE
	8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 -- [s3-PUBLIC]
	b54f20e4-904b-42a9-9d4b-6ca93fed3ea5 -- yoh@lena:/tmp/ds000001 [here]
	b5dd2e3d-825f-4bc2-b719-cba1059f6bfc -- root@93184394ac19:/datalad/ds000001
```

BTW, there is a datalad-crawler extension that could be used to avoid manually curating the .txt file you feed to addurls.
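Short of the crawler, the input file can also be generated from a bucket listing instead of being curated by hand. A minimal sketch, assuming the file is a CSV with the `link` and `directory` columns referenced by the addurls templates, that the objects are reachable anonymously at `https://<bucket>.s3.amazonaws.com/<key>`, and that keys contain no spaces or commas; the function name `s3_listing_to_csv` is hypothetical:

```shell
# Convert "aws s3 ls --recursive" output (date, time, size, key per line)
# into a CSV with the two columns used in the addurls templates:
#   link      -> public HTTPS URL of the object (assumed URL form)
#   directory -> object key with <prefix> stripped, e.g. fmriprep/sub-16.html
# Lines whose key does not start with <prefix>, or that name a
# "directory marker" key ending in "/", are skipped.
s3_listing_to_csv() {
  bucket="$1"   # e.g. fmriprep-openneuro
  prefix="$2"   # e.g. ds000001/fmriprep-20.0.0rc1/
  echo "link,directory"
  awk -v bucket="$bucket" -v prefix="$prefix" '
    index($4, prefix) == 1 {
      rel = substr($4, length(prefix) + 1)
      if (rel != "" && rel !~ /\/$/)
        printf "https://%s.s3.amazonaws.com/%s,%s\n", bucket, $4, rel
    }'
}

# Possible use (reads the public bucket without credentials):
#   aws s3 ls --recursive --no-sign-request \
#       s3://fmriprep-openneuro/ds000001/fmriprep-20.0.0rc1/fmriprep/ \
#     | s3_listing_to_csv fmriprep-openneuro ds000001/fmriprep-20.0.0rc1/ \
#     > fmriprep_files.csv
#   datalad addurls --fast fmriprep_files.csv '{link}' 'derivatives/{directory}'
```

Naming the output `.csv` (or passing `--input-type csv`) keeps addurls from having to guess the format from an unfamiliar extension.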
