Updating #datalad datasets

sure – no one forbids it :wink:
What I did just now (don’t know yet if it is “correct”) is

git checkout --orphan incoming-s3-openneuro-noderivatives
git reset --hard
datalad crawl-init --save --template=simple_s3 bucket=openneuro to_http=1 prefix=ds000030 exclude=derivatives
datalad crawl
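As an aside, the first two commands above (`git co` being a common alias for `git checkout`) leave you on a branch with no parent commit and nothing carried over, which is why the crawl can populate it without touching existing history. A throwaway-repo sketch of that state, plain git only (no datalad needed; names carried over just for illustration):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo; git config user.email demo@example.com
git commit -q --allow-empty -m "existing history"
# same step as in the recipe: a branch with no parent commit
git checkout -q --orphan incoming-s3-openneuro-noderivatives
# HEAD now points at an unborn branch; no commits are reachable from it
git rev-parse --verify -q HEAD || echo "orphan branch: no commits yet"
```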

and now it seems to be doing something (once again – this dataset is quite heavy in # of files, so it might as well be just fetching all the S3 keys; you could run it with -l debug to possibly get more feedback, but it might just stay silent for a while as boto interacts with S3 – the process is already at 4GB RAM consumption).

The idea is that I would be able to crawl straight into this new branch, populate the git-annex branch with all the information about the availability of those files, and then possibly even remove the incoming-s3-openneuro-noderivatives branch entirely, so as not to keep those additional heavy tree object(s) around in .git.

Here we go – it started to download the tarballs! Hopefully it will, as prescribed, ignore all the derivatives (time will show) :wink:
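That removal idea can be exercised in a throwaway repo like this (plain git sketch; it relies on the fact, as noted above, that git-annex keeps availability info on its own git-annex branch, so deleting the crawl branch does not lose it):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo; git config user.email demo@example.com
main=$(git symbolic-ref --short HEAD)   # default branch name (master/main)
git commit -q --allow-empty -m "main history"
# stand-in for the heavy crawl branch
git checkout -q -b incoming-s3-openneuro-noderivatives
git commit -q --allow-empty -m "crawled trees would live here"
# once availability info sits on the git-annex branch, this branch can go
git checkout -q "$main"
git branch -D incoming-s3-openneuro-noderivatives
git gc --quiet --prune=now              # drop the now-unreferenced objects
```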

P.S. Note that ATM:

  • it would require S3 credentials; anonymous access is pending in PR https://github.com/datalad/datalad/pull/2708
  • you would need the datalad-crawler extension if you use datalad >= 0.10 (prior versions included the crawler within datalad itself)
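On the first point: until that PR lands, one way to supply credentials is through boto’s standard environment variables – this is an assumption about how the crawler reaches boto (datalad also has its own credential storage and may prompt you interactively instead), and the values below are placeholders, not real keys:

```shell
# placeholders only -- substitute your own credentials
export AWS_ACCESS_KEY_ID=your-key-id
export AWS_SECRET_ACCESS_KEY=your-secret-key
```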