If you add the --merge option to the update call, you should be back in business. update alone only fetches updated information on file availability and remote branches; to get new datasets onto your filesystem, those updates need to be merged into the current local branch.
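As a concrete sketch (the install path is hypothetical; adjust it to wherever ///openfmri is installed on your system):

```shell
# Go to the installed superdataset and fetch + merge updates
# recursively, so that new subdatasets and availability info
# land in the current local branch.
cd ~/datasets/openfmri   # hypothetical location
datalad update --merge -r .
```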
Now, I don’t know how to take it back to normal without reinstalling all subdatasets. At this moment the superdataset ///openfmri is following master, at the latest commit 738714cf9daf789d4ea47b46c071498d2144ba51, which also seems to be the commit of my clean, working datalad repo. I manually ran git submodule sync and git submodule update. Still, I’m getting those GitCommandError exceptions.
That solution worked for all datasets under openfmri, except for ds000030.
I deleted and reinstalled the dataset to make sure it was clean. Still getting:
$ datalad get -r -J 8 sub-10159/anat/sub-10159_T1w.json
[ERROR ] Try making some of these repositories available:
| 00000000-0000-0000-0000-000000000001 -- web
| 09ede57e-5ec2-484b-b6fb-8a632e5c7a4e -- [datalad-archives]
| 41f07c30-3cfc-4de3-9fbc-84383f5156e6 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000030
| [get(/oak/stanford/groups/russpold/data/openfmri/ds000030/sub-10159/anat/sub-10159_T1w.json)]
get(error): /oak/stanford/groups/russpold/data/openfmri/ds000030/sub-10159/anat/sub-10159_T1w.json (file) [Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
09ede57e-5ec2-484b-b6fb-8a632e5c7a4e -- [datalad-archives]
41f07c30-3cfc-4de3-9fbc-84383f5156e6 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000030
]
This seems to replicate in all datalad repos I have.
Do you observe it in any other dataset as well, or only in ds000030?
This one is “tricky” since it got populated with a VERY heavy (in number of files) derivative, and we haven’t yet adjusted the pipeline to modularize those away into subdatasets. It feels like for this one we should just ignore the derivative(s) for now and at least update the main URLs to account for the recreated openneuro bucket. But first I would like to know whether any other dataset is also problematic?
I’ve seen it with some other datasets, but after several retries, I think only ds000030 is still failing. How can I get rid of the derivatives folder (in datalad language)?
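To drop the derivatives from a local clone, something along these lines should work (a sketch; which command applies depends on whether derivatives/ is a registered subdataset, so check git submodule status first):

```shell
cd ds000030

# If derivatives/ is a subdataset, uninstall it (drops its content
# and removes its working tree):
datalad uninstall -r derivatives/

# If instead it is plain annexed content of this dataset, drop the
# data and then remove the files from the working tree:
datalad drop -r derivatives/
git rm -r derivatives/
git commit -m "Drop derivatives from working tree"
```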
I’ll look through all the logs and confirm that no other dataset is still failing.
For ds000030 I did datalad update -r . --merge. Then I checked the remotes: they seemed incorrect to me, so I manually set a remote following the example of ds000001 (which works well).
I then ran git fetch origin, git checkout master, git pull, git rm -r derivatives/, and finally datalad get sub-10159/anat/sub-10159_T1w.nii.gz, and still:
[ERROR ] Try making some of these repositories available:
| 00000000-0000-0000-0000-000000000001 -- web
| 09ede57e-5ec2-484b-b6fb-8a632e5c7a4e -- [datalad-archives]
| 41f07c30-3cfc-4de3-9fbc-84383f5156e6 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000030
| [get(/oak/stanford/groups/russpold/data/openfmri/ds000030/sub-10159/anat/sub-10159_T1w.nii.gz)]
get(error): /oak/stanford/groups/russpold/data/openfmri/ds000030/sub-10159/anat/sub-10159_T1w.nii.gz (file) [Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
09ede57e-5ec2-484b-b6fb-8a632e5c7a4e -- [datalad-archives]
41f07c30-3cfc-4de3-9fbc-84383f5156e6 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/openfmri/ds000030
]
For that repo/file we apparently have only “datalad-archives” as the source (not really sure why it also lists “web” as an available remote for it; will check with joey):
but then all those archives seem to be no longer available from the (versioned) URLs where they used to be:
$> git annex whereis --key MD5E-s4802398120--ce2d215f336e6dfa282d69cc35beb80d.tgz
whereis MD5E-s4802398120--ce2d215f336e6dfa282d69cc35beb80d.tgz (1 copy)
00000000-0000-0000-0000-000000000001 -- web
web: http://openfmri.s3.amazonaws.com/tarballs/ds000030_R1.0.1_sub10150-10299.tgz?versionId=X3sfPmNxugxTtoez935C.PteHH40Dbtc
ok
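If you need to check many keys at once, the plain-text whereis output can be parsed with a short helper (a sketch: the output format is assumed from the transcript above, and git annex whereis --json would be more robust for real scripting):

```python
import re

def parse_whereis(text):
    """Parse plain `git annex whereis` output into a list of remotes.

    Returns a list of dicts with 'uuid', 'description', and 'urls'.
    The line format is assumed from a typical transcript:
        <uuid> -- <description>
        web: <url>
    """
    remotes = []
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r'^([0-9a-f-]{36}) -- (.+)$', line)
        if m:
            remotes.append({'uuid': m.group(1),
                            'description': m.group(2),
                            'urls': []})
        elif remotes and re.match(r'^\w+: (https?://\S+)$', line):
            # URL line attached to the most recent remote
            remotes[-1]['urls'].append(line.split(': ', 1)[1])
    return remotes
```

Feeding it the output above would report a single “web” remote together with the (stale) versioned S3 URL.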
$> datalad ls -aL s3://openfmri/tarballs/ds030_R1.0.0_10150-
Connecting to bucket: openfmri
[INFO ] S3 session: Connecting to the bucket openfmri
Bucket info:
Versioning: S3ResponseError: 403 Forbidden
Website: S3ResponseError: 403 Forbidden
ACL: S3ResponseError: 403 Forbidden
tarballs/ds030_R1.0.0_10150-10274.tgz 2017-11-18T20:44:17.000Z 3920586194 ver:null acl:AccessDenied http://openfmri.s3.amazonaws.com/tarballs/ds030_R1.0.0_10150-10274.tgz?versionId=null [OK]
So only a non-versioned one is now available :-/ I vaguely remember the openfmri bucket going through some migration, so I guess we are left with stale (versioned) URLs for this one.
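A quick way to confirm that a recorded versioned URL has gone stale (a sketch; it needs network access, and the URL is the one from the whereis output above):

```shell
# HEAD request against the versioned URL git-annex has on record;
# an HTTP 403/404 here means that version is gone from the bucket.
curl -sI "http://openfmri.s3.amazonaws.com/tarballs/ds000030_R1.0.1_sub10150-10299.tgz?versionId=X3sfPmNxugxTtoez935C.PteHH40Dbtc" | head -n 1
```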
Let me now finally crawl the extracted version while excluding derivatives, so that at least we get direct links to those files… hopefully it will be done in a day or so.
And now it seems to be doing something (once again: this dataset is quite heavy in number of files, so it might just be fetching all the S3 keys; you could run it with -l debug to possibly get more feedback, but it might simply stay silent for a while as boto interacts with S3. The process is already at 4 GB RAM consumption). The idea is that I would be able to crawl straight into this new branch, populate the git-annex branch with all the information about the availability of those files, and then possibly even remove the incoming-s3-openneuro-noderivatives branch entirely, so as not to keep those additional heavy tree objects around in .git. Here we go, it started downloading the tarballs! Hopefully it will, as prescribed, ignore all the derivatives (time will tell).