Overview
I am having trouble downloading data from OpenNeuro using datalad. I am using Python (when I can) to do so.
I am trying to download the Stroop dataset (ds000164) from OpenNeuro, which also has a corresponding GitHub repository.
Datalad Version
import datalad
datalad.__version__
'0.11.1'
Git-Annex Version
git-annex version: 7.20181121+git58-gbc4aa3f0e-1~ndall+1
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.0.0 ghc-8.4.3 http-client-0.5.13.1 persistent-sqlite-2.8.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
operating system: linux x86_64
supported repository versions: 5 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 5
Git Version
git version 2.7.4
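Since I am working from Python anyway, here is a minimal sketch of how the "key: value" lines of the git-annex report above could be turned into a dictionary, which makes it easier to compare environments. The function name `parse_annex_version` is my own (hypothetical), and the sample text is abbreviated from the output above; in practice the text would come from running `git annex version`.

```python
def parse_annex_version(text):
    """Parse 'key: value' lines (as printed by `git annex version`) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Sample abbreviated from the report above.
sample = """git-annex version: 7.20181121+git58-gbc4aa3f0e-1~ndall+1
operating system: linux x86_64
supported repository versions: 5 7
local repository version: 5"""

info = parse_annex_version(sample)
print(info["local repository version"], info["supported repository versions"].split())
```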
Questions/Problems
- How should I properly “get” the git-annex tracked files?
- How should I download extra data (e.g. fmriprep results) to a datalad repository?
How should I properly “get” the git-annex tracked files?
Here is my code:
from datalad.api import install
import tempfile
import os
from subprocess import call
data_dir = tempfile.mkdtemp()
dataset = install(data_dir, "///openneuro/ds000164")
dataset.get("sub-001/func/")
Everything runs fine until the last line, where I get this stderr:
[INFO] access to dataset sibling "s3-PRIVATE" not auto-enabled, enable with:
| datalad siblings -d "/tmp/tmpyq6w5m21" enable -s s3-PRIVATE
[WARNING] Running get resulted in stderr output: Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to use S3
git-annex: get: 1 failed
[ERROR] from s3-PUBLIC...; Unable to access these remotes: s3-PUBLIC; Try making some of these repositories available:; 2f45a4ca-eba9-46da-98f2-5ca487a87a67 -- [s3-PUBLIC]; c40c41af-4d97-418c-a33d-ffb7e596b0c7 -- root@cf3d3f9acfa2:/datalad/ds000164 [get(/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz)]
[WARNING] could not get some content in /tmp/tmpyq6w5m21/sub-001/func ['/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz'] [get(/tmp/tmpyq6w5m21/sub-001/func)]
and this Python traceback:
---------------------------------------------------------------------------
IncompleteResultsError Traceback (most recent call last)
<ipython-input-13-22015468e7a7> in <module>
1 data_dir = tempfile.mkdtemp()
2 dataset = install(data_dir, "///openneuro/ds000164")
----> 3 dataset.get("sub-001/func/")
~/.conda/envs/nibetaseries/lib/python3.6/site-packages/datalad/distribution/dataset.py in apply_func(wrapped, instance, args, kwargs)
492 elif i >= ds_index:
493 kwargs[orig_pos[i+1]] = args[i]
--> 494 return f(**kwargs)
495
496 setattr(Dataset, name, apply_func(f))
~/.conda/envs/nibetaseries/lib/python3.6/site-packages/datalad/interface/utils.py in eval_func(wrapped, instance, args, kwargs)
477 return results
478
--> 479 return return_func(generator_func)(*args, **kwargs)
480
481 return eval_func(func)
~/.conda/envs/nibetaseries/lib/python3.6/site-packages/datalad/interface/utils.py in return_func(wrapped_, instance_, args_, kwargs_)
465 # unwind generator if there is one, this actually runs
466 # any processing
--> 467 results = list(results)
468 # render summaries
469 if not result_xfm and result_renderer == 'tailored':
~/.conda/envs/nibetaseries/lib/python3.6/site-packages/datalad/interface/utils.py in generator_func(*_args, **_kwargs)
453 raise IncompleteResultsError(
454 failed=incomplete_results,
--> 455 msg="Command did not complete successfully")
456
457 if return_type == 'generator':
IncompleteResultsError: Command did not complete successfully [{'type': 'file', 'refds': '/tmp/tmpyq6w5m21', 'status': 'error', 'path': '/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz', 'action': 'get', 'annexkey': 'MD5E-s50382260--2c571457278c2fcd07016f50abc07f79.nii.gz', 'message': 'from s3-PUBLIC...; Unable to access these remotes: s3-PUBLIC; Try making some of these repositories available:; \t2f45a4ca-eba9-46da-98f2-5ca487a87a67 -- [s3-PUBLIC]; \tc40c41af-4d97-418c-a33d-ffb7e596b0c7 -- root@cf3d3f9acfa2:/datalad/ds000164'}, {'action': 'get', 'path': '/tmp/tmpyq6w5m21/sub-001/func', 'type': 'directory', 'refds': '/tmp/tmpyq6w5m21', 'status': 'impossible', 'message': ('could not get some content in %s %s', '/tmp/tmpyq6w5m21/sub-001/func', ['/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz'])}]
So I worked around that issue by installing the dataset from openfmri instead:
from datalad.api import install
import tempfile
import os
from subprocess import call
data_dir = tempfile.mkdtemp()
dataset = install(data_dir, "///openfmri/ds000164")
dataset.get("sub-001/func/")
which works (yay!), but I’m curious if there’s something I should change to make it work on openneuro as well.
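For anyone trying to reproduce this: the failure records printed at the end of the traceback above are plain dictionaries, so they can be condensed into something readable. This is only a sketch; `summarize_failures` is a hypothetical helper, the records are abbreviated copies of the ones in the traceback, and I am assuming the exception exposes them as the `failed` list it was constructed with.

```python
def summarize_failures(results):
    """Return (path, status, short message) for each failed result record."""
    summary = []
    for record in results:
        if record.get("status") in ("error", "impossible"):
            message = record.get("message", "")
            # Some records carry a (format_string, *args) tuple instead of a string.
            if isinstance(message, tuple):
                message = message[0] % message[1:]
            # Keep only the part before the first ';' to stay on one line.
            short = message.split(";")[0].strip()
            summary.append((record["path"], record["status"], short))
    return summary


# Records abbreviated from the traceback above (assumed to match `exc.failed`).
failed = [
    {"type": "file", "status": "error", "action": "get",
     "path": "/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz",
     "message": "from s3-PUBLIC...; Unable to access these remotes: s3-PUBLIC"},
    {"type": "directory", "status": "impossible", "action": "get",
     "path": "/tmp/tmpyq6w5m21/sub-001/func",
     "message": ("could not get some content in %s %s",
                 "/tmp/tmpyq6w5m21/sub-001/func",
                 ["/tmp/tmpyq6w5m21/sub-001/func/sub-001_task-stroop_bold.nii.gz"])},
]

summary = summarize_failures(failed)
for path, status, message in summary:
    print(status, path, message)
```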
How should I download extra data (e.g. fmriprep results) to a datalad repository?
From the docs it looks like I can download extra data using the download_url method attached to my dataset variable, but I had to fall back on the awscli to download the data.
fmriprep_res = "s3://openneuro.outputs/921294bd5b869b1852ab3ce886583795/4dd151e3-52d1-4fa2-9591-27c16520331c"
try:
    # currently not working
    dataset.download_url(fmriprep_res)
except Exception:
    # fall back on awscli; depends on user having it installed: https://pypi.org/project/awscli/
    call(['aws',
          '--no-sign-request',
          's3',
          'sync',
          fmriprep_res,
          os.path.join(data_dir, 'derivatives')
          ])
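The awscli fallback above fails with a less obvious error if `aws` is not installed. A slightly more defensive sketch of the same fallback is below; the helper names `build_sync_command` and `sync_public_prefix` are hypothetical, and the flags are the same ones used in the call above. It checks for the executable first and raises if the sync exits with a non-zero status.

```python
import os
import shutil
import subprocess


def build_sync_command(s3_uri, target_dir):
    """Build the anonymous `aws s3 sync` command used as a fallback."""
    return ["aws", "--no-sign-request", "s3", "sync", s3_uri, target_dir]


def sync_public_prefix(s3_uri, target_dir):
    """Sync a public S3 prefix into target_dir, failing loudly on problems."""
    if shutil.which("aws") is None:
        raise RuntimeError("awscli not found on PATH (pip install awscli)")
    os.makedirs(target_dir, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_sync_command(s3_uri, target_dir), check=True)


fmriprep_res = ("s3://openneuro.outputs/921294bd5b869b1852ab3ce886583795/"
                "4dd151e3-52d1-4fa2-9591-27c16520331c")
# Only print the command here; call sync_public_prefix(...) to actually download.
command = build_sync_command(fmriprep_res, os.path.join("data_dir", "derivatives"))
print(" ".join(command))
```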
I got the following error message for dataset.download_url(fmriprep_res):
[INFO] Downloading 's3://openneuro.outputs/921294bd5b869b1852ab3ce886583795/4dd151e3-52d1-4fa2-9591-27c16520331c' into '/tmp/tmp28sdodg0'
[INFO] S3 session: Connecting to the bucket openneuro.outputs anonymously
Anonymous access to s3://openneuro.outputs/921294bd5b869b1852ab3ce886583795/4dd151e3-52d1-4fa2-9591-27c16520331c has failed.
Do you want to enter other credentials in case they were updated? (choices: yes, no): no
Whichever I choose (yes or no), it fails; however, an anonymous download via aws appears to succeed.
I wonder whether I have an incompatible version of git-annex or a misconfigured setting on my end, but before testing in other environments I thought I would ask the community whether anyone else has run into these problems.
Thanks!
James