Issue getting data with datalad on HPC

Hello, I am having an issue downloading the ds000030 dataset with DataLad on Alliance Canada. It installs OK, but when I check the files, they are symbolic links. Could this be a problem with git-annex? I also tried cloning a different dataset and then running get on it. That also failed to download the actual files, but it did not produce any error messages.

I’m using the following script:

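#!/bin/bash
set -euo pipefail   # stop at the first failing command

OPENNEURO_PATH="/lustre04/scratch/${USER}/openneuro"
DATASET="ds000030"

mkdir -p "${OPENNEURO_PATH}"
echo "Get ${DATASET}"
cd "${OPENNEURO_PATH}"
# Install the dataset: annexed files are symlinks whose content is not fetched yet
datalad install "https://github.com/OpenNeuroDatasets/${DATASET}.git"
cd "${OPENNEURO_PATH}/${DATASET}"
# Fetch the content the symlinks point to, recursing into any subdatasets
datalad get -r .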

Output:

Get ds000030
[INFO   ] scanning for annexed files (this may take some time)                                                                
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore                                                         
[INFO   ] https://github.com/OpenNeuroDatasets/ds000030.git/config download failed: Not Found
[INFO   ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
| 		datalad siblings -d "/lustre04/scratch/nclarke/openneuro/ds000030" enable -s s3-PRIVATE 
install(ok): /lustre04/scratch/nclarke/openneuro/ds000030 (dataset)
[INFO   ] Ensuring presence of Dataset(/lustre04/scratch/nclarke/openneuro/ds000030) to get /lustre04/scratch/nclarke/openneuro/ds000030
get(ok): sub-10159/anat/sub-10159_T1w.nii.gz (file) [from s3-PUBLIC...]                                                       
get(ok): sub-10159/dwi/sub-10159_dwi.nii.gz (file) [from s3-PUBLIC...]                                                        
get(ok): sub-10159/func/sub-10159_task-bart_bold.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10159/func/sub-10159_task-rest_bold.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10159/func/sub-10159_task-scap_bold.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10159/func/sub-10159_task-stopsignal_bold.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10159/func/sub-10159_task-taskswitch_bold.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10171/anat/sub-10171_T1w.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10171/dwi/sub-10171_dwi.nii.gz (file) [from s3-PUBLIC...]
get(ok): sub-10171/func/sub-10171_task-bart_bold.nii.gz (file) [from s3-PUBLIC...]
  [3040 similar messages have been suppressed; disable with datalad.ui.suppress-similar-results=off]
action summary:
  get (ok: 3050)

Datalad wtf:

(venv_datalad) [nclarke@beluga2 ~]$ datalad wtf
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## credentials 
  - keyring: 
    - active_backends: 
      - PlaintextKeyring with no encyption v.1.0 at /home/nclarke/.local/share/python_keyring/keyring_pass.cfg
    - config_file: /home/nclarke/.config/python_keyring/keyringrc.cfg
    - data_root: /home/nclarke/.local/share/python_keyring
## datalad 
  - version: 0.18.3
## dependencies 
  - annexremote: 1.6.0
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 8.20210804-g7961c5a98
  - cmd:bundled-git: 2.33.0
  - cmd:git: 2.33.0
  - cmd:ssh: 7.4p1
  - cmd:system-git: 2.31.6
  - cmd:system-ssh: 7.4p1
  - humanize: 4.6.0
  - iso8601: 1.1.0+computecanada
  - keyring: 23.13.1
  - keyrings.alt: 4.2.0
  - msgpack: 1.0.4
  - platformdirs: 3.2.0
  - requests: 2.28.2
## environment 
  - LANG: en_GB.UTF-8
  - PATH: /home/nclarke/venv_datalad/bin:/lustre03/project/def-pbellec/share/bin:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/mii/1.1.2/bin:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Compiler/intel2020/openmpi/4.0.3/bin:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/libfabric/1.10.1/bin:/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/ucx/1.8.0/bin:/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/bin/intel64:/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/gcccore/9.3.0/bin:/cvmfs/soft.computecanada.ca/easybuild/bin:/cvmfs/soft.computecanada.ca/custom/bin:/cvmfs/soft.computecanada.ca/gentoo/2020/usr/sbin:/cvmfs/soft.computecanada.ca/gentoo/2020/usr/bin:/cvmfs/soft.computecanada.ca/gentoo/2020/sbin:/cvmfs/soft.computecanada.ca/gentoo/2020/bin:/cvmfs/soft.computecanada.ca/custom/bin/computecanada:/opt/software/slurm/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/project/rrg-pbellec/cedar_old_data/NIAK/bin:/home/nclarke/.local/bin:/home/nclarke/bin
  - PYTHONPATH: /cvmfs/soft.computecanada.ca/custom/python/site-packages
## extensions 
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
    - S3
    - WebDAV
  - dependency versions: 
    - aws-0.22
    - bloomfilter-2.0.1.0
    - cryptonite-0.26
    - DAV-1.3.4
    - feed-1.3.0.1
    - ghc-8.8.4
    - http-client-0.6.4.1
    - persistent-sqlite-2.10.6.2
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.1.0
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
    - X*
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - httpalso
    - borg
    - hook
    - external
  - supported repository versions: 
    - 8
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
  - version: 8.20210804-g7961c5a98
## location 
  - path: /home/nclarke
  - type: directory
## python 
  - implementation: CPython
  - version: 3.10.2
## system 
  - distribution: centos/7/Core
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - filesystem: 
    - CWD: 
      - path: /home/nclarke
    - HOME: 
      - path: /home/nclarke
    - TMP: 
      - path: /tmp
  - max_path_length: 269
  - name: Linux
  - release: 3.10.0-1160.88.1.el7.x86_64
  - type: posix
  - version: #1 SMP Tue Mar 7 15:41:52 UTC 2023

I’d really appreciate any help, thank you!

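Maybe I missed something, but as far as I can tell you are describing the expected behavior of DataLad and git-annex. Annexed files are symbolic links into the dataset's .git/annex/objects directory. Once datalad get has fetched the content (the get(ok) lines in your output), those links resolve to the actual files.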
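To tell a fetched file from a not-yet-fetched one, you can test each link with `[ -e ]`, which follows the symlink to its target. The sketch below is a hypothetical helper (`check_annex_links` is not a DataLad or git-annex command). It assumes the dataset uses git-annex's default locked (symlink) mode. It walks the dataset, skips `.git`, and reports every link whose target is missing.

```shell
#!/bin/bash
# Hypothetical helper, not part of DataLad or git-annex.
# Walks a dataset directory (skipping .git) and checks every symlink:
# in a locked git-annex dataset, a dangling link means the content
# has not been fetched yet.
check_annex_links() {
  local dir="${1:-.}"
  local present=0 missing=0
  local link
  # -print0 together with read -d '' keeps file names with spaces intact
  while IFS= read -r -d '' link; do
    if [ -e "$link" ]; then      # -e follows the link to its target
      present=$((present + 1))
    else
      missing=$((missing + 1))
      printf 'missing: %s\n' "$link"
    fi
  done < <(find "$dir" -name .git -prune -o -type l -print0)
  printf 'present: %d missing: %d\n' "$present" "$missing"
}
```

For example, `check_annex_links "/lustre04/scratch/${USER}/openneuro/ds000030"` should report no missing links if everything was fetched. If git-annex is on your PATH, you can also run `git annex find --not --in=here` inside the dataset. It gives the same answer from git-annex's own records by listing the annexed files whose content is not present locally.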

I warmly recommend reading the first few chapters of the DataLad Handbook if you want to save your future self some headaches.

Thank you very much for the reply! I was getting an error when trying to do something with the files, which made me think they were not fully downloaded, but I must have done something else wrong. I will try again.

If you are trying to modify files in a DataLad dataset, in many cases you have to unlock them first (datalad unlock). Whether this is needed depends on the configuration of your dataset.
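Here is a minimal sketch of the unlock, modify, save cycle. It assumes the dataset keeps large files locked (git-annex's default) and that DataLad is on your PATH. `edit_annexed_file` is a hypothetical helper. The editing command you pass to it is a placeholder for whatever tool you actually use to change the file.

```shell
#!/bin/bash
# Hypothetical helper: unlock an annexed file, modify it, record the change.
# Usage: edit_annexed_file PATH COMMAND [ARGS...]
# COMMAND is run as: COMMAND [ARGS...] PATH
edit_annexed_file() {
  local file="$1"
  shift
  # Replace the read-only symlink with a regular, writable file
  datalad unlock "$file" || return 1
  # Run the editing command on the now-writable file
  "$@" "$file" || return 1
  # Commit the modified file; DataLad decides whether it goes to git-annex or git
  datalad save -m "Modify ${file}" "$file"
}
```

For example, `edit_annexed_file sub-10159/anat/sub-10159_T1w.nii.gz some_tool` would unlock the T1w image, run the placeholder `some_tool` on it, and save the result.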
