Thanks for this amazing tool which I am just learning to use and appreciate!
The handbook is awesome, but I keep on getting problems when wrapping even very simple bash commands in a datalad run command (on an Ubuntu 20.04.3 installed server).
Here is an example of a docker run command for mriqc, but it also happens with a simple mkdir command for example, generating a similar error
/bin/sh: 1: docker run -it --rm -v /data/proj_discoverie/BIDS:/data:ro -v /data/proj_discoverie/derivatives/mriqc:/out poldracklab/mriqc:0.16.1 /data /out participant --participant_label KUL007 --verbose --verbose-reports --fd_thres 0.9: not found
Adding bash or /bin/bash before the command does not help either, so any suggestion for a solution would be appreciated!
Please share the complete invocation of the actual datalad command - hard to tell from the above.
Note that for working with containers while adhering to best practices for reproducible research (YODA principles etc.), you should ideally use the datalad-container extension. You can find examples in the handbook or in repronim/containers.
Thanks for the prompt response!
Here are a few examples of the entire datalad run commands, the first one with a custom script and the second one with mriqc run in a docker container on my system (not included in the dataset though).
I do not think the issue has anything to do with the container issue, as it fails for simple bash scripts too.
The dry-run option works great though!
Any idea how to solve this?
Thx,
Lukas
datalad run -m "run KUL_dcm2bids.sh on sub-KUL007" --input "sourcedata/sub-KUL007/DICOM" --output "BIDS/*" "KUL_dcm2bids.sh -d sourcedata/sub-KUL007/DICOM -p KUL007 -c study_config/sequences.txt -a -v"
datalad run -m "run mriqc on sub-KUL007" -i "rawdata/sub-KUL007/" -o "derivatives/mriqc/" "docker run -it --rm -v /data/proj_discoverie/BIDS:/data:ro -v /data/proj_discoverie/derivatives/mriqc:/out poldracklab/mriqc:0.16.1 /data /out participant --participant_label KUL007 --verbose --verbose-reports --fd_thres 0.9"
The strange thing is that the exact same command runs perfectly fine when not wrapped in a datalad run command, and this also applies to fmriprep-docker, docker run, and even simple bash commands like mkdir (see error messages in my earlier message).
The only difference with your situation is that the KUL_dcm2bids.sh script is located outside of the superdataset (or any subdatasets) in my case, but is in my path, so that should not be the problem I guess?
Or do scripts need to be in the dataset itself for them to be run with datalad run?
Interestingly, when trying to run the entire datalad run command again, I now get an error that is similar to what I get when I try to run the simple bash commands of fmriprep or mriqc commands wrapped in datalad run:
/bin/sh: 1: KUL_dcm2bids.sh -d sourcedata/sub-KUL005 -p KUL005 -c study_config/sequences.txt -v: not found
But it also looks like datalad tries to pass that entire invocation as a command name, which is odd. That is why I want to see the entire invocation and the version of datalad.
PATH is /home/luna.kuleuven.be/u0027997/.local/bin:/home/luna.kuleuven.be/u0027997/gitkraken:/opt/weasis/bin:/usr/local/freesurfer/bin:/usr/local/freesurfer/fsfast/bin:/usr/local/freesurfer/tktools:/usr/local/fsl/bin:/usr/local/freesurfer/mni/bin:/opt/KUL_apps/KUL_FWT:/opt/KUL_apps/KUL_FWT:/opt/KUL_apps/KUL_VBG:/opt/KUL_apps/KUL_NeuroImaging_Tools:/opt/KUL_apps/dcm2niix/build/bin:/opt/KUL_apps/ANTs_installed/bin/:/opt/mrtrix3/bin:/opt/anaconda3/bin:/opt/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/puppetlabs/bin
The script is in a cloned github repo with path /opt/KUL_apps/KUL_NeuroImaging_Tools
Here is the entire invocation and output
u0027997@gbw-s-labgas01:/data/proj_discoverie$ datalad run -m "run KUL_dcm2bids.sh on sub-KUL005" --input "sourcedata/sub-KUL005/" --output "BIDS/sub-KUL005/" "KUL_dcm2bids.sh -d sourcedata/sub-KUL005 -p KUL005 -c study_config/sequences.txt -v" --dry-run
[INFO ] Making sure inputs are available (this may take some time)
[INFO ] == Command start (output follows) =====
/bin/sh: 1: KUL_dcm2bids.sh -d sourcedata/sub-KUL005 -p KUL005 -c study_config/sequences.txt -v: not found
[INFO ] == Command exit (modification check follows) =====
[INFO ] The command had a non-zero exit code. If this is expected, you can save the changes with 'datalad save -d . -r -F .git/COMMIT_EDITMSG'
CommandError: ''KUL_dcm2bids.sh -d sourcedata/sub-KUL005 -p KUL005 -c study_config/sequences.txt -v' --dry-run' failed with exitcode 127 under /data/proj_discoverie
Having --dry-run at the end makes datalad treat that long quoted string as the command and --dry-run as its argument. Move --dry-run to, e.g., right after run.
How to upgrade depends on how you installed. If via pip: pip install --upgrade datalad. If via conda: just wait a bit (update ongoing), or conda uninstall datalad && pip install datalad.
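The failure mode can be reproduced with plain sh, without DataLad. This is only a sketch of the shell behaviour, assuming the quoted command string and the trailing option reach the shell as two separately quoted words; `echo hello world` is a stand-in for the real command.

```shell
# Case 1: a quoted string followed by another word. The shell treats the
# whole string, spaces included, as the name of one program and fails.
status=0
sh -c "'echo hello world' --dry-run" 2>/dev/null || status=$?
echo "quoted string plus trailing argument: exit code $status"

# Case 2: the same string on its own is split into words as usual.
out=$(sh -c "echo hello world")
echo "command string alone: $out"
```

Exit code 127 is the shell's "command not found" status, the same code reported in the CommandError above. With the option placed before the command, for example `datalad run --dry-run basic -m "..." "KUL_dcm2bids.sh ..."`, the command string stands alone again. The mode value `basic` is an assumption about recent DataLad releases; `datalad run --help` shows what the installed version accepts.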
Upgrading datalad to 0.15.0 using pip worked fine, enabling the dry-run option.
When I run without dry-run now, I get an error about an outdated git-annex version
u0027997@gbw-s-labgas01:/data/proj_discoverie$ datalad run -m "run KUL_dcm2bids.sh on sub-KUL005" -i "sourcedata/sub-KUL005" -o "BIDS/*" "KUL_dcm2bids.sh sourcedata/sub-KUL005 -p KUL005 -c study_config/sequences.txt -v"
[INFO ] Making sure inputs are available (this may take some time)
[ERROR ] OutdatedExternalDependency(No working git-annex installation of version >= 8.20200309. Visit http://handbook.datalad.org/r.html?install for instructions on how to install DataLad and git-annex… You have version 8.20200226) (OutdatedExternalDependency)
How shall I upgrade this?
Upgrading does not seem to work with pip, with sudo apt-get update or upgrade, or with git annex upgrade in the repo. I also adapted the git config based on upgrades.
Reading package lists… Done
Building dependency tree
Reading state information… Done
git-annex is already the newest version (8.20200226-1).
git-annex set to manually installed.
The following packages were automatically installed and are no longer required:
libllvm11 libpython2-stdlib linux-headers-5.4.0-80 linux-headers-5.4.0-80-generic linux-image-5.4.0-80-generic linux-modules-5.4.0-80-generic linux-modules-extra-5.4.0-80-generic python2 python2-minimal shim*
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
The output reads “git-annex set to manually installed”, not sure whether this is a problem for datalad or any other applications using git-annex?
As far as I understand it, the problem seems to be that the version of git-annex that is installed on my Ubuntu 20.04.3 system while installing datalad (probably just using sudo apt-get install datalad as per these instructions from the handbook, rather than using pip) is not the most recent version, and that apt-get fails to update to more recent versions?
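The version gap behind that error can be checked by hand. The following is a minimal sketch, assuming GNU `sort -V` is available; the helper name `version_ge` is made up for this illustration, and the two version strings are the ones from the error message above.

```shell
# version_ge A B: succeeds when version A is at least version B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="8.20200309"   # minimum named in the error message
installed="8.20200226"  # version reported on this system

if version_ge "$installed" "$required"; then
    result="ok"
else
    result="too old"
fi
echo "git-annex $installed vs required $required: $result"
```

On a live system the installed version could probably be read with `installed=$(git annex version --raw)` instead of being hard-coded; that the `--raw` option is available in the installed git-annex is an assumption here.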
Uninstalling datalad and git-annex and then installing using the datalad installer (using -m pip for datalad, and -m datalad/git-annex:tested) did the trick, resulting in up-to-date versions of both datalad and git-annex.
A new problem arose, however, when wrapping fmriprep-docker into a datalad run command.
fmriprep runs fine, but the datalad save at the end bumps into permission denied errors when trying to add the large output files (.nii.gz) to the git-annex repo. Permissions for the respective dirs look fine, and the outputdir derivatives/fmriprep was correctly specified in the -o option “derivatives/fmriprep/*”.
I had this problem with my previous installation as well, when running fmriprep-docker as a standalone command followed by datalad save. Running the datalad save command with sudo solved the issue then, but sudo datalad no longer works, probably because of the pip installation.
Which installation methods would you recommend to get both the most recent versions of datalad and git-annex, while at the same time being able to solve the above issue using sudo? Or do you have another workaround for this issue?
You are on Linux, so why not use a singularity container, even if built from docker? (IIRC with datalad-container, if you container-add docker://... it will import it into singularity.) Also note that we have already prepared singularity containers for all BIDS apps within https://github.com/ReproNim/containers/ which I have mentioned before.
PS: never use sudo unless you really need to perform some admin tasks! Now you need to do something like sudo chown -R $USER.$USER DATASET_PATH to bring it to a hopefully sane state in terms of permissions.
I am trying to set up a workflow for this dataset, so simply started over again.
For now, I want to get things up and running without containerizing, i.e. with datalad run rather than container-run. The problem below does not depend on this, because I get the same problem when running fmriprep-docker as a standalone command (which works fine) and then trying to datalad save.
Things work fine for mriqc, but for fmriprep, I keep on running into the same issue: fmriprep runs fine, but things go awry when datalad tries to save - see full error below.
I tried several options including creating my fmriprep output directory as a subdataset (and even datalad get after that) before running fmriprep (from the superdataset), not specifying it before and letting fmriprep create it, but in every scenario, I keep on getting the error that the large files cannot be added (to git-annex).
Any idea what the problem may be and how I could solve it?
Thanks a lot again!
Lukas
Here is my command (scenario where I did not create the fmriprep output dir as a subdataset before)
so – check owner/permissions on those files and directories they are under?
I bet that since you are using Docker (although you said you are working without containerizing, you are using fmriprep-docker, which I guess uses docker), they might be owned by root, and hence those errors. That is why I suggested using singularity instead, so it runs in your user namespace. I am not familiar enough with fmriprep-docker to advise on whether it does proper UID/GID mapping etc. But again: check permissions first!
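One way to run that check is to list everything in the output directory that belongs to someone else. This is a sketch only: the function name `list_foreign_files` and the default path `derivatives/fmriprep` are assumptions for illustration, so pass the real output directory as the first argument.

```shell
# list_foreign_files DIR: print everything under DIR that is not owned by
# the user running this script (find accepts a numeric user ID for -user).
list_foreign_files() {
    find "$1" ! -user "$(id -u)" -print
}

# Hypothetical default; pass the real output directory as the first argument.
target="${1:-derivatives/fmriprep}"
if [ -d "$target" ]; then
    list_foreign_files "$target"
fi
```

Any path it prints is a file or directory that datalad save, running as the regular user, may be unable to add. Empty output suggests that ownership is not the cause and that the permission bits are the next thing to look at.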