Nipype script hanging when run on a cluster

I am having an issue where I can successfully run a nipype script directly from the command line, e.g.


But when I submit it to a cluster:

qsub -t 1 queue.script

it hangs indefinitely: the job starts and stays running for hours (it should take only a few minutes) without producing any error message.

queue.script looks like this:

#$ -m n
#$ -o /data/project/logs/segment_o$TASK_ID
#$ -e /data/project/logs/segment_e$TASK_ID
#$ -q global
#$ -N segment

module purge
module load sge
module load matlab
module load spm

set list='/data/subjects.list'

# Need this to set the conda environment in tcsh
setenv CONDA_ENV_PATH /home/.conda/envs/virtual_env
setenv CONDA_DEFAULT_ENV virtual_env
setenv PATH /home/.conda/envs/virtual_env/bin:${PATH}

set file="`awk -v n=$SGE_TASK_ID 'FNR == n' ${list}`"
bash ${file}
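As a sanity check, the per-task line selection can be reproduced outside SGE by faking the task ID (the list contents here are made up for illustration):

```shell
# Reproduce the array-task lookup outside SGE with a fake task ID.
# SGE_TASK_ID is normally set by the scheduler; we set it by hand here.
SGE_TASK_ID=1
list=subjects.list
printf '%s\n' /scripts/sub01.sh /scripts/sub02.sh > "$list"
# -v passes the task ID into awk explicitly, avoiding shell-quoting pitfalls
file=$(awk -v n="$SGE_TASK_ID" 'FNR == n' "$list")
echo "$file"
```

This prints the first line of the list for task 1, the second for task 2, and so on.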

The script this calls:

python /path/to/

And the nipype script:

import nipype.interfaces.spm as spm
import nipype.interfaces.matlab as mlab

spm_path = "/software/system/spm/spm-12-20181114"

# Make sure nipype's MATLAB interface can find SPM
mlab.MatlabCommand.set_default_paths(spm_path)

seg = spm.NewSegment()
seg.inputs.channel_files = 'anat.nii'
# ((TPM file, tissue index), n Gaussians, (native, DARTEL), (unmodulated, modulated))
tissue1 = ((f'{spm_path}/tpm/TPM.nii', 1), 2, (True, True), (False, False))
tissue2 = ((f'{spm_path}/tpm/TPM.nii', 2), 2, (True, True), (False, False))
tissue3 = ((f'{spm_path}/tpm/TPM.nii', 3), 2, (True, False), (False, False))
tissue4 = ((f'{spm_path}/tpm/TPM.nii', 4), 2, (False, False), (False, False))
tissue5 = ((f'{spm_path}/tpm/TPM.nii', 5), 2, (False, False), (False, False))
seg.inputs.tissues = [tissue1, tissue2, tissue3, tissue4, tissue5]
seg.run()
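Since the job produces no error output, one thing that may help locate where it stalls is turning up nipype's logging via its config file (a sketch; these go in ~/.nipype/nipype.cfg, with option names as I understand nipype's configuration file support, and the log directory reusing the project's existing log path):

```ini
[logging]
workflow_level = DEBUG
interface_level = DEBUG
log_to_file = true
log_directory = /data/project/logs

[execution]
stop_on_first_crash = true
```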

If, instead of running NewSegment in the Python script, I run e.g. Coregister, it completes successfully both when called directly and when run via the cluster:

coreg = spm.Coregister()
coreg.inputs.target = "c1anat.nii"
coreg.inputs.source = "cbf.nii"
coreg.inputs.apply_to_files = ["pd.nii"]#, "asl.nii"]
coreg.inputs.out_prefix = 'r'
coreg.run()

Does anyone have any idea what I am doing wrong?