FMRIPREP SGE YAML file

Hi, I have a couple of questions on how to use the SGE plugin for FMRIPREP.

My best guess for how the YAML file should be formatted is the following:

plugin: 'SGE'
plugin_args:
    template: 'mytemplate.sh'
    qsub_args: '-q myqueue'

Does this look right?
Additionally, how is the parallelization carried out? Is it done at the level of the node, workflow, or subject? I ask because I want to be sure to specify the correct amount of memory to allocate per job submitted.

So, this is more of a Nipype question than an FMRIPREP one, per se; I have no experience using the SGE plugin (or any plugins apart from Linear and MultiProc), but I would expect the parallelism to be at the Node level. That looks like reasonable YAML, but I’d try loading it to make sure it generates the right Python dictionary.
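For example, something like this (a quick sketch, assuming PyYAML is installed; the filename is just a placeholder) would show you exactly the dictionary Nipype will receive:

import yaml

# Load the plugin settings file; 'sge_plugin.yml' is a placeholder name.
with open('sge_plugin.yml') as f:
    settings = yaml.safe_load(f)

print(settings)
# Should print something like:
# {'plugin': 'SGE', 'plugin_args': {'template': 'mytemplate.sh', 'qsub_args': '-q myqueue'}}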

When we run FMRIPREP on clusters, we tend to use MultiProc and submit jobs on a per-subject basis.

As far as specifying the amount of memory per job goes, it’s really going to vary widely, but I assume there’s some reasonable default (e.g. 1GB + 1 CPU per Node). This may be a situation where we need to systematically add memory estimates to the metadata of many nodes before the SLURM or SGE plugins can allocate resources efficiently. (Again, I don’t really know here.)
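If it helps, the kind of per-node metadata I have in mind looks roughly like this (just a sketch; mem_gb / n_procs are the resource hints newer Nipype Nodes accept, and the numbers here are invented):

from nipype.pipeline.engine import Node
from nipype.interfaces.fsl import BET  # any interface works; BET is just an example

# Hypothetical resource estimates attached to a node; schedulers such as
# MultiProc use these to decide how many nodes fit in the available resources.
bet = Node(BET(), name='brain_extraction', mem_gb=4, n_procs=1)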

Hopefully someone else can give more helpful advice on using Nipype with SGE. (@mgxd perhaps?)

Thanks @effigies!

I loaded the YAML in Python and got the following result:
{'plugin_args': {'template': 'mytemplate.sh', 'qsub_args': '-q myqueue'}, 'plugin': 'SGE'}

However, the plugin page from the Nipype documentation says the workflow.run command should be formatted like this:
workflow.run(plugin=PLUGIN_NAME, plugin_args=ARGS_DICT)

Is the variable assignment in the run command taken care of by **kwargs?

I apologize if that’s a simple Python question; I’m still getting comfortable with the language.
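For concreteness, here is roughly what I imagine happens under the hood (just a sketch, with a made-up filename and a trivial one-node workflow so it runs end-to-end on a machine with SGE available):

import yaml
from nipype.pipeline.engine import Workflow, Node
from nipype.interfaces.utility import IdentityInterface

# Load the plugin settings; 'sge_plugin.yml' is a placeholder name.
with open('sge_plugin.yml') as f:
    settings = yaml.safe_load(f)

# Trivial workflow, just for illustration.
dummy = Node(IdentityInterface(fields=['x']), name='dummy')
dummy.inputs.x = 1
wf = Workflow(name='demo', base_dir='work')
wf.add_nodes([dummy])

# Explicit keyword form from the Nipype docs...
wf.run(plugin=settings['plugin'], plugin_args=settings.get('plugin_args'))
# ...which should be equivalent to unpacking the whole dict with **kwargs:
# wf.run(**settings)

Is that the right mental model?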

@jdkent After trying the YAML approach, I’ve found it’s much simpler to use the MultiProc plugin and just submit the jobs with your resource manager (as @effigies said). This allows for subject-level parallelization, and it’s quite easy to keep track of. I’ve attached the outline of a script I call to run this per subject. My cluster uses SLURM, but you should be able to convert this to SGE fairly easily.

#!/bin/bash

# args
BASE=/project
WORKDIR=$BASE/workdir
DATADIR=$BASE/data
OUTDIR=$DATADIR/derivatives/fmriprep

# go to the data directory and grab an array of all subjects
#pushd $DATADIR
#subjs=$(ls -d -1 sub-* | tr '\n' ' ')
#popd

# or just pick a handful
subjs=(sub-21 sub-26 sub-06)

# run each subject individually
for subj in ${subjs[@]}
do
    sbatch -t 14:00:00 --mem 10GB fmriprep $DATADIR $OUTDIR participant \
        --participant_label $subj --nthreads 10 --mem_mb 10000 \
        --no-freesurfer --ignore slicetiming -w $WORKDIR
done

Awesome, thanks @mgxd! That’s similar to how I have it set up on our university’s cluster, so I’ll continue using this type of method if it’s what you guys are using.

#!/bin/sh

#$ -pe smp 10
#$ -q UI
#$ -m bea
#$ -M james-kent@uiowa.edu
#$ -o /Users/jdkent/logs/out/
#$ -e /Users/jdkent/logs/err/
# export so the thread limit is visible inside the container
export OMP_NUM_THREADS=8
singularity run -H /Users/jdkent/singularity_home \
/Users/jdkent/singularity_containers/poldracklab_fmriprep_latest-2017-05-21-3fb35154f7e2.img \
~/ds102_R2.0.0 ~/ds102_R2.0.0/derivatives/fmriprep \
participant --participant_label SUBJECT \
-w ~/ds102_R2.0.0/derivatives/fmriprep/work \
--write-graph --mem_mb 30000 --omp-nthreads 8 --nthreads 8