Run fmriprep on 7 participants simultaneously

Hello!
I'm running fMRIPrep from Docker with this command:

    docker run --cpuset-cpus="10-39" --rm \
        -v /home/yarkin/BIDS_LA5study:/bids \
        -v /home/yarkin/fmriprep_LA5study:/out \
        -v /home/yarkin/credentials/license.txt:/opt/freesurfer/license.txt \
        poldracklab/fmriprep:latest \
        /bids /out participant --fs-no-reconall --force-bbr --participant_label {60001..60008}

I want to preprocess 265 patients. Since fMRIPrep is a memory-consuming process, I am trying to run it for 7 patients simultaneously, manually setting the --participant_label flag (counting off 7 patients and setting a range). It's a bit annoying to re-run the command for every next 7 participants, manually changing the --participant_label parameter.
Is there any way to solve this? With fMRIPrep flags? Or with a bash command, e.g. xargs?
Would be very helpful! Thanks.

How about:

    ls -d sub-*/ \
        | sed -e 's/.*sub-\(.*\)\//\1/' \
        | split -l 7 - participants_

Then you can run

    docker run ... --participant-label $(cat participants_aa)
    docker run ... --participant-label $(cat participants_ab)
    docker run ... --participant-label $(cat participants_ac)

Thanks @effigies, but what I meant was not having to call docker run again each time the previous batch of patients finishes. As an idea: a flag would be needed to limit the number of patients processed simultaneously when fMRIPrep is called for the whole group without --participant-label.

Hmm. That certainly can’t be achieved with the built-in nipype scheduler. But depending on whether parallel is available within the Docker container, you could run:

    docker run ... --entrypoint=bash $IMAGE -c \
        "for LABEL in {A..B}; do
             echo fmriprep ... --participant-label \$LABEL
         done | parallel -j 7"

You'll obviously want to adjust your mem_gb and num-procs options to account for other processes consuming resources, which is going to hurt when some of the parallel jobs are using only one core while another could safely be using the rest.

A more ambitious alternative would be to augment nipype with a smarter ordering algorithm (e.g., make the topological sort linearize one fully connected sub-graph before starting another), which would help utilize resources much better.


To jump in with some other ideas: do you have access to a cluster where you can run these jobs? There, you could follow @effigies' suggestion and just submit each subset of participants as a separate job!

Running them all on one machine is a bit more constrained; I think the idea of executing them sequentially is probably the easiest. If you wanted to automate this sequential execution, it might be worth having a driver script that uses wait, see this example!
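Something along these lines (a rough, untested sketch reusing the paths and image from the first post; the batch size of 7 and the resource settings are placeholders you will want to adjust):

    #!/bin/bash
    # Hypothetical driver script: launch one fmriprep container per participant,
    # 7 at a time, and wait for each batch to finish before starting the next.
    BIDS=/home/yarkin/BIDS_LA5study
    OUT=/home/yarkin/fmriprep_LA5study
    LICENSE=/home/yarkin/credentials/license.txt

    ls -d "$BIDS"/sub-*/ \
        | sed -e 's/.*sub-\(.*\)\//\1/' \
        | xargs -n 7 \
        | while read -r batch; do
            for label in $batch; do
                docker run --rm \
                    -v "$BIDS":/bids -v "$OUT":/out \
                    -v "$LICENSE":/opt/freesurfer/license.txt \
                    poldracklab/fmriprep:latest \
                    /bids /out participant --fs-no-reconall --force-bbr \
                    --participant_label "$label" &
            done
            wait  # block until every container in this batch has exited
        done

Note that each container here sees the whole machine, so you would probably want to add something like the --cpuset-cpus pinning from the original command, or fMRIPrep's own resource options, so that seven containers are not competing for the same cores.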

I use the qbatch utility (https://github.com/pipitone/qbatch), written in Python, to do this.

There's an example script here (https://github.com/edickie/bids-on-scinet/blob/master/examples/qbatch_fmriprep1.1.2_anat_p08.sh).
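The basic pattern, as a rough sketch (the bare fmriprep call and the /bids, /out paths are placeholders for however fMRIPrep is actually invoked on your cluster; chunking, walltime, etc. are set through qbatch's own options):

    # Sketch: write one fmriprep command per line to a job list,
    # then let qbatch chunk the list into scheduler jobs.
    for sub in $(ls -d sub-*/ | sed -e 's/.*sub-\(.*\)\//\1/'); do
        echo "fmriprep /bids /out participant --fs-no-reconall --participant_label $sub"
    done > fmriprep_joblist

    qbatch fmriprep_joblist  # see qbatch --help (and the linked script) for options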


Hey, we're running into a similar issue on a workstation with 96 CPUs and about 4 GB of RAM per CPU, and I was hoping to get some quick clarification on how FMRIPrep handles resource allocation and submissions when we omit the participant_label option. In that case, would FMRIPrep attempt to run all subjects and tasks that live in the BIDS directory, ignoring any information about my system?

If I were to use the --n_cpus argument and set it to 96, would it then limit FMRIPrep to 96 processes? It seems like FMRIPrep generally uses about one CPU per run of fMRI data, so another alternative for us might be to just submit in sequential batches of 48 subjects (2 runs each).


Hi @dvsmith, I think we should probably write up a FAQ about exactly how fMRIPrep scheduling works (by default, anyway).

The short answer is: --n-cpus will default to the number of CPUs on your system, and --mem-gb will default to 90% of your memory, to make an allowance for OS consumption or underestimates of fMRIPrep component usage. Component jobs (which are tagged as using some number of processors and some quantity of memory) are run if three conditions are met: 1) all prerequisite jobs have been run (i.e. all data needed to run this step exists); 2) enough memory is available; 3) enough cores are available. The longer answer has some caveats.

As to CPU consumption per run, it really doesn’t work like that. It might average to that, but many steps are parallelized within subjects. For example, when applying transformations to a BOLD series, each volume gets an independent process, run in parallel as cores are available.

As to advice, our general suggestion is to run each subject in an independent process, but I’ve actually never tried running with 96 cores and 384GB of RAM… You can definitely run multiple subjects simultaneously, but whether you want to run 6 separate processes or one process at a time with six simultaneous subjects is up to you. The end result will be a different effective scheduler, but I can’t say which will be better. (I’m assuming here that you might want to commit 16 cores per subject, which is about the limit before it stops giving you any improvements, but it might be more efficient to assume 8 cores per subject and run 12 at a time.)
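To make the first of those options concrete, a rough sketch using GNU parallel on the host might look like the following (assuming parallel is installed there; paths are borrowed from the command at the top of the thread, and -j 6 / --n_cpus 16 are just the numbers from the previous paragraph, with the flag spelling used earlier in this thread):

    # Run at most 6 fmriprep containers at once, 16 cores each;
    # {} is replaced by each participant label in turn.
    ls -d /home/yarkin/BIDS_LA5study/sub-*/ \
        | sed -e 's/.*sub-\(.*\)\//\1/' \
        | parallel -j 6 docker run --rm \
            -v /home/yarkin/BIDS_LA5study:/bids \
            -v /home/yarkin/fmriprep_LA5study:/out \
            -v /home/yarkin/credentials/license.txt:/opt/freesurfer/license.txt \
            poldracklab/fmriprep:latest \
            /bids /out participant --fs-no-reconall \
            --participant_label {} --n_cpus 16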

You can also try your 48 subjects at a time. I don't really know how the scheduler will perform under those conditions. If you're feeling up to it, it might be interesting to take advantage of a 96-subject dataset to try several different strategies for running subsets and compare their performance.

Thanks, @effigies, that's very helpful to know! We're doing this on Amazon Web Services, and I'm not sure how to set up a computing cluster there. But with our 96-core machine with plenty of RAM, I was able to run about 147 subjects (two functional runs each) in a couple of days. This number is probably misleading, though, since the anatomical data had already been processed for these subjects and remained in the /scratch directory, so it doesn't look like FMRIPrep re-did any of that (though I could be missing something).

Also, in case it’s helpful for others, it looks like a couple of programs within FSL run much more smoothly if you set OPENBLAS_NUM_THREADS=1 on the NITRC-CE on AWS. We were experiencing some massive slow-downs. See below for more information.

JISCMail - FSL Archive - Re: film_gls and flameo running very slowly on Amazon Web Services (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=FSL;4f8a92e.1902)
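In case it saves someone a search, it is just an environment variable; the docker -e form is only needed if you run the FSL tools inside a container:

    # On the host (e.g. the NITRC-CE instance), before launching FSL tools:
    export OPENBLAS_NUM_THREADS=1

    # Or, when running inside a container:
    docker run -e OPENBLAS_NUM_THREADS=1 ...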
