Nipype FSL FEAT taking very long (Apple Silicon)

I am trying to implement this first- and second-level task-based fMRI analysis after successfully running fMRIPrep on my dataset: Analysis of task-based functional MRI data preprocessed with fMRIPrep | Nature Protocols
I have modified the script to use the contrasts relevant to my task, and I have adapted the DerivativesDataSink nodes because they raised an error: the parameters could not be used to build a BIDS-compatible file name.

I am using a 2023 MacBook Pro M3 Max with 36 GB of RAM.
I initially used the Docker container as described in the publication, which was even slower. When running the script locally without Docker using:

python run.py \
    $BIDSDIR \
    $BIDSDIR/derivatives/task-analysis \
    participant \
    --task XX \
    --space MNI152NLin2009cAsym \
    --bids-dir $BIDSDIR \
    --work-dir $BIDSDIR/working_dir

The feat_fit step is faster, but it still takes a very long time to run (more than 8 hours for the first-level analysis when running two subjects at the same time).

In the plugin_settings in run.py, I changed n_procs to 2 because I kept getting memory errors otherwise. In Activity Monitor, I see two film_gls processes, each using ~690% CPU and 30 GB of memory, with total CPU and memory usage close to 95%.
Is there any reason the first-level analysis could be this slow? What can I do to troubleshoot and potentially speed up this analysis?

Hi @jonasb, you may be affected by a change in the behaviour of OpenBLAS which causes it to use more threads, and to run slower, than it should. Can you try setting OMP_NUM_THREADS=1 as an environment variable before running your command?

This should be resolved in the latest version of FSL (6.0.7.16) - in the code, we call openblas_set_num_threads(1), but in recent versions of OpenBLAS this is not enough, and one must also call omp_set_num_threads(1).
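
For older FSL versions, the workaround could look something like the sketch below. It assumes a bash or zsh shell and that the analysis is launched from the same shell session, so that film_gls inherits the variables. OPENBLAS_NUM_THREADS is included only as an extra precaution; OMP_NUM_THREADS is the variable that matters here.

```shell
# Limit OpenMP (and, as a precaution, OpenBLAS) to one thread per process.
# Exported variables are inherited by every process started from this shell,
# including the film_gls processes that Nipype spawns.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

# Confirm that both variables are present in the environment
env | grep -E '^(OMP|OPENBLAS)_NUM_THREADS='
```

After that, re-run the same python run.py command from that shell.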

Thank you so much for the quick reply! I am running FSL 6.0.7.15. I saw a similar issue described here and have tried setting OPENBLAS_NUM_THREADS=1, but I have not tried OMP_NUM_THREADS yet. Setting OPENBLAS_NUM_THREADS had no noticeable effect on the speed of film_gls, although it did seem to allow me to set n_procs to 2 and process two subjects at once without crashing. I am upgrading FSL to 6.0.7.16 and setting both environment variables to be safe, and I will report back on whether that improves things.

Best,
Jonas

Reporting back: That solved my problem! Thank you so much for your help!