Optimal parallelisation parameters


I am planning to run Mindboggle on a number of MRI scans on a high-performance cluster, and I would like to optimise performance via parallelisation. My first question is how performance scales with larger numbers of CPUs (I can request and pay for up to 32 per node). I run Mindboggle via Singularity.
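To make the scaling question concrete, here is a minimal sketch of how I would compare test runs once I have timings. The function name `scaling_table` is my own, and the timings in the example are made-up placeholders, not measurements from Mindboggle.

```python
def scaling_table(wall_times):
    """Return (cpus, speedup, efficiency) rows relative to the smallest run.

    wall_times: dict mapping CPU count -> wall-clock time in seconds.
    """
    base_cpus = min(wall_times)          # smallest run is the baseline
    base_time = wall_times[base_cpus]
    rows = []
    for cpus in sorted(wall_times):
        speedup = base_time / wall_times[cpus]
        # Efficiency of 1.0 means time dropped in proportion to added CPUs.
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Placeholder timings (seconds), purely illustrative.
    example = {8: 3600.0, 16: 2400.0, 32: 2000.0}
    for cpus, speedup, eff in scaling_table(example):
        print(f"{cpus:>2} CPUs: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

If efficiency drops sharply between two CPU counts, paying for the larger allocation is probably not worthwhile for a single scan.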

Second, I am currently running a small test with 8 CPUs, using this setup:
--plugin MultiProc --plugin_args "dict(n_procs=8)" --fs_openmp 8 --ants_num_threads 8 --mb_num_threads 8

However, I could not find much detail about these parameters. Is it the case that, in parallel mode, the 'fs' and 'ants' steps run concurrently (which would explain why the example on the webpage requested 'only' 5 for each), whereas 'mb' runs by itself (hence the 10 requested)? Finally, how does the MultiProc setting relate to all of this?

Thank you very much for your help; any pointers are highly appreciated.

Edit: Relatedly, I noticed that ANTs finishes sooner than FreeSurfer. Would it harm performance if I requested the full number of CPUs for both ANTs and FreeSurfer, so that the latter can use all CPUs once ANTs has finished?
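To spell out the worry behind this, here is a rough sketch of the thread arithmetic. It rests on an assumption I have not been able to confirm, namely that the FreeSurfer and ANTs steps run at the same time. The function names are hypothetical, and none of this is Mindboggle's or Nipype's actual scheduling logic.

```python
def peak_threads(fs_threads, ants_threads, concurrent=True):
    """Peak number of threads requested by the two steps.

    ASSUMPTION: if the steps overlap in time, their thread counts add up;
    if they run one after the other, only the larger count matters.
    """
    if concurrent:
        return fs_threads + ants_threads
    return max(fs_threads, ants_threads)


def oversubscription(n_cpus, fs_threads, ants_threads, concurrent=True):
    """Ratio of peak threads to allocated CPUs (values above 1.0 mean contention)."""
    return peak_threads(fs_threads, ants_threads, concurrent) / n_cpus


if __name__ == "__main__":
    # My current test: 8 CPUs, 8 threads for each step.
    print(oversubscription(8, 8, 8, concurrent=True))
    # Same settings if the steps did not actually overlap.
    print(oversubscription(8, 8, 8, concurrent=False))
```

Under the concurrency assumption, 5 threads each on 10 CPUs gives a ratio of exactly 1.0, which fits my guess about the webpage example, whereas my current test would request twice as many threads as CPUs while both steps are running.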