We are considering building a server optimized to run fmriprep in Docker instances. Our data is rather large: 8 BOLD runs with 600 volumes each, and we were hoping to be able to run multiple instances of fmriprep at once. Right now we are looking at a machine with 20 cores + 20 hyperthreads and 64 GB of RAM. Would this be a decent setup, or should we put more towards the RAM and less towards the CPU?
I think in this case you'll be more RAM-bound than CPU-bound, but that's not too far off what I would consider a reasonable balance. As a rule, an individual subject can rarely utilize more than 16 cores efficiently, and 32 GB of RAM is usually safe, so a 1:2 ratio of threads to GB is my rule of thumb. For 4800 BOLD volumes (8 runs × 600 volumes), you might need more RAM, but it's hard to say without actually trying it.
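To make the rule of thumb concrete, here is a back-of-envelope calculation for the machine described in the question. The per-subject caps (16 threads, 32 GB) are just the heuristics above, not hard limits:

```python
# Back-of-envelope sizing using the 1:2 threads-to-GB rule of thumb.
# Machine specs are from the question; per-subject caps are heuristics.
threads = 20 * 2          # 20 cores + 20 hyperthreads
ram_gb = 64

rule_of_thumb_gb = threads * 2   # 1:2 ratio suggests 80 GB for 40 threads

per_subject_threads = 16
per_subject_gb = 32

instances_by_cpu = threads // per_subject_threads
instances_by_ram = ram_gb // per_subject_gb
concurrent = min(instances_by_cpu, instances_by_ram)

print(rule_of_thumb_gb)   # 80 -> 64 GB is a bit under the rule of thumb
print(concurrent)         # 2  -> roughly two comfortable concurrent runs
```

So 64 GB is somewhat below what the 1:2 rule would suggest for 40 threads, and you'd likely be limited to about two simultaneous subjects at the safe settings; more RAM would let you either run more instances or give each one more headroom.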
One thing you'll have going for you, as opposed to running on a cluster, is the ability to set your own memory overcommit policy. A lot of memory issues arise when a process is killed for reserving too large an address space, even though it never actually expands to fill it, because cluster administrators tend to set conservative overcommit limits.
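On a machine you administer yourself, the overcommit policy is a kernel setting you can inspect and change directly. A minimal sketch (the values are the standard Linux kernel semantics; whether mode 1 is right for your workload is a judgment call):

```shell
# Check the current overcommit policy:
#   0 = heuristic (default), 1 = always overcommit, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# Allow large virtual address-space reservations that are never fully
# touched -- the situation described above. Requires root; to make it
# persistent, put "vm.overcommit_memory = 1" in a file under /etc/sysctl.d/.
sudo sysctl -w vm.overcommit_memory=1
```

The trade-off is that with mode 1 the kernel will grant nearly any allocation request, so a process that genuinely exhausts RAM gets killed by the OOM killer later rather than failing cleanly at allocation time.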
Also, we’d be happy to work with you to figure out the bottlenecks and better annotate high-RAM nodes.