With some help from AI, I have ported the FSL tools that require NVIDIA CUDA to Apple's Metal on macOS. Specifically, the tools mmorf, eddy, bedpost, and probtrackx can now leverage the Apple GPU. The description is here and a notarized installer is here. The installer requires an Apple Silicon GPU and an existing installation of a recent version of FSL. I also created a benchmark that shows the benefit.
Be aware that the results are equivalent but not identical. Apple’s Metal only supports float32, whereas some of the FSL CUDA tools use float64 precision, although this is generally for probabilistic processes. General changes are described in the optimization section. While I created many tests, evaluated speed and memory usage, and verified sanitized behavior, it is possible that some edge cases will fail catastrophically. So use with caution. For my own usage, this is a nice way to refine and pilot pipelines on my laptop before running large datasets on a Linux cluster.
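To give a feel for why float32 and float64 results are equivalent but not identical, here is a minimal Python sketch. It is not taken from the FSL or Metal code: the helper names `to_f32` and `accumulate` are made up for this illustration, and the repeated addition of 0.1 is just an assumed stand-in for any long-running accumulation on the GPU.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single_precision: bool) -> float:
    """Sum values, optionally rounding every intermediate result to float32."""
    total = 0.0
    for v in values:
        if single_precision:
            # Emulate a float32 add: round the operand, add, round the result.
            total = to_f32(total + to_f32(v))
        else:
            total += v
    return total


# Hypothetical workload: add 0.1 one hundred thousand times.
values = [0.1] * 100_000
f64_total = accumulate(values, single_precision=False)
f32_total = accumulate(values, single_precision=True)

print(f"float64 sum: {f64_total!r}")
print(f"float32 sum: {f32_total!r}")
print(f"difference:  {abs(f64_total - f32_total)!r}")
```

The float64 sum stays very close to 10000, while the float32 sum visibly drifts because each addition is rounded to 24 bits of mantissa. When comparing the Metal output against the CUDA or CPU output, a tolerance-based comparison is therefore more meaningful than a bit-for-bit one.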
If you want to compare the stock FSL on macOS to the Metal version, I suggest running only the quick benchmark: many CPU tools take an excessively long time to complete the full benchmark.
In my testing, with these optimizations, a Mac M5 Max completes the benchmark in 894 seconds, while a Linux Threadripper with an NVIDIA RTX 4090 using the stock FSL requires 2159 seconds, which makes the Mac roughly 2.4 times faster.