Is there any way to use CUDA 11 or later for FSL's eddy_cuda?

gerardyu · November 5, 2024, 4:37am

I’m using FSL’s eddy_cuda within MRtrix3’s dwifslpreproc on an HPC. It seems that the latest CUDA version supported by FSL 6.0.6 is 10.2. Unfortunately, the oldest CUDA version available on this HPC is 11.7. My sysadmin said CUDA 10.2 is somewhat ‘ancient’ and will not work on the HPC.

I’m aware of the guide “How to setup CUDA 10.2, 11.0, and 11.5 in order to use eddy_cuda10.2 (in FSL 6.0.5.x), PyTorch, and Tensorflow 2”, but I do not have sudo rights on the HPC, and the HPC runs Rocky Linux 8.7 rather than Ubuntu.

paulmccarthy · November 5, 2024, 9:59am

Hi @gerardyu, recent versions of eddy (and the other FSL CUDA tools) should run just fine on any modern GPU with an up to date GPU driver. Have you tried running eddy on your system?

gerardyu · November 5, 2024, 1:50pm

I wasn’t able to run eddy_cuda10.2 after loading the cuda11.7 module on the HPC.

I got the following error:

eddy_cuda10.2: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

paulmccarthy · November 5, 2024, 1:57pm

@gerardyu - that suggests that the system you are using doesn’t have a CUDA driver installed. I would get in touch with your HPC team to see if they can help you.

A quick way of testing whether your system is configured correctly for CUDA applications is to try running nvidia-smi. If nvidia-smi doesn’t work, then you either don’t have a GPU, or the CUDA driver is not correctly installed.
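
The check described above can be scripted. Below is a minimal sketch, assuming a Linux node with bash; the function name check_cuda_driver is made up for illustration and is not part of FSL. It only reports what it finds, and it only looks in the linker cache, so a libcuda.so.1 reachable solely through LD_LIBRARY_PATH would not be detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FSL): report whether the NVIDIA driver
# tools and the driver library libcuda.so.1 are visible on this node.
check_cuda_driver() {
    local status=0

    # nvidia-smi ships with the NVIDIA driver, so a missing or failing
    # nvidia-smi usually means no GPU or no working driver on this node.
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi: OK"
    else
        echo "nvidia-smi: not working"
        status=1
    fi

    # libcuda.so.1 is installed by the driver, not by the CUDA toolkit module.
    # Assumption: only the linker cache is inspected here.
    if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so\.1'; then
        echo "libcuda.so.1: found in linker cache"
    else
        echo "libcuda.so.1: not found in linker cache"
        status=1
    fi

    return "$status"
}

check_cuda_driver || echo "This node does not look ready for CUDA applications."
```

On a cluster, this is best run on a GPU compute node (for example inside an interactive job), since login nodes often have no GPU or driver.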

gerardyu · November 5, 2024, 5:57pm

So I actually found the libcuda.so file located in /usr/local/cuda-11.7/lib64/stubs. I created a symlink to it in my home directory, exported LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HOME, and ran eddy_cuda again.

I got the following error:

Reading images
Performing volume-to-volume registration
Running Register
EDDY::EddyCudaHelperFunctions::InitGpu: cudaGetDevice returned an error: cudaError_t = 35, cudaErrorName = cudaErrorInsufficientDriver, cudaErrorString = CUDA driver version is insufficient for CUDA runtime version
EDDY::cuda/EddyCudaHelperFunctions.cu:::  static void EDDY::EddyCudaHelperFunctions::InitGpu(bool):  Exception thrown
EDDY::cuda/EddyGpuUtils.cu:::  static std::shared_ptr<EDDY::DWIPredictionMaker> EDDY::EddyGpuUtils::LoadPredictionMaker(const EDDY::EddyCommandLineOptions&, EDDY::ScanType, const EDDY::ECScanManager&, unsigned int, float, NEWIMAGE::volume<float>&, bool):  Exception thrown
EDDY::eddy.cpp:::  EDDY::ReplacementManager* EDDY::Register(const EDDY::EddyCommandLineOptions&, EDDY::ScanType, unsigned int, const std::vector<float, std::allocator<float> >&, EDDY::SecondLevelECModelType, bool, EDDY::ECScanManager&, EDDY::ReplacementManager*, NEWMAT::Matrix&, NEWMAT::Matrix&):  Exception thrown
EDDY::: Eddy failed with message EDDY::eddy.cpp:::  EDDY::ReplacementManager* EDDY::DoVolumeToVolumeRegistration(const EDDY::EddyCommandLineOptions&, EDDY::ECScanManager&):  Exception thrown

Is libcuda.so the correct file that I should be symlinking?

paulmccarthy · November 6, 2024, 9:20am

No, you should not need to do this on a correctly configured system. Have you tried running nvidia-smi?

Are you sure that you are working on a system with a GPU?

Edit: I just tested this on my laptop (which does not have a GPU) and received the same error. So I strongly suspect that you are trying to run eddy on a system that does not have a GPU. I would recommend getting in touch with your IT team and asking them how to run GPU applications on your cluster.
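
For context, the libcuda.so under a CUDA toolkit's stubs directory is a placeholder meant for linking on machines without a driver; it is not the real driver library, which is likely why pointing LD_LIBRARY_PATH at it ended in the cudaErrorInsufficientDriver message. The following is a minimal sketch for spotting that situation, assuming bash and a readlink that supports -f; the function name find_stub_libcuda is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each directory in a colon-separated library path,
# print any libcuda.so* entry that resolves into a CUDA "stubs" directory.
find_stub_libcuda() {
    local path_list="$1" dir lib target
    local IFS=':'
    for dir in $path_list; do
        [ -n "$dir" ] || continue
        for lib in "$dir"/libcuda.so*; do
            # Skip the unexpanded glob pattern and dangling links.
            [ -e "$lib" ] || continue
            # Follow symlinks so a link in $HOME pointing at the stub is caught.
            target="$(readlink -f "$lib")"
            case "$target" in
                */stubs/*) echo "$lib -> $target" ;;
            esac
        done
    done
}

# Any output here means a stub library is shadowing (or standing in for)
# the real driver library and should be removed from the path.
find_stub_libcuda "${LD_LIBRARY_PATH:-}"
```

If this prints anything, removing the symlink and the extra LD_LIBRARY_PATH entry restores the original state; the real libcuda.so.1 has to come from the driver installed by the cluster administrators.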

stebo85 · November 9, 2024, 11:40am

Dear @gerardyu,

This is a scenario where www.Neurodesk.org could help. We package the right CUDA versions inside the containers so your cluster doesn’t need to have them installed. (Our cluster has exactly the same problem.)

This is how you could run it on your HPC if you have apptainer/singularity available:
First check that you have a GPU available:

nvidia-smi

This should print a summary table listing the driver version and each GPU on the node.

Note that the GPU driver is largely backwards compatible with older CUDA versions, but I have had cases where a very old CUDA version (from memory, CUDA 7) no longer ran with the new GPU driver version that our A100s needed.

Then load the neurodesk mrtrix3 container and pass the GPU into the singularity/apptainer command using --nv:

curl -X GET https://neurocontainers.neurodesk.org/mrtrix3_3.0.4_20240320.simg -O
singularity shell --nv mrtrix3_3.0.4_20240320.simg
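
Before starting a long job, it may be worth confirming that the GPU is visible from inside the container. Here is a minimal sketch, assuming the image file downloaded above sits in the current directory and that either apptainer or singularity is on the PATH; the function name pick_container_runtime is invented for this example.

```shell
#!/usr/bin/env bash
# Image name taken from the download command above.
IMAGE="mrtrix3_3.0.4_20240320.simg"

# Hypothetical helper: print the first available container runtime, if any.
pick_container_runtime() {
    local candidate
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if runtime="$(pick_container_runtime)" && [ -f "$IMAGE" ]; then
    # --nv exposes the host NVIDIA driver and devices inside the container,
    # so nvidia-smi should list the same GPUs as on the host.
    "$runtime" exec --nv "$IMAGE" nvidia-smi
else
    echo "No container runtime or image found; skipping GPU check." >&2
fi
```

If nvidia-smi fails inside the container but works on the host, the --nv flag is the first thing to check.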

Then you can run dwifslpreproc, or first run a very nice little test program developed by Chris Rorden to make sure it’s all working:

git clone https://github.com/neurolabusc/gpu_test.git
cd gpu_test/etest/
bash runme_gpu.sh

I hope this helps