I can’t seem to figure out why I’m getting this insufficient GPU memory error despite having 18-20 GB free, and I’m hoping someone here can help.
After running run_fastsurfer.sh (with the flags --seg_only --no_cereb --3T --no_biasfield --batch_size 1), this is the error:
[CRITICAL: common.py: 160]: ERROR - INSUFFICIENT GPU MEMORY
[INFO: common.py: 161]: The memory requirements exceeds the available GPU memory, try using a smaller batch size (--batch_size <int>) and/or view aggregation on the cpu (--viewagg_device 'cpu').Note: View Aggregation on the GPU is particularly memory-hungry at approx. 5 GB for standard 256x256x256 images.
[INFO: common.py: 168]: Using GPU 0; 19.99 GiB total capacity; 21.57 MiB already allocated; 18.74 GiB free; 22.00 MiB reserved in total by PyTorch.
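In case it helps, this is a quick sanity check I can run in the same Python environment (just a sketch; it only queries what PyTorch itself reports for GPU 0, i.e. the same fields shown in the error line above):

import torch

print(torch.__version__, torch.version.cuda)          # PyTorch version / CUDA build
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))

free, total = torch.cuda.mem_get_info(0)              # bytes free / total on GPU 0
print(f"free:      {free / 1024**3:.2f} GiB")
print(f"total:     {total / 1024**3:.2f} GiB")
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.2f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.2f} MiB")

Some details on my setup: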
- I am running FastSurfer on Ubuntu 22.04.3 via WSL2 on Windows 11.
- I have 2 NVIDIA RTX A4500 GPUs installed, each with 20 GB of memory.
- Per the NVIDIA website, I installed the dedicated WSL driver on the Windows side (version 551.61, if anyone is interested).
- On the Linux side I installed the latest CUDA toolkit specific for WSL, which does not contain any drivers. This is because, per the NVIDIA website, the CUDA driver installed on the Windows host is stubbed inside WSL 2 as libcuda.so, so users must not install any NVIDIA GPU Linux driver within WSL 2.
- I did have to update the path to nvidia-smi, which was in a non-standard location (/usr/lib/wsl/lib/nvidia-smi).
- Watching GPU resources live during script execution shows virtually no GPU usage.
- Other scripts that call on the GPU run without issue (HD-BET for skull stripping).
- The output from nvidia-smi seems to suggest plenty of GPU memory for the task (quick allocation test sketched below).
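Since the note in the error says view aggregation alone needs roughly 5 GB for a standard 256x256x256 image, one more check I can try (again just a sketch, to confirm that a single large allocation actually succeeds under WSL2) is:

import torch

# ~5 GiB of float32 on GPU 0; if WSL2 or the driver were the problem,
# I'd expect an allocation of this size to fail as well
x = torch.empty(5 * 1024**3 // 4, dtype=torch.float32, device="cuda:0")
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")
del x
torch.cuda.empty_cache()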
Here is the full output from the terminal:
Setting ENV variable FASTSURFER_HOME to script directory /home/wmccuddy/FastSurfer.
Change via environment to location of your choice if this is undesired (export FASTSURFER_HOME=/dir/to/FastSurfer)
Version: 2.2.0+a000faa
Wed Mar 6 20:40:35 MST 2024
python3.10 /home/wmccuddy/FastSurfer/FastSurferCNN/run_prediction.py --t1 /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/T1w_AX_T1_3D_TFE_WAND_20240221144656_201.nii --asegdkt_segfile /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/aparc.DKTatlas+aseg.deep.mgz --conformed_name /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/conformed.mgz --brainmask_name /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg//001/mri/mask.mgz --aseg_name /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg//001/mri/aseg.auto_noCCseg.mgz --sid 001 --seg_log /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg//001/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device cuda --device cuda:0
[INFO: run_prediction.py: 546]: Checking or downloading default checkpoints ...
[INFO: common.py: 111]: Using device: cuda:0
[INFO: common.py: 111]: Using viewagg_device: cuda
[INFO: run_prediction.py: 234]: Running view aggregation on cuda
[INFO: inference.py: 200]: Loading checkpoint /home/wmccuddy/FastSurfer/checkpoints/aparc_vinn_coronal_v2.0.0.pkl
[INFO: inference.py: 200]: Loading checkpoint /home/wmccuddy/FastSurfer/checkpoints/aparc_vinn_sagittal_v2.0.0.pkl
[INFO: inference.py: 200]: Loading checkpoint /home/wmccuddy/FastSurfer/checkpoints/aparc_vinn_axial_v2.0.0.pkl
[INFO: common.py: 820]: Single subject with absolute file path for input.
[INFO: common.py: 843]: No subjects directory specified, but the parent directory of the output file /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/aparc.DKTatlas+aseg.deep.mgz is the subject id, so we are assuming this is the subject directory.
[INFO: common.py: 875]: Analyzing single subject /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/T1w_AX_T1_3D_TFE_WAND_20240221144656_201.nii
[INFO: common.py: 970]: Output will be stored in Subjects Directory: /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg
[INFO: run_prediction.py: 310]: Successfully loaded image from /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/T1w_AX_T1_3D_TFE_WAND_20240221144656_201.nii.
[INFO: run_prediction.py: 430]: Successfully saved image as /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/mri/orig/001.mgz.
[INFO: run_prediction.py: 323]: Conforming image
Input: min: 0.0 max: 453300.7880859375
rescale: min: 0.0 max: 142789.7482470703 scale: 0.0017858424931093187
Output: min: 0.0 max: 255.0
[INFO: run_prediction.py: 430]: Successfully saved image as /home/BNI_AdvClinicalImaging_Linux/BrainTumor/derivatives/pipeline_1/brain_seg/001/conformed.mgz.
[CRITICAL: common.py: 160]: ERROR - INSUFFICIENT GPU MEMORY
[INFO: common.py: 161]: The memory requirements exceeds the available GPU memory, try using a smaller batch size (--batch_size <int>) and/or view aggregation on the cpu (--viewagg_device 'cpu').Note: View Aggregation on the GPU is particularly memory-hungry at approx. 5 GB for standard 256x256x256 images.
[INFO: common.py: 168]: Using GPU 0; 19.99 GiB total capacity; 21.57 MiB already allocated; 18.74 GiB free; 22.00 MiB reserved in total by PyTorch.
Any assistance would be greatly appreciated!