Experiences with GPU-assisted recon-all - Freesurfer

3 Mar 2012


Dear SPM community,
Freesurfer has supported GPU acceleration since version 5.0. To assess the utility of this functionality, particularly the gain in performance, I ran `recon-all` with and without the option `-use-gpu`. I'd like to share the results and also hope to get some answers to the questions that came up. The video card used was a GeForce GTX 460 and the CPU was an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.
1) `cudadetect` did not recognize the video card:
Detecting CUDA... *** No CUDA enabled device(s) detected! ***
2) `mri_em_register_cuda`, called without parameters, gave the following output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Thu_Nov__4_12:44:17_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
Driver : 3.20
Runtime : 3.20
Acquiring CUDA device
Using default device 
CUDA device: GeForce GTX 460
...
3) Running `mri_em_register_cuda` was about 4 times faster than `mri_em_register`. During execution of `mri_em_register_cuda`, the GPU load was at 20%, as indicated by `nvidia-smi -a`.
4) Running `recon-all` with the `-use-gpu` option took 6:49 hours; without the option it took about 20 minutes longer. A summary of the processing steps taken from `recon-all-status.log` is attached to this email. Note that these times are wall clock times, not CPU times.
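To put the overall gain into numbers, here is a small sketch that only uses the two wall clock figures quoted above. The CPU-only time is approximate ("about 20 minutes longer"), so the result should be read as a rough estimate.

```python
from datetime import timedelta

# Wall clock times reported above.
gpu_run = timedelta(hours=6, minutes=49)
cpu_run = gpu_run + timedelta(minutes=20)  # "about 20 minutes longer"

speedup = cpu_run / gpu_run    # ratio of total run times
saved = 1 - gpu_run / cpu_run  # fraction of wall clock time saved

print(f"CPU-only run: {cpu_run}")
print(f"Overall speedup: {speedup:.2f}x, time saved: {saved:.1%}")
```

So although individual steps ran several times faster, the complete run was only about 5% shorter.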
Hopefully this information will be helpful. The steps "EM Registration" and "CA Normalize" were significantly accelerated by the GPU, as expected. "SubCort Seg" also ran much faster. "Surf Reg rh" ran 3.4 times faster with GPU acceleration, while the accelerated version of "Surf Reg lh" was slower than the regular version. Which binary was used in these steps? What could be the reason for the difference between hemispheres (I assume lh/rh means left/right hemisphere)? A bug? A problem with the data?
According to `-recon-all`, the following binaries support GPU acceleration: mri_em_register, mri_ca_register, mris_inflate and mris_sphere. Could someone point out which binaries correspond to the 64 steps reported in "scripts/recon-all-status.log"? This would make it possible to identify the steps that have GPU-accelerated variants.
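For anyone who wants to compare per-step wall clock times between two runs, the following is a minimal sketch of how the status log could be turned into step durations. It assumes that each step is logged as one line of the form `#@# <step name> <weekday> <month> <day> <HH:MM:SS> <timezone> <year>`; this line format is an assumption on my side and should be checked against an actual log. The timestamps in `sample` are made up for illustration and are not taken from my runs.

```python
from datetime import datetime


def parse_status_lines(lines):
    """Return (step name, start time) pairs from status-log lines.

    ASSUMPTION: step lines look like
    '#@# <step name> <weekday> <month> <day> <HH:MM:SS> <TZ> <year>'.
    Lines that do not match this shape are skipped.
    """
    steps = []
    for line in lines:
        if not line.startswith("#@#"):
            continue
        tokens = line.split()[1:]
        if len(tokens) < 7:  # need a name plus six date tokens
            continue
        name = " ".join(tokens[:-6])
        _weekday, month, day, clock, _tz, year = tokens[-6:]
        start = datetime.strptime(
            f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
        )
        steps.append((name, start))
    return steps


def step_durations(steps):
    """Duration of a step = start of the next step minus its own start.

    The last step has no successor and is therefore omitted.
    """
    return [
        (name, next_start - start)
        for (name, start), (_, next_start) in zip(steps, steps[1:])
    ]


# Hypothetical sample lines (made-up timestamps).
sample = [
    "#@# EM Registration Sat Mar  3 10:00:00 CET 2012",
    "#@# CA Normalize Sat Mar  3 10:12:30 CET 2012",
    "#@# SubCort Seg Sat Mar  3 10:15:00 CET 2012",
]

for name, duration in step_durations(parse_status_lines(sample)):
    print(f"{name}: {duration}")
```

Running this on the logs of the GPU and the CPU-only run and dividing the durations step by step would give the per-step speedup.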
The GPU load was at 20% while CUDA binaries were executed. What could be the reason for this, i.e. why not 100%? Is this even expected on the GeForce GTX 460? What are the limitations when running multiple GPU-accelerated instances of `recon-all`? How much GPU memory does a single instance take at most?
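In case someone wants to record the GPU load over time rather than reading a single value, here is a minimal sketch. I read the load from `nvidia-smi -a`; the sketch instead assumes a version of `nvidia-smi` that supports the `--query-gpu=utilization.gpu` and `--format=csv,noheader,nounits` options, which may not be the case for older drivers. The function names are my own.

```python
import subprocess
import time


def parse_utilization(output):
    """Parse one integer percentage per GPU from nvidia-smi CSV output.

    Lines that are not plain numbers (e.g. an unsupported-value marker)
    are skipped.
    """
    return [
        int(line.strip())
        for line in output.splitlines()
        if line.strip().isdigit()
    ]


def sample_gpu_load(n_samples=10, interval_s=1.0):
    """Poll nvidia-smi n_samples times and return the per-GPU readings.

    ASSUMPTION: the installed nvidia-smi accepts the query options below.
    """
    readings = []
    for _ in range(n_samples):
        out = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=utilization.gpu",
                "--format=csv,noheader,nounits",
            ],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
        readings.append(parse_utilization(out))
        time.sleep(interval_s)
    return readings
```

Calling `sample_gpu_load()` in a second terminal while `mri_em_register_cuda` is running would return one list of percentages per sample, with one entry per GPU.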
What could be the reasons that `cudadetect` failed to detect any CUDA devices, yet CUDA binaries worked as expected?
Best regards,
Ahmed Abdulkadir
--
Master Student, Life Sciences, Semester 4
Medical Image Processing (MIP) Lab 
École Polytechnique Fédérale de Lausanne
Student Assistant
Functional Brain Imaging (FBI)
Department of Neurology
University Medical Center Freiburg
Breisacher Str. 64, D-79106 Freiburg
--