Hi Richard and others,
allow me one additional remark that may be crucial for anyone considering investing in new cards. Although the Fermi class cards share the same architecture (GeForce GTX 480 and Tesla C2050, for example), double precision performance on the consumer products (GTX 400 series) has been limited to a quarter of that of the "full" Fermi architecture (Tesla C20xx). Error-correcting code (ECC) memory is also disabled on consumer cards. I don't really know how important double precision is for the CUDA enabled Freesurfer tools, but for work that is dominated by double precision arithmetic this could mean you have to buy four GTX cards to catch up with the performance of one Tesla card.
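The "four cards" figure is simple arithmetic, and a rough sketch makes the assumptions explicit. The function name below is made up for illustration, and the estimate assumes the workload is limited purely by double precision throughput and scales perfectly across cards. Neither assumption has been checked for the Freesurfer tools.

```python
import math


def cards_to_match(relative_dp_throughput: float) -> int:
    """Estimate how many slower cards match one reference card.

    relative_dp_throughput is the slower card's double precision
    throughput as a fraction of the reference card's (0.25 for the
    quarter-rate figure quoted above).

    Assumption: the work is purely double precision bound and scales
    perfectly across cards, so this is an upper bound on the penalty.
    """
    if not 0.0 < relative_dp_throughput <= 1.0:
        raise ValueError("relative throughput must be in (0, 1]")
    return math.ceil(1.0 / relative_dp_throughput)


if __name__ == "__main__":
    # Quarter-rate double precision, as described for the GTX 400 series.
    print(cards_to_match(0.25))
```

If the tools mostly use single precision, the ratio that matters is much closer to 1 and the consumer cards look far better.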
Cheers, Georg
-----Original Message-----
From: freesurfer-bounces@nmr.mgh.harvard.edu [mailto:freesurfer-bounces@nmr.mgh.harvard.edu] On Behalf Of Richard G. Edgar
Sent: Tuesday, 24 August 2010 15:38
To: freesurfer@nmr.mgh.harvard.edu
Subject: [Freesurfer] Notes on CUDA Acceleration
Greetings,
I've been asked to provide some extra information about GPU support in Freesurfer (being the one guilty of mri_em_register_cuda...).
Firstly, there are no immediate plans for OpenCL support. It would be very nice to have - with ATI, NVIDIA _and_ x86 multicore backends. However, it's far less mature than CUDA. The good news is that the really 'hard' bit is restructuring the algorithms to fit well on a GPU. The syntax of CUDA and OpenCL is very similar (strange that....), but OpenCL is more verbose.
As for cards: for what is in the current release, any GeForce GTX 200 series or Tesla 10 series card (i.e. C1060 and S1070) should work. I don't know the Quadro model numbers; the key feature is CUDA compute capability 1.3. I think that everything should actually work on somewhat older cards, but the compile flags will have to be tweaked. So long as that threshold is reached, the only issue is the amount of RAM needed. Currently, I expect that any card with at least 1 GiB of RAM will have plenty, and the threshold for mri_em_register_cuda will be much lower than that.
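The two criteria above (compute capability 1.3 or later, and a comfortable amount of RAM) can be written down as a small check. This is only a sketch: the function and constant names are made up for illustration, and the 1 GiB figure is the "plenty" level mentioned above, not a hard minimum. The compute capability and memory size would have to come from the card's specification or from a tool such as NVIDIA's deviceQuery sample.

```python
GIB = 1024 ** 3

# Thresholds taken from the text above; the names are illustrative only.
MIN_COMPUTE_CAPABILITY = (1, 3)  # CUDA compute capability 1.3
COMFORTABLE_MEMORY_BYTES = 1 * GIB  # "plenty", not a hard minimum


def card_is_suitable(major: int, minor: int, total_memory_bytes: int) -> bool:
    """Return True if a card meets both criteria for the current release.

    major/minor form the CUDA compute capability (e.g. 1 and 3 for 1.3).
    Tuple comparison handles cases such as 2.0 being newer than 1.3.
    """
    new_enough = (major, minor) >= MIN_COMPUTE_CAPABILITY
    enough_memory = total_memory_bytes >= COMFORTABLE_MEMORY_BYTES
    return new_enough and enough_memory


if __name__ == "__main__":
    # Illustrative values only; check them against the vendor's data sheet.
    candidates = {
        "compute 1.3, 4 GiB": (1, 3, 4 * GIB),
        "compute 2.0, 1.5 GiB": (2, 0, 3 * GIB // 2),
        "compute 1.0, 768 MiB": (1, 0, 768 * 1024 ** 2),
    }
    for label, (major, minor, memory) in candidates.items():
        print(label, "->", card_is_suitable(major, minor, memory))
```

A card that fails only the compute capability test may still work with tweaked compile flags, as noted above, so a False result here is a warning, not a verdict.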
Going forward, I would strongly recommend purchasing 'Fermi' class cards. These are the GTX 400 series, and Tesla 20 series. The new architecture lifts some hardware limits on GPU kernels which are crippling for portions of mri_ca_register. With a more accelerated mri_ca_register, RAM limits may also come into play, until I can come up with a suitably cunning GPU implementation of the Gaussian Classifier Array (right now, I'm going to burn around 2 GiB on a single GCA, to make implementation simple). However, I have bigger fish to fry first.
One final thing: Nick and I found last week that the accelerated mri_em_register_cuda doesn't seem to work prior to skull stripping. I'm going to work on this this week, but if you want to continue using the GPU accelerated binary, you'll have to turn off the FAST_TRANSLATION and FAST_TRANSFORM flags in mri_em_register.c, and recompile. This will increase the runtime to around 4 minutes on ernie, but will give results identical to the CPU code.
I hope this is helpful,
Richard