On Thu, 2010-08-26 at 23:23 +0200, Georg Homola wrote:
Allow me one additional remark that may be crucial for those considering investing in new cards. Although the Fermi-class cards share the same architecture (GeForce GTX 480 and Tesla C2050, for example), double-precision performance on the consumer products (GTX 400 series) has been limited to a quarter of that of the "full" Fermi architecture (Tesla C20xx). Error-correcting (ECC) memory is also disabled on consumer cards. I don't really know how important double precision is for the CUDA-enabled Freesurfer tools, but this could mean you have to buy four GTX cards to match the double-precision performance of one Tesla card.
This is correct. At the moment, I don't think I use double precision anywhere, which is why we're experimenting with CUDA compute capability 1.1 (double precision requires compute capability 1.3 or higher). I may have to start using double precision, given the problems that have just been found with mri_em_register_cuda. However, I'm not sure what impact the GeForce cards' reduced double-precision throughput will have. I'm reasonably certain that most of the code is bandwidth bound, so if anything a GeForce will outpace a Tesla, even in double precision, since the GeForce cards have higher memory bandwidth.
Of greater concern is the amount of memory available: the Tesla cards have considerably more RAM (3 GB on the C2050 and 6 GB on the C2070, against 1.5 GB on the GTX 480). This is likely to become important in the near future as I work to get the rest of the mri_ca_register pipeline onto the GPU. The GCA structure is quite sparse, but for the initial port I'll burn RAM instead of coming up with a cunning packing method; there will be enough to debug without worrying about optimisation.
Regards,
Richard