That was really helpful. All the details were very clear, and I was able to get it up and running except for some small hiccups, which I fixed with a small hack.
The issue I faced came after following all your instructions: when I got to the "make -j4" step, it threw an error saying that it could not find -lcudart. It looked like it was searching in "/usr/local/cuda/lib", a directory that doesn't exist on my machine. I fixed this by editing another line in the configure.in file. I changed CUDA_LIBS="-L$CUDA_DIR/lib $LIB_CUDA -lcudart" to CUDA_LIBS="-L$CUDA_DIR/lib64 $LIB_CUDA -lcudart". Was this change right?
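A more general version of the same hack, as an untested sketch, would pick whichever library directory actually exists instead of hard-coding lib64. The helper name pick_cuda_libdir and the /usr/local/cuda default are my own assumptions for illustration, not anything taken from the FreeSurfer build files:

```shell
# Sketch only: pick_cuda_libdir is a hypothetical helper, not part of configure.in.
# It prints <cuda_dir>/lib64 if that directory exists, otherwise <cuda_dir>/lib.
pick_cuda_libdir() {
    cuda_dir="$1"
    if [ -d "$cuda_dir/lib64" ]; then
        echo "$cuda_dir/lib64"
    else
        echo "$cuda_dir/lib"
    fi
}

# Assumed default install location; override CUDA_DIR in the environment if needed.
CUDA_DIR="${CUDA_DIR:-/usr/local/cuda}"

# Same shape as the CUDA_LIBS line in configure.in, with the directory chosen at run time.
# LIB_CUDA is left empty here if it has not been set elsewhere.
CUDA_LIBS="-L$(pick_cuda_libdir "$CUDA_DIR") ${LIB_CUDA:-} -lcudart"
echo "CUDA_LIBS=$CUDA_LIBS"
```

On a 64-bit install with only a lib64 directory this should give the same result as my manual edit, while still falling back to lib on systems laid out the other way.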
Also, I noticed that the speedup wasn't huge. The time for one recon-all run went down from 7 hrs 45 mins to 5 hrs 20 mins. My understanding is that this is because only certain modules have been CUDA-ised (if there is such a word!), such as mri_ca_register and mri_em_register. Are there any other modules that are currently parallelized besides these?
With regard to the code base, I believe I'm also using the read-only git repository of the main development trunk, as per the instructions on the FreeSurfer webpage.
Once again, thank you very much for the detailed information.
Best,
Tyson