On 24 March 2016 at 04:15, Francis Tyson Thomas francistthomas@email.arizona.edu wrote:
The issue I faced was that, after following all your instructions, when I came to the "make -j4" step it threw an error saying that it was not able to find -lcudart. It looked like it was searching in "/usr/local/cuda/lib", but that directory didn't exist. I fixed this by editing another line in the configure.in file. I changed CUDA_LIBS="-L$CUDA_DIR/lib $LIB_CUDA -lcudart" to CUDA_LIBS="-L$CUDA_DIR/lib64 $LIB_CUDA -lcudart". Was this change right?
If it compiled and ran successfully, I think it was right. I'm not an expert on autotools, and your machine might be laid out slightly differently to mine.
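A minimal sketch for checking which layout a given machine has, so that the -L path in CUDA_LIBS can be pointed at the right place. It is only an illustration and not part of the FreeSurfer build: the helper name find_cudart_libdir is hypothetical, and the sketch assumes the CUDA runtime library sits in either lib64 or lib under the CUDA install directory.

```python
import glob
import os


def find_cudart_libdir(cuda_dir, candidates=("lib64", "lib")):
    """Return the first subdirectory of cuda_dir that contains libcudart.

    Hypothetical helper: it only looks for files named libcudart* in the
    candidate subdirectories, and returns None if none of them match.
    """
    for sub in candidates:
        libdir = os.path.join(cuda_dir, sub)
        # glob returns an empty list when the directory or file is missing
        if glob.glob(os.path.join(libdir, "libcudart*")):
            return libdir
    return None


if __name__ == "__main__":
    # /usr/local/cuda is the path from the error message; CUDA_DIR overrides it
    cuda_dir = os.environ.get("CUDA_DIR", "/usr/local/cuda")
    print(find_cudart_libdir(cuda_dir))
```

If this prints a path ending in lib64, the edit to configure.in described above matches the machine's layout; if it prints None, the library lives somewhere else entirely.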
Also, I noticed that the speedup wasn't extremely large. The time for one recon-all run went down from 7 hrs 45 mins to 5 hrs 20 mins. My understanding is that this is because only certain modules have been cuda-ised (if there is a word like that!), such as mri_ca_register and mri_em_register. Are there any other modules that are currently parallelized besides these?
em_register and ca_register were the only two I ported. I think that a couple of the binaries on the surface side of things were accelerated too, but I didn't do those. You are, indeed, encountering a practical example of Amdahl's Law.
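To put numbers on the Amdahl's Law remark, here is a small sketch using the two recon-all timings quoted above. The function name amdahl_speedup and the example values of p and s at the end are illustrative assumptions, not measurements; only the 7 hrs 45 mins and 5 hrs 20 mins figures come from this thread.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is accelerated s times."""
    return 1.0 / ((1.0 - p) + p / s)


# Timings reported in the thread, converted to minutes
before_min = 7 * 60 + 45   # 465 minutes without CUDA
after_min = 5 * 60 + 20    # 320 minutes with CUDA

observed_speedup = before_min / after_min

# The minutes saved can only have come out of the accelerated stages, so this
# is a lower bound on the share of the original runtime spent in those stages.
min_accelerated_fraction = (before_min - after_min) / before_min

print(f"observed overall speedup: {observed_speedup:.2f}x")
print(f"accelerated stages were at least {min_accelerated_fraction:.0%} of the original run")

# Assumed values for illustration only: if 35% of the run were accelerated 10x
print(f"hypothetical p=0.35, s=10: {amdahl_speedup(0.35, 10):.2f}x")
```

The point of the sketch is that the overall speedup is capped by the part of the pipeline that was not ported: however fast the two registration binaries become, the remaining stages still take the time they always did.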
Currently, I'm still poking around mri_ca_register, since I think that there's at least another minute that can be shaved off the runtime. However, things are getting a little gnarly, since it appears that I had the wrong mental model of how the data structures fit together.
Feel free to ask any more questions,
Richard