OpenMP and GPU implementation in freesurfer 6 beta


Francis Tyson Thomas

3 Mar 2016

1:22 p.m.

Hi,

I have been working on speeding up the hippocampal segmentation in freesurfer 6 beta, and I have been experimenting with both the -openmp and -use-gpu flags. I noticed that freesurfer 6 seems to make use of all the threads whether or not I pass the -openmp flag. Is that the expected behaviour? I found this by analysing the CPU usage in both cases.

Also, it looks like development for GPU usage has been halted for now, so I was trying to get it working with CUDA 5 under Ubuntu 14.04. I have not been successful, as the CUDA device isn't getting selected. Do you have any recommendations on that?

And lastly, there is the itkthreads option available to -hippocampal-subfields-T1. Can I combine two or all three of these options to minimize the execution time, for example GPU and itkthreads, or GPU, OpenMP and itkthreads?

Also, the v6 beta is no longer available on your website. Is there any particular reason why it was removed, and can I continue using the copy that I have for hippocampal segmentation?

Thanks, Tyson


R Edgar

3 Mar

4:23 p.m.

On 3 March 2016 at 13:22, Francis Tyson Thomas francistthomas@email.arizona.edu wrote:

> Also, it looks like development for GPU usage has been halted for now, so I was trying to get it working with CUDA 5 under Ubuntu 14.04. I have not been successful, as the CUDA device isn't getting selected. Do you have any recommendations on that?

I have recently started looking at the CUDA port again, although I'm making no promises as to the amount of time I'll have to spend on it. This said I've got mri_ca_register running with CUDA 7.5 on my machine. On my E3 Xeon with a K1200, I can run mri_ca_register with the test dataset in about 9 minutes.

Are you compiling from source? I had to tamper a bit with the configure script before compiling CUDA was enabled again. I don't have access to the right machine at the moment, but as I recall, there was a line with_cuda="" which I had to comment out, since it was overriding the path to the machine's CUDA installation which I was passing on the command line. There are a few other minor bug fixes and performance improvements for GPU code which I've submitted for Zeke; no new kernels yet, I'm afraid.

If you could give me some more details about what you're doing, I may be able to help.

Just the standard warning: the GPU results will be different from the CPU results on the same inputs. We've been kicking around some ideas recently to quantify how different (and to devise input datasets where the correct answers are unambiguous).

Regards,

Richard

Francis Tyson Thomas

7 Mar

11:25 a.m.

Hi Richard,

That information was very helpful. At this point I'm trying to reduce the recon-all processing time as much as possible, and for this reason I was looking to get the -use-gpu flag working. I'm currently running a freesurfer v6 beta version on Ubuntu 14.04.4. The graphics card is a dual K2200 configuration (I guess they are running in an SLI configuration, although I'm not completely sure).

When you mentioned you compiled everything, I believe you were referring to compiling CUDA 7.5 for Ubuntu 14.04. After seeing the link https://developer.nvidia.com/cuda-gpus we settled on CUDA 5, since it was the compatible version mentioned for the K2200. Does that mean CUDA 7.5 is backwards compatible with slight tinkering and can be used with freesurfer 6.0?

However, I tried to set up CUDA 5 following the instructions at http://www.unixmen.com/how-to-install-cuda-5-0-toolkit-in-ubuntu/ and I'm not able to get it running. I keep getting the following error: "Unable to acquire CUDA device". Does this sound familiar?

If you can share some more information on setting this up, that would be great, since recon-all takes far too long for running multiple datasets. Most importantly, we are concerned with the hippocampal segmentation in freesurfer 6 rather than recon-all, so speeding this up would be extremely helpful.

Thank you, Tyson

_______________________________________________
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer

The information in this e-mail is intended only for the person to whom it is addressed. If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail.

R Edgar

8:38 p.m.

On 7 March 2016 at 11:25, Francis Tyson Thomas francistthomas@email.arizona.edu wrote:

> That information was very helpful. At this point I'm trying to reduce the recon-all processing time as much as possible, and for this reason I was looking to get the -use-gpu flag working. I'm currently running a freesurfer v6 beta version on Ubuntu 14.04.4. The graphics card is a dual K2200 configuration (I guess they are running in an SLI configuration, although I'm not completely sure).
>
> When you mentioned you compiled everything, I believe you were referring to compiling CUDA 7.5 for Ubuntu 14.04. After seeing the link https://developer.nvidia.com/cuda-gpus we settled on CUDA 5, since it was the compatible version mentioned for the K2200. Does that mean CUDA 7.5 is backwards compatible with slight tinkering and can be used with freesurfer 6.0?

I think that you might have misunderstood the NVIDIA page.

It lists the K2200 as a Compute Capability 5 GPU (just like my K1200). The Compute Capability refers to the hardware:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capa...
Compute capability 5 is also called "Maxwell," while the work I did on Freesurfer was around the time of Compute Capability 2, known as Fermi. In CPU terms, GPU compute capabilities are a bit like Ivy Bridge vs Sandy Bridge (although I think that GPU features vary more than the CPU ones).

In order to program the GPU, you need the NVIDIA CUDA Toolkit, which contains the required compiler (nvcc). The current version of this is 7.5. I installed it by following the instructions on NVIDIA's website:
http://docs.nvidia.com/cuda/index.html
This was quite straightforward (certainly more so than it was five or six years ago, when you could never be sure that your X11.conf would survive).

Either with the toolkit, or as a separate install, you can get a lot of examples from NVIDIA. I'd suggest grabbing those, and making sure that you can compile them. The "DeviceQuery" one will probe your PCIe bus, and report what GPUs it finds.
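Before building the samples, a quick first check is whether the toolkit and driver utilities are visible at all. The following is a minimal shell sketch, assuming a Linux machine; the helper name report_cuda_tools is made up for illustration, while nvcc ships with the CUDA Toolkit and nvidia-smi with the NVIDIA driver. It only reports what is on the PATH and is not a substitute for compiling and running the device query sample.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool is on the PATH.
# Prints one line per tool and returns non-zero if any is missing.
report_cuda_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: found at $path"
        else
            echo "$tool: NOT found on PATH"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (uncomment on the target machine):
# report_cuda_tools nvcc nvidia-smi && nvcc --version && nvidia-smi -L
```

If nvcc is reported as missing even though the toolkit is installed, the toolkit's bin directory is probably not on the PATH yet.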

> However, I tried to set up CUDA 5 following the instructions at http://www.unixmen.com/how-to-install-cuda-5-0-toolkit-in-ubuntu/ and I'm not able to get it running. I keep getting the following error: "Unable to acquire CUDA device". Does this sound familiar?

I suspect (although I wasn't following things at the time) that the Toolkit v5 was before Maxwell cards were released. If so, then it wouldn't know what to do with the GPUs.

For the record, since I was writing the CUDA bits so long ago (for the volume side of things - I didn't do the surface accelerations), they only use Fermi features. For this reason, you'll want to make sure that you have --enable-fermi-gpu when you run configure (and make sure that it's picking out your CUDA installation - I had to tweak the configure script for this).

> If you can share some more information on setting this up, that would be great, since recon-all takes far too long for running multiple datasets. Most importantly, we are concerned with the hippocampal segmentation in freesurfer 6 rather than recon-all, so speeding this up would be extremely helpful.

I don't know if those portions benefit from CUDA acceleration at this time. I focused on mri_em_register and mri_ca_register. Even if other programs (which ones are they?) can be linked against some of the accelerated routines, there is no guarantee of speed up - the time to shuffle data to and from the GPU is typically greater than the speedup of any one routine.
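That trade-off can be written as a simple inequality. The symbols are introduced here purely for illustration: t_cpu and t_gpu are the times one routine takes on the CPU and on the GPU, and t_xfer is the time needed to copy its inputs to the GPU and its results back.

```latex
% Offloading a single routine only pays off when the compute time saved
% exceeds the cost of moving its data to and from the GPU
t_{\text{cpu}} - t_{\text{gpu}} \;>\; t_{\text{xfer}}
```

When the transfer term dominates, linking one more program against an accelerated routine can leave the overall runtime unchanged or even make it worse.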

Hope this helps,

Richard

Francis Tyson Thomas

21 Mar

7:59 p.m.

Hi Richard,

The first part was smooth. I got it done pretty quickly (installing CUDA and setting it up!). However, for the freesurfer portion I have two questions:

1. Bruce mentioned in another email thread that v6 beta should be out in a week or so, after testing is completed. Should I wait for that version, assuming I get the source code for v6 when I download it as per the instructions on the freesurfer page?

2. Also, would I have to compile and build all modules (mri_em_register, mri_ca_register, etc.), or can I limit it to just these two when I build again?

I'm not very well versed in the second part, so it would be good if you could explain it a little more.

Thanks, Tyson


R Edgar

9:47 p.m.

On 21 March 2016 at 19:59, Francis Tyson Thomas francistthomas@email.arizona.edu wrote:

> The first part was smooth. I got it done pretty quickly (installing CUDA and setting it up!). However, for the freesurfer portion I have two questions:
>
> 1. Bruce mentioned in another email thread that v6 beta should be out in a week or so, after testing is completed. Should I wait for that version, assuming I get the source code for v6 when I download it as per the instructions on the freesurfer page?

Which version of the source code do you have access to? I'm using the read-only git repository of the main development trunk, sending patches back to Zeke, which he puts into CVS. I'm not sure which of those are making it over to v6.

> 2. Also, would I have to compile and build all modules (mri_em_register, mri_ca_register, etc.), or can I limit it to just these two when I build again?

I imagine you'll have to run configure again. When you do, the key options you want are:

    --enable-fermi-gpu --with-cuda="/usr/local/cuda"

Note that on my machine:

    [rge21@cudastation ~]$ which nvcc
    /usr/local/cuda-7.5/bin/nvcc
    [rge21@cudastation ~]$ ls -alF /usr/local/cuda
    lrwxrwxrwx. 1 root root 8 Jan 9 16:30 /usr/local/cuda -> cuda-7.5/

So if your CUDA installation went somewhere else, you'll have to adjust that --with-cuda path.

Then when you compile, make sure the nvcc being used really is the one you've set (things will likely fail horribly if this isn't the case). If it isn't, you'll have to look in the configure file for:

    #############################################################
    # Nvidia CUDA enabling
    ############################################################
    CUDA_DIR=""
    with_cuda=""

and then comment out

    # with_cuda=""

or it will override the directory you set on the command line.
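The edit described above can also be scripted. The sketch below is only an illustration: the helper name comment_out_with_cuda is made up, and it assumes the assignment appears in the configure script exactly as with_cuda="" at the start of a line, as quoted in this message. A .bak copy is kept so the change can be undone.

```shell
#!/bin/sh
# Hypothetical helper: comment out a bare `with_cuda=""` assignment in the
# given configure script so it no longer overrides --with-cuda.
# Assumes the line starts at column 0, as quoted above; a backup is kept
# next to the file with a .bak suffix.
comment_out_with_cuda() {
    sed -i.bak 's/^with_cuda=""[[:space:]]*$/# with_cuda=""/' "$1"
}

# Example usage from the top of the source tree (paths are assumptions):
# comment_out_with_cuda ./configure
# ./configure --enable-fermi-gpu --with-cuda="/usr/local/cuda"
# which nvcc   # confirm this is the toolkit you expect before running make
```

If the line in your copy of configure is indented or written differently, the pattern will not match and the file is left unchanged, so it is worth checking the result with grep afterwards.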

Let me know if you need further help,

Richard

Francis Tyson Thomas

24 Mar

4:15 a.m.

Hi Richard,

That was really helpful. All the details were very clear and I was able to get it up and running except for some small hiccups which I fixed with a small hack.

The issue I faced was that, after following all your instructions, when I came to the "make -j4" step it threw an error saying that it was not able to find -lcudart. It looked like it was searching in "/usr/local/cuda/lib", a directory that doesn't exist. I fixed this by editing another line in the configure.in file. I changed CUDA_LIBS="-L$CUDA_DIR/lib $LIB_CUDA -lcudart" to CUDA_LIBS="-L$CUDA_DIR/lib64 $LIB_CUDA -lcudart". Was this change right?

Also, I noticed that the speedup wasn't huge. The time for one recon-all run went down from 7 hrs 45 mins to 5 hrs 20 mins. My understanding is that this is because only certain modules are cuda-ised (if there is a word like that!), like mri_ca_register and mri_em_register. Are there any other modules that are currently parallelized besides these?

With regard to the code base I guess I'm also using the read-only git repository of the main development trunk as per the instructions on the freesurfer webpage.

Once again thank you very much for that detailed information.

Best, Tyson


R Edgar

6:20 a.m.

On 24 March 2016 at 04:15, Francis Tyson Thomas francistthomas@email.arizona.edu wrote:

> The issue I faced was that, after following all your instructions, when I came to the "make -j4" step it threw an error saying that it was not able to find -lcudart. It looked like it was searching in "/usr/local/cuda/lib", a directory that doesn't exist. I fixed this by editing another line in the configure.in file. I changed CUDA_LIBS="-L$CUDA_DIR/lib $LIB_CUDA -lcudart" to CUDA_LIBS="-L$CUDA_DIR/lib64 $LIB_CUDA -lcudart". Was this change right?

If it compiled and ran successfully, I think it was right. I'm not an expert on autotools, and your machine might be laid out slightly differently to mine.
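One way to sanity-check a change like this is to look at which library directory the CUDA installation actually contains. The helper below is a hypothetical sketch (the name cuda_lib_dir is not part of Freesurfer or CUDA); it assumes the runtime library lives in either lib64 or lib under the CUDA directory and prefers lib64 when both exist.

```shell
#!/bin/sh
# Hypothetical helper: print the library directory under a CUDA install,
# preferring lib64 (typical on 64-bit Linux) and falling back to lib.
# Returns non-zero if neither directory exists.
cuda_lib_dir() {
    cuda_dir=$1
    for sub in lib64 lib; do
        if [ -d "$cuda_dir/$sub" ]; then
            echo "$cuda_dir/$sub"
            return 0
        fi
    done
    echo "no lib or lib64 directory under $cuda_dir" >&2
    return 1
}

# Example usage (the path is an assumption; adjust to your installation):
# ls "$(cuda_lib_dir /usr/local/cuda)" | grep cudart
```

If the directory it prints contains the cudart library, then pointing -L at that directory is consistent with what the linker needs for -lcudart.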

> Also, I noticed that the speedup wasn't huge. The time for one recon-all run went down from 7 hrs 45 mins to 5 hrs 20 mins. My understanding is that this is because only certain modules are cuda-ised (if there is a word like that!), like mri_ca_register and mri_em_register. Are there any other modules that are currently parallelized besides these?

em_register and ca_register were the only two I ported. I think that a couple of the binaries on the surface side of things were accelerated too, but I didn't do those. You are, indeed, encountering a practical example of Amdahl's Law.
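For reference, Amdahl's Law can be worked through with the timings quoted above. This is a back-of-the-envelope sketch: p (the fraction of the original runtime spent in the accelerated programs) and s (the speedup of those programs alone) are symbols introduced here, not measured values, and it assumes the whole difference between the two runs is due to the GPU.

```latex
% Amdahl's Law: overall speedup S when a fraction p of the runtime
% is sped up by a factor s
S = \frac{1}{(1 - p) + \dfrac{p}{s}} \;<\; \frac{1}{1 - p}

% Observed: 7 h 45 min = 465 min before, 5 h 20 min = 320 min after
S_{\text{obs}} = \frac{465}{320} \approx 1.45

% Rearranging the bound gives a lower limit on the accelerated fraction
p \;>\; 1 - \frac{1}{S_{\text{obs}}} = 1 - \frac{320}{465} = \frac{145}{465} \approx 0.31
```

Under that assumption, at least about 31% of the original run was spent in the accelerated programs, and however fast the GPU gets, the total cannot drop below the time taken by the remaining CPU-only steps.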

Currently, I'm still poking around mri_ca_register, since I think that there's at least another minute which can be shaved off the runtime. However, things are getting a little gnarly, since it appears that I had the wrong mental model of how the data structures fit together.

Feel free to ask any more questions,

Richard
