GPU/supercomputer adaptation of freesurfer?

John Absher

7 Dec 2018

1:40 a.m.

External Email - Use Caution

Hi,

I'm planning a freesurfer analysis of a large MRI dataset, and want to use the "380 GPU nodes" (and other cores/nodes) on the Palmetto Cluster (https://www.palmetto.clemson.edu/palmetto/userguide_palmetto_overview.html) to speed up the process. Since I am not a programmer, I'm hoping someone can give me a quick tutorial:

a) Is this going to speed up recon-all and the data analysis?

b) How much programming/expertise is required to enable freesurfer to take advantage of a supercomputer's resources?

c) Has anyone done this already?

d) The Palmetto Cluster is more or less limited to command-line. As long as I visualize the data on another system, I assume this will not be a problem, right?

Thanks,

John R. Absher, MD

jabsher@ghs.org
GHS Neuroscience Associates
University of South Carolina School of Medicine Greenville
864-350-6655 (mobile)

Morgan Hough

7 Dec 2018

7:02 a.m.

Hi John,

I know others (Satra?) might have more recent experience. I used to be able to get recon-all down to about 4 hours. OpenMP made the most difference, but the GPU code always helped if you could get it running. It really depends on whether you can get the admin to let you use the version of CUDA that FreeSurfer uses, etc. I was also always running TRACULA, and for that the FSL GPU code is (IMO) essential. There are flags you add to recon-all to enable these. If you are having trouble with CUDA, OpenMP definitely speeds things up, so try to stay on the nodes with the most cores.
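As a rough illustration of the flags mentioned above, the sketch below assembles a recon-all command line in Python without running it. `-openmp <n>` is the recon-all option that sets the thread count; `-use-gpu` is the switch that older CUDA-enabled releases accepted, and whether a given install still honors it is an assumption to check against `recon-all -help`. The function name, subject ID, and input path are hypothetical.

```python
from typing import List


def build_recon_all_cmd(subject_id: str, input_volume: str,
                        threads: int = 1, use_gpu: bool = False) -> List[str]:
    """Return an argv list for one recon-all run (nothing is executed here)."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    cmd = ["recon-all", "-s", subject_id, "-i", input_volume, "-all"]
    if threads > 1:
        # OpenMP threading; request the same number of cores from the scheduler.
        cmd += ["-openmp", str(threads)]
    if use_gpu:
        # Assumption: only valid on builds that ship the CUDA binaries.
        cmd.append("-use-gpu")
    return cmd


# Hypothetical subject ID and path, shown only to illustrate the shape.
print(" ".join(build_recon_all_cmd("sub-001", "/data/sub-001/T1.nii.gz", threads=8)))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when subject paths contain spaces.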

It’s not really the supercomputing that you need. Check what the batch/queue system is there. Hopefully it’s SGE (or whatever it’s called now) or something that can mimic SGE. I always have to modify the fsl_sub included in FreeSurfer to make sure this works properly, but if the cluster uses something different, don’t worry and just disable it. Again, this problem normally comes up in TRACULA more than in FS recon-all, but you want to write a small shell script that starts all your subjects and then waits for them to finish. Sometimes the batch system is also where you select the nodes and number of cores.
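The "start all your subjects and then wait" pattern can be sketched in Python as well as in shell. The version below is a minimal sketch that assumes everything runs inside a single node allocation; on a real batch system one would more often submit one job per subject through the scheduler. `run_all` and `max_parallel` are hypothetical names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_all(commands: Sequence[List[str]], max_parallel: int = 4) -> List[int]:
    """Run every command, at most max_parallel at a time; return exit codes in order."""

    def run_one(cmd: List[str]) -> int:
        # check=False: one failed subject should not abort the remaining ones.
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # list() blocks until every run has finished; order matches `commands`.
        codes = list(pool.map(run_one, commands))

    # Report failures so they can be rerun later.
    for cmd, code in zip(commands, codes):
        if code != 0:
            print(f"FAILED ({code}): {' '.join(cmd)}", file=sys.stderr)
    return codes
```

Each entry in `commands` would be one argv list, for example `["recon-all", "-s", "sub-001", "-all"]`. Threads are enough here because each worker only waits on an external process.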

Again, super common :) Visualization is done on a local workstation unless you have an interactive node, and how that works is usually very site-specific. If you want to see how GPU-backed remote visualization is sometimes done, an NVIDIA Docker node on AWS running something like ParaView can be a useful example.

Hope that helps.

Cheers,

-Morgan

Freesurfer mailing list Freesurfer@nmr.mgh.harvard.edu https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer

Mike Schmidt

9:15 a.m.

The scripts I used to manage upload, execution, and download of thousands of FreeSurfer runs on a cluster are available at https://github.com/mfschmidt/freesurfer-management. Seeing what I did might save you some time over scripting this from scratch. Feel free to copy, paste, or take whatever you like, in part or in whole. I ran everything single-threaded since I had far more subjects than available cores, and I did not have GPUs at my disposal. You would want to adapt the recon-all command for that once you have worked out the CUDA implementation.
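The single-threaded choice follows from simple arithmetic: when subjects outnumber cores, one thread per run keeps every core busy, and threading only pays off when cores would otherwise sit idle. A minimal sketch of that reasoning is below; `plan_jobs` and the default cap of 8 threads per run are hypothetical and not taken from the linked repository.

```python
from typing import Tuple


def plan_jobs(n_subjects: int, total_cores: int,
              max_threads_per_job: int = 8) -> Tuple[int, int]:
    """Return (concurrent_jobs, threads_per_job) for a fixed pool of cores."""
    if n_subjects < 1 or total_cores < 1 or max_threads_per_job < 1:
        raise ValueError("all arguments must be >= 1")
    if n_subjects >= total_cores:
        # More subjects than cores: one single-threaded run per core.
        return total_cores, 1
    # Fewer subjects than cores: split the cores evenly, up to the assumed cap.
    threads = min(max_threads_per_job, total_cores // n_subjects)
    return n_subjects, threads


print(plan_jobs(1000, 64))  # (64, 1)
print(plan_jobs(4, 32))     # (4, 8)
```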

Mike

R Edgar

12 Dec 2018

9:54 p.m.

Sorry I'm coming to this a bit late. I'm not sure that the CUDA implementation is supported these days. I've not done much work on mri_em_register and mri_ca_register since 2012, and I'm not sure about the surface stream (I didn't do that port). Furthermore, I don't see CUDA being included in the new CMake build system. Some differences from the CPU stream are inevitable (floating-point arithmetic and all that), but if the workflow in those binaries has changed since I was working on them, those changes may not be reflected in the GPU code.

Regards,

Richard

Morgan Hough

14 Dec 2018

5:28 a.m.

Hi Richard,

That is a very good point. I knew it was being deprecated and there was talk of OpenCL, but I have not been keeping up. Multicore systems were getting the most benefit from newer releases. Certainly, running the existing CUDA code meant making sure your sysadmin would give you the older CUDA libs too.

Thank you for your input. I know it was a long time ago but I remember your GTC presentation for sure.

Cheers,

-Morgan
