FreeSurfer experts,
I need to make the processing of each brain 10-20 times faster, and parallelization seems to be one way to do it.
I'm currently trying to add OpenMP parallelization to the time-consuming parts of the source code, especially mri_ca_register and mri_em_register.
The work is not finished, but so far the speed-up is not proportional to the number of CPU cores: it is only about 2.5x with either 8 or 16 cores.
I'm afraid there may be fundamental limitations in the algorithm and/or the implementation of the code. Should I proceed with this work?
Any advice, help, or comments would be appreciated.
Akio
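
One way to reason about the 2.5x figure is Amdahl's law, which relates speed-up to the fraction of runtime that actually runs in parallel. The sketch below is only a model under that assumption (it ignores memory bandwidth and load imbalance, which may matter just as much here), and the function names are made up for illustration:

```c
#include <assert.h>

/* Amdahl's law: predicted speed-up on n cores when a fraction p
   (0..1) of the single-core runtime can be parallelized. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Inverse of the above: estimate p from a measured speed-up s on n cores. */
double parallel_fraction(double s, int n)
{
    assert(n > 1 && s >= 1.0 && s <= (double)n);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
}

/* Upper bound on the speed-up with an unlimited number of cores. */
double max_speedup(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

Under this model, 2.5x on 8 cores corresponds to roughly 69% of the runtime being parallel, which predicts about 2.8x on 16 cores and a ceiling near 3.2x. Reaching 10x on 16 cores would need about 96% of the runtime parallelized, so the remaining serial sections would have to be addressed first.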
Hi Akio
we have made some progress on this, but it is different for different algorithms. If you get the current dev codebase you'll find some examples of OpenMP pragmas.
cheers
Bruce

On Mon, 25 Jun 2012, Akio Yamamoto wrote:
> [...] Should I proceed with this work?
Freesurfer mailing list Freesurfer@nmr.mgh.harvard.edu https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer
Akio,
in particular, grep for HAVE_OPENMP in the C files in dev/utils, e.g. gcamorph.c, and also in dev/mri_ca_register/mri_ca_register.c. we haven't done anything with em reg, but we welcome any improvements you can make, following how ca_reg was done.
this is the pattern of speed improvements we've seen with OpenMP:
https://surfer.nmr.mgh.harvard.edu/fswiki/CaRegTimings
note that a Nehalem, Sandy Bridge, or newer architecture is essential for this improvement, as it accesses scattered memory structures much more efficiently.
nick
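
For readers without the dev tree at hand, a guarded OpenMP loop of the kind a grep for HAVE_OPENMP would turn up might look roughly like the sketch below. This is not code from gcamorph.c or mri_ca_register.c; the function name and the flat-array layout are assumptions made for illustration, and only the HAVE_OPENMP guard comes from the thread:

```c
#include <assert.h>

/* Hypothetical example: sum of squared differences between two volumes
   stored as flat arrays of n voxels. */
double sum_sq_diff(const float *a, const float *b, long n)
{
    double total = 0.0;
    long i;

    assert(a != 0 && b != 0 && n >= 0);

    /* The pragma is compiled only when HAVE_OPENMP is defined, so the
       same source still builds as a serial loop without OpenMP.
       reduction(+:total) gives each thread a private partial sum and
       combines them at the end, avoiding a race on total. */
#ifdef HAVE_OPENMP
#pragma omp parallel for reduction(+:total) schedule(static)
#endif
    for (i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i];
        total += d * d;
    }
    return total;
}
```

With GCC this would be built with something like `gcc -fopenmp -DHAVE_OPENMP`; how the FreeSurfer build system itself defines the macro is not stated in this thread.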