Re: [Freesurfer] Parallel vs. Sequential part

5 Jul 2012


On Tue, Jul 3, 2012 at 5:56 PM, Akio Yamamoto
yamamoto@tkl.iis.u-tokyo.ac.jp wrote:
> ...
> Yes, as Richard pointed out, I just wanted to know the numbers for input
> to Amdahl's law, if you have already something, to figure out the maximum
> expected speedup using multiple processors/cores.
> As for improvements of em_reg, I'll try to split each transform as well as
> parallelize the energy evaluation.

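
For reference, the Amdahl's law estimate mentioned above can be sketched as a small C helper. The function names and the example fraction are illustrative assumptions only; no measured numbers for em_reg are implied.

```c
/* Amdahl's law: p is the fraction of runtime that can be parallelised
   (0 <= p <= 1), n is the number of processors/cores (n >= 1). */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on the speedup as n grows without limit.
   Only meaningful for p < 1 (otherwise this divides by zero). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

So if, hypothetically, 90% of the runtime were parallelisable, amdahl_speedup(0.9, 4) would give roughly 3.1x, and no number of cores could exceed 10x.
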
Parallelising the energy evaluation is the lowest-hanging fruit, and
is what happens in the 'slow' GPU version. But for highest
performance, I would convert the nested transform loops into a single
one, and farm those out between OpenMP threads (I wouldn't bother
trying nested parallelism of the energy evaluation, although you might
want to do some SSE tinkering). That is effectively what happens in
the 'fast' GPU version - you can use the same basic structure. But be
aware that the slightly different transforms which result can cause
you to converge to a different solution.
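
A minimal sketch of the "single loop farmed out between OpenMP threads" idea, under stated assumptions: toy_energy() is a hypothetical stand-in for the real energy evaluation, and the three grid dimensions stand in for the nested transform loops. None of this is taken from the em_reg source.

```c
/* Hypothetical stand-in for the per-transform energy evaluation;
   NOT the actual FreeSurfer function. Its minimum is at (2, 1, 3). */
static double toy_energy(int i, int j, int k)
{
    double di = i - 2, dj = j - 1, dk = k - 3;
    return di * di + dj * dj + dk * dk;
}

/* Search an ni x nj x nk grid of candidate transforms using one
   flattened loop instead of three nested ones. Returns the flat index
   of the lowest-energy candidate, or -1 if the grid is empty. */
long best_transform(int ni, int nj, int nk)
{
    const long total = (long)ni * nj * nk;
    long best_idx = -1;
    double best_e = 0.0;

    #pragma omp parallel
    {
        long my_idx = -1;      /* best candidate seen by this thread */
        double my_e = 0.0;

        #pragma omp for schedule(static) nowait
        for (long idx = 0; idx < total; ++idx) {
            /* Recover the three loop indices from the flat index. */
            int k = (int)(idx % nk);
            int j = (int)((idx / nk) % nj);
            int i = (int)(idx / ((long)nk * nj));
            double e = toy_energy(i, j, k);
            if (my_idx < 0 || e < my_e) { my_e = e; my_idx = idx; }
        }

        /* Merge per-thread results; ties go to the lowest flat index so
           the answer does not depend on how threads were scheduled. */
        #pragma omp critical
        {
            if (my_idx >= 0 &&
                (best_idx < 0 || my_e < best_e ||
                 (my_e == best_e && my_idx < best_idx))) {
                best_e = my_e;
                best_idx = my_idx;
            }
        }
    }
    return best_idx;
}
```

Built with OpenMP enabled (e.g. -fopenmp for GCC), the flat loop is divided between threads; without it the pragmas are ignored and the same code runs serially.
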
HTH,
Richard
