Hello FreeSurfer team!
Having gone through the instructions for using the parallelization
in FreeSurfer v6, I'm confused about how exactly the calls for fine-grained
and coarse-grained parallelization differ. The release notes state the
following:
*Parallelization: a new flag was introduced which enables two forms of
compute parallelization that significantly reduce the runtime. As a point
of reference, using a new-ish workstation (2015+), the recon-all -all
runtime is just under 3 hours. When the -parallel flag is specified at the
end of the recon-all command-line, it will enable 'fine-grained'
parallelized code, making use of OpenMP, embedded in many of the binaries,
namely affecting mri_em_register and mri_ca_register. By default, it
instructs the binaries to use 4 processors (cores), meaning, 4 threads will
run in parallel in some operations (manifested in 'top' by mri_ca_register,
for example, showing 400% CPU utilization). This can be overridden by
including the flag -openmp <num> after -parallel, where <num> is the number
of processors you'd like to use (e.g. 8 if you have an 8-core machine). Note
that this parallelization was introduced in v5.3, but many new routines
were OpenMP-parallelized in v6. The other form of parallelization, a
'coarse' form, enabled when the -parallel flag is specified, is such that
during the stages where left and right hemispheric data is processed, each
hemi binary is run separately (and in parallel, manifesting itself in 'top'
as two instances of mris_sphere, for example). Note that a couple of the
hemi stages (e.g. mris_sphere) make use of a tiny amount of OpenMP code,
which means that for brief periods, as many as 8 cores are utilized (2
binaries running code that each make use of 4 threads). In general, though,
a 4-core machine can easily handle those periods. Be aware that if you
enable this -parallel flag on instances of recon-all running through a job
scheduler (like a cluster), it may not make your System Administrator happy
if you do not pre-allocate a sufficient number of cores for your job, as
you will be taking cycles from other cores that may be running jobs
belonging to other cluster users.*
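To make the thread arithmetic in that passage concrete, here is a small sketch that models only what the notes state: -parallel goes at the end of the command line, -openmp <num> follows it and overrides the default of 4 threads per binary, and the left and right hemisphere binaries run side by side. The function names are hypothetical helpers, not FreeSurfer code, and the peak figure for values other than the default assumes the -openmp setting applies to each hemisphere binary.

```python
# Sketch of the core-count arithmetic described in the release notes.
# Assumptions: the default of 4 OpenMP threads and the 2 concurrent
# hemisphere binaries come from the notes; all function names are hypothetical.

DEFAULT_OPENMP_THREADS = 4  # threads per binary when only -parallel is given
HEMISPHERES = 2             # lh and rh binaries run concurrently with -parallel


def recon_all_args(subject, dicom, openmp=None):
    """Build a recon-all argument list; -openmp must come after -parallel."""
    args = ["recon-all", "-s", subject, "-i", dicom, "-all", "-parallel"]
    if openmp is not None:
        args += ["-openmp", str(openmp)]
    return args


def threads_per_binary(openmp=None):
    """Fine-grained form: OpenMP threads one binary may use."""
    return DEFAULT_OPENMP_THREADS if openmp is None else openmp


def peak_cores_during_hemi_stages(openmp=None):
    """Coarse-grained form: both hemi binaries at once, each with its threads.

    Assumes the -openmp value also applies to each hemisphere binary.
    """
    return HEMISPHERES * threads_per_binary(openmp)


if __name__ == "__main__":
    for n in (None, 8):
        print(" ".join(recon_all_args("vol0", "junk/orig/001/IM-0001-0001.dcm", n)))
        print("  threads per binary:", threads_per_binary(n))
        print("  brief peak during hemi stages:", peak_cores_during_hemi_stages(n))
```

With the defaults this reproduces the notes' figure of up to 8 cores (2 binaries x 4 threads) during the brief OpenMP portions of the hemisphere stages.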
Does this mean that a command executed as "recon-all -s vol0 -i
junk/orig/001/IM-0001-0001.dcm -all -parallel" would use coarse-grained
parallelization, and that a command executed as "recon-all -s vol0 -i
junk/orig/001/IM-0001-0001.dcm -all -parallel -openmp 8" would use
fine-grained parallelization?
Thanks,
Tyson