Dear Nárlon,

Thanks for the interest in our work!

You have several options:

1. If it’s only a couple of subjects, you could bite the bullet and run them on the CPU. It’ll be slow, but hey, it's easy to run (just add the --cpu flag).
2. Change the resolution at which the segmentations are generated by editing the following line of $FREESURFER_HOME/python/packages/ERC_bayesian_segmentation/scripts/segment.py:
   https://github.com/freesurfer/freesurfer/blob/551dd7e3954bc030f6d116d9d671e3596137385b/mri_histo_util/ERC_bayesian_segmentation/scripts/segment.py#L107
   and replacing 0.3333333333333333 with, e.g., 0.4. If you do this, you should rerun all your subjects so that they are all processed with the same settings.
3. We are also releasing a fast version of the tool (see Section 4 of https://surfer.nmr.mgh.harvard.edu/fswiki/HistoAtlasSegmentation) that runs relatively quickly on the CPU. I submitted the PR yesterday, and it should be in the development version of FreeSurfer in a couple of days.
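
The resolution edit described above can also be scripted instead of done by hand. The following is a minimal Python sketch, not part of FreeSurfer: the helper name patch_resolution is hypothetical, and it assumes that the literal 0.3333333333333333 appears exactly once in segment.py (if it does not, the helper stops without touching the file). It keeps a .bak copy so the original can be restored.

```python
import shutil
from pathlib import Path

OLD_VALUE = "0.3333333333333333"  # literal named in this message
NEW_VALUE = "0.4"                 # example replacement from this message


def patch_resolution(script_path, old=OLD_VALUE, new=NEW_VALUE):
    """Replace the single occurrence of `old` in the script with `new`.

    Hypothetical helper: writes a backup next to the script (<name>.bak)
    before editing, and raises ValueError if `old` is not found exactly once.
    Returns the path of the backup file.
    """
    path = Path(script_path)
    text = path.read_text()

    # Refuse to edit if the match is missing or ambiguous.
    count = text.count(old)
    if count != 1:
        raise ValueError(
            f"expected exactly one occurrence of {old!r}, found {count}"
        )

    # Keep a copy of the untouched file so the change can be reverted.
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)

    path.write_text(text.replace(old, new))
    return backup


# Example call (path taken from this message; requires FREESURFER_HOME):
#   import os
#   patch_resolution(os.path.join(
#       os.environ["FREESURFER_HOME"],
#       "python/packages/ERC_bayesian_segmentation/scripts/segment.py"))
```

After patching, check the modified line by eye before rerunning all subjects; copying the .bak file back over segment.py restores the original setting.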

Cheers,

/Eugenio

Juan Eugenio Iglesias

http://www.jeiglesias.com

From: freesurfer-bounces@nmr.mgh.harvard.edu <freesurfer-bounces@nmr.mgh.harvard.edu> on behalf of Boa Sorte Silva, Narlon <narlon.silva@ubc.ca>
Date: Thursday, November 7, 2024 at 9:15 PM
To: Freesurfer support list <freesurfer@nmr.mgh.harvard.edu>
Subject: [Freesurfer] NextBrain - CUDA out of memory error

Hi there,

I am playing around with NextBrain on a dataset of individuals with large white matter lesions, and I am noticing that in 10% of the cases NextBrain fails to finish due to an OOM error, such as this:

RuntimeError: CUDA out of memory. Tried to allocate 1.08 GiB (GPU 0; 31.73 GiB total capacity; 28.54 GiB already allocated; 1.00 GiB free; 30.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have tried adjusting the max_split_size_mb parameter to circumvent the problem (PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512), but it is still failing for some individuals. Do you guys have any input on how to deal with this? For context, I am using a cluster with two 32GB GPUs and an additional 10 CPUs.

Thanks!

Nárlon