Hi there,
I am playing around with NextBrain on a dataset of individuals with large white matter lesions, and I am noticing that in 10% of the cases NextBrain fails to finish due to an OOM error, such as this:
RuntimeError: CUDA out of memory. Tried to allocate 1.08 GiB (GPU 0; 31.73 GiB total capacity; 28.54 GiB already allocated; 1.00 GiB free; 30.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I have tried adjusting the max_split_size_mb parameter to circumvent the problem (PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512), but it is still failing for some individuals. Do you guys have any input on how to deal with this? For context, I am using a cluster with two 32GB GPUs and an additional 10 CPUs.
Thanks!
Nárlon
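To make the numbers in the error message easier to reason about, here is a minimal sketch that pulls the GiB figures out of a message of this form and computes how much memory PyTorch has reserved but not allocated. The helper name parse_oom is hypothetical, and the regular expressions assume the message layout quoted above; other PyTorch versions word this message differently.

```python
import re

# The error text quoted in the message above.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 1.08 GiB "
    "(GPU 0; 31.73 GiB total capacity; 28.54 GiB already allocated; "
    "1.00 GiB free; 30.30 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from a CUDA OOM message of the form above.

    Hypothetical helper: assumes every quantity is reported in GiB.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {key!r} in message")
        stats[key] = float(match.group(1))
    # Memory held by the caching allocator but not currently used by tensors.
    stats["reserved_unallocated"] = round(
        stats["reserved"] - stats["allocated"], 2
    )
    return stats


stats = parse_oom(OOM_MESSAGE)
print(stats["reserved_unallocated"])  # 1.76
```

Here the reserved-but-unallocated gap is 1.76 GiB, roughly 6% of the 30.30 GiB reserved. The hint in the error message targets the case where reserved memory is much larger than allocated memory, so a gap this small suggests the card is close to genuinely full, which may be why max_split_size_mb did not help in every case.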
Dear Nárlon,
Thanks for the interest in our work! You have several options:
1. If it’s only a couple of subjects, you could bite the bullet and run them on the CPU. It’ll be slow, but hey, easy to run (just add the --cpu flag).
2. Changing the resolution at which the segmentations are generated, by editing the following line of $FREESURFER_HOME/python/packages/ERC_bayesian_segmentation/scripts/segment.py: https://github.com/freesurfer/freesurfer/blob/551dd7e3954bc030f6d116d9d671e3... and replacing 0.3333333333333333 by e.g., 0.4. If you do this, you should rerun all your subjects, so that none of them are treated differently.
3. We are also releasing a fast version of the tool (see Section 4 of https://surfer.nmr.mgh.harvard.edu/fswiki/HistoAtlasSegmentation) that runs relatively quickly on the CPU. I submitted the PR yesterday and it should be on the development version of FreeSurfer in a couple of days.
Cheers,
/Eugenio
-- Juan Eugenio Iglesias http://www.jeiglesias.com
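As a rough illustration of options 1 and 2, the sketch below does two things. It retries a failed GPU run with the --cpu flag when the failure looks like a CUDA OOM, and it estimates how much the voxel count shrinks when the resolution goes from 0.3333333333333333 to 0.4. The function names, the command name segment_subject, and the fake runner are hypothetical placeholders rather than part of FreeSurfer; only the --cpu flag and the two resolution values come from the reply above. It also assumes the OOM traceback is written to stderr and that the resolution value is an isotropic voxel size.

```python
import subprocess
from types import SimpleNamespace


def run_with_cpu_fallback(cmd, runner=subprocess.run):
    """Run cmd; if it fails with a CUDA OOM, rerun once with --cpu appended.

    Returns ("gpu" | "cpu" | "failed", result of the last run).
    """
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "gpu", result
    # Assumption: the PyTorch traceback ends up on stderr.
    if "CUDA out of memory" not in (result.stderr or ""):
        return "failed", result
    cpu_result = runner(list(cmd) + ["--cpu"], capture_output=True, text=True)
    return ("cpu" if cpu_result.returncode == 0 else "failed"), cpu_result


def voxel_ratio(old_res, new_res):
    """Relative voxel count after changing an isotropic resolution."""
    return (old_res / new_res) ** 3


def fake_runner(cmd, **kwargs):
    # Stand-in for subprocess.run so the sketch runs without FreeSurfer:
    # it "fails" with an OOM unless --cpu is on the command line.
    if "--cpu" in cmd:
        return SimpleNamespace(returncode=0, stdout="", stderr="")
    return SimpleNamespace(
        returncode=1, stdout="", stderr="RuntimeError: CUDA out of memory."
    )


# "segment_subject" is a placeholder, not the real NextBrain command.
mode, _ = run_with_cpu_fallback(["segment_subject", "sub-01"], runner=fake_runner)
print(mode)  # cpu

ratio = voxel_ratio(0.3333333333333333, 0.4)
print(round(ratio, 3))  # 0.579
```

With the real command substituted in and the default runner, the same wrapper could be looped over the failing subjects. The ratio suggests that 0.4 leaves about 58% of the voxels; if memory use scales roughly with voxel count, that would be a sizeable reduction, though the actual saving depends on how the tool allocates memory.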
From: freesurfer-bounces@nmr.mgh.harvard.edu <freesurfer-bounces@nmr.mgh.harvard.edu> on behalf of Boa Sorte Silva, Narlon <narlon.silva@ubc.ca>
Date: Thursday, November 7, 2024 at 9:15 PM
To: Freesurfer support list <freesurfer@nmr.mgh.harvard.edu>
Subject: [Freesurfer] NextBrain - CUDA out of memory error