Hi Freesurfer Support,
Over the past few weeks, I have emailed about my recon-all processing stream never finishing completely.
We are still encountering the same problem. For context, our job submission script runs recon-all on a single image. Given a folder of 100 MRI images, a second script calls the recon-all script once per image in a for loop, which results in 100 jobs running in parallel on the cluster.
I have attached a screenshot of the parameters we set when submitting each recon-all job through SLURM to UCSD's cluster. The maximum time allowed per job is 48 hours. Our jobs are shared-node jobs, meaning more than one job runs on a single node. This time, we increased RAM from 8 GB to 16 GB in the hope that the entire recon-all stream would run through for each image, but it still stops at "mri_pretess done" and does not go on to the -fill step. In our most recent attempt, each job ran for around 4 hours and 20 minutes.
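To make the workflow concrete, here is a minimal sketch of the per-image submission described above. It is an illustration, not our actual script: the function names, the accepted file extensions, the job name, and the use of `sbatch --wrap` are assumptions, and only the 48-hour limit and 16 GB memory request come from our setup. It builds the commands without submitting anything.

```python
from pathlib import Path
import shlex

# Values taken from the description above.
TIME_LIMIT = "48:00:00"  # 48-hour per-job maximum
MEMORY = "16G"           # raised from 8G in the latest attempt

# Assumption: input images use one of these extensions.
IMAGE_SUFFIXES = (".nii", ".nii.gz", ".mgz")


def build_sbatch_command(image, subjects_dir):
    """Return the sbatch argument list that runs recon-all on one image."""
    image = Path(image)
    # Derive a subject ID by stripping the image extension (longest first,
    # so ".nii.gz" is removed as a whole).
    subject = image.name
    for suffix in sorted(IMAGE_SUFFIXES, key=len, reverse=True):
        if subject.endswith(suffix):
            subject = subject[: -len(suffix)]
            break
    recon = ["recon-all", "-i", str(image), "-s", subject,
             "-sd", str(subjects_dir), "-all"]
    return [
        "sbatch",
        f"--job-name=recon_{subject}",  # hypothetical naming scheme
        f"--time={TIME_LIMIT}",
        f"--mem={MEMORY}",
        "--ntasks=1",
        f"--wrap={shlex.join(recon)}",
    ]


def build_all_commands(image_dir, subjects_dir):
    """Return one sbatch command per image in image_dir (one job per image)."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.name.endswith(IMAGE_SUFFIXES)
    )
    return [build_sbatch_command(p, subjects_dir) for p in images]
```

Passing each returned list to `subprocess.run(cmd, check=True)` would submit the jobs. Printing the lists first with `shlex.join(cmd)` is a cheap way to check the parameters before 100 jobs reach the queue.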
Since increasing the RAM did not change anything, how can we get more verbose error messages? What else could cause recon-all to abort before finishing? I have attached the recon-all.log from one of our subjects for reference.
Freesurfer version: 7.2.0 (provided as a module on UCSD's cluster system)
Platform: Rocky Linux release 8.8 (Green Obsidian)
uname -a: Linux login01 4.18.0-477.15.1.el8_8.x86_64 #1 SMP Wed Jun 28 15:04:18 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Recon-all log: see attached
Thank you!
Sincerely,
Yilei