Hi,
At the beginning of the year I processed 400 scans on a Linux cluster. It had a reported RAM limit of 1 GB, but it could cope with the slight excess of FS3, and a processing time limit of 48 hours (which was fine for 99% of the scans). Most scans went through autorecon-all without problems.
I am now trying to reprocess the scans with the new version (FS4.0), but I am running into a number of problems. On the same Linux cluster, processing the same scans, some scans (~30%) run without problems. The rest fail either because they exceed the memory limit or because they take much more than 48 hours (the jobs are killed, and the logs show only the left and sometimes part of the right hemisphere processed).
Since the documentation makes clear that FS works best with 2 GB of RAM, I have switched to an Itanium cluster with a 2 GB RAM limit and a 48-hour processing time limit. When I compare the logs of the same scans processed on both systems, the Itanium cluster seems to take longer, and although I am still running tests, it appears that for at least some scans autorecon-all might take 70+ hours.
Here are my questions:
1. On the Linux cluster, can I tell FreeSurfer not to exceed a certain RAM allocation? If so, how?
2. Do the problems I have on the Itanium cluster suggest that FS is badly configured on this system? If so, where should we look? (I don't have access to this system's configuration, so I have to report back to the system managers to get any problems fixed.)
Thank you very much for your help and for sharing these great tools with us.
Nic
Hi Nicolas,
1. No, sorry. The amount of RAM needed sometimes depends on the individual anatomy, and in any case it can't be predefined.
2. Not sure about the Itaniums; we have no real experience with them. 72 hours does sound pretty long. I think that's about what it used to take on our old Athlons. Can you extend the time limit?
As for the random stopping of recon-all, we have seen that sometimes as well and are trying to track it down. It seems pretty mysterious: a binary will exit with a nonzero exit code according to the shell, even though the last printf in the code has been executed and the next statement is an exit(0).
Bruce
Freesurfer mailing list Freesurfer@nmr.mgh.harvard.edu https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer
Hi Bruce,
Thanks for your answers. I am not sure how much more RAM might be needed depending on anatomy, but my last job failed because FS used more than 3 GB:
PBS: job killed: per node vmem 34078384kb exceeded limit 2097152kb
Is that to be expected, or does it suggest there is a problem with our configuration?
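As a quick unit check on the PBS message, here is a minimal sketch that converts the two figures to GiB. It assumes that "kb" in the PBS output means units of 1024 bytes; the helper name `kb_to_gib` is made up for illustration.

```shell
# Hypothetical helper: convert a size in kb (assumed to be KiB) to GiB.
kb_to_gib() {
  awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

kb_to_gib 34078384   # vmem reported in the PBS message
kb_to_gib 2097152    # limit reported in the PBS message
```

Under that assumption the limit works out to 2.00 GiB and the reported vmem to 32.50 GiB. As printed, that is well above the roughly 3 GB figure mentioned in the message, and the thread does not say whether the log line or the summary is the one that is off. Note also that vmem is virtual memory, which can be larger than the memory actually resident.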
Also, a related question. Since a number of jobs have failed after many hours of processing, I have had to investigate restarting jobs to finish processing the right hemisphere. For instance after the following log:
status file for recon-all
Thu Nov 8 23:13:39 EST 2007
#@# MotionCor Thu Nov 8 23:13:53 EST 2007
#@# Nu Intensity Correction Thu Nov 8 23:14:33 EST 2007
#@# Talairach Thu Nov 8 23:15:44 EST 2007
#@# Talairach Failure Detection Thu Nov 8 23:18:32 EST 2007
#@# Intensity Normalization Thu Nov 8 23:18:32 EST 2007
#@# Skull Stripping Thu Nov 8 23:25:47 EST 2007
#@# EM Registration Fri Nov 9 00:11:07 EST 2007
#@# CA Normalize Fri Nov 9 00:31:03 EST 2007
#@# CA Reg Fri Nov 9 00:37:06 EST 2007
#@# CA Reg Inv Fri Nov 9 16:56:14 EST 2007
#@# Remove Neck Fri Nov 9 16:59:57 EST 2007
#@# SkullLTA Fri Nov 9 17:04:39 EST 2007
#@# SubCort Seg Fri Nov 9 17:37:12 EST 2007
#@# CC Seg Fri Nov 9 18:57:34 EST 2007
#@# Intensity Normalization2 Fri Nov 9 18:59:22 EST 2007
#@# Mask BFS Fri Nov 9 19:08:32 EST 2007
#@# WM Segmentation Fri Nov 9 19:08:45 EST 2007
#@# Fill Fri Nov 9 19:17:29 EST 2007
#@# Tessellate lh Fri Nov 9 19:19:57 EST 2007
#@# Smooth1 lh Fri Nov 9 19:20:37 EST 2007
#@# Inflation1 lh Fri Nov 9 19:20:56 EST 2007
#@# QSphere lh Fri Nov 9 19:24:27 EST 2007
#@# Fix Topology lh Fri Nov 9 19:58:03 EST 2007
#@# Make Final Surf lh Sat Nov 10 03:17:04 EST 2007
#@# Smooth2 lh Sat Nov 10 07:11:02 EST 2007
#@# Inflation2 lh Sat Nov 10 07:11:19 EST 2007
#@# Sphere lh Sat Nov 10 07:19:55 EST 2007
#@# Surf Reg lh Sat Nov 10 10:29:15 EST 2007
#@# Jacobian white lh Sat Nov 10 17:43:19 EST 2007
#@# AvgCurv lh Sat Nov 10 17:43:29 EST 2007
#@# Cortical Parc lh Sat Nov 10 17:43:37 EST 2007
#@# Parcellation Stats lh Sat Nov 10 17:44:55 EST 2007
#@# Cortical Parc 2 lh Sat Nov 10 17:45:30 EST 2007
#@# Parcellation Stats 2 lh Sat Nov 10 17:47:12 EST 2007
#@# Tessellate rh Sat Nov 10 17:47:58 EST 2007
#@# Smooth1 rh Sat Nov 10 17:48:35 EST 2007
#@# Inflation1 rh Sat Nov 10 17:48:55 EST 2007
#@# QSphere rh Sat Nov 10 17:52:32 EST 2007
#@# Fix Topology rh Sat Nov 10 18:26:17 EST 2007
I initially thought I could restart with -autorecon3 but since this did not work I restarted with:
recon-all -s subject -autorecon2 -hemi right
but this (log below) reran a number of (very lengthy!!!) steps that had already been completed. What is the most efficient way to restart a job, and is there a way of restarting autorecon2 and having it automatically followed by autorecon3?
Sun Nov 11 11:29:26 EST 2007
#@# EM Registration Sun Nov 11 11:29:26 EST 2007
#@# CA Normalize Sun Nov 11 11:48:51 EST 2007
#@# CA Reg Sun Nov 11 11:54:14 EST 2007
#@# CA Reg Inv Mon Nov 12 04:21:18 EST 2007
#@# Remove Neck Mon Nov 12 04:25:00 EST 2007
#@# SkullLTA Mon Nov 12 04:29:31 EST 2007
#@# SubCort Seg Mon Nov 12 05:01:49 EST 2007
#@# CC Seg Mon Nov 12 06:21:48 EST 2007
#@# Intensity Normalization2 Mon Nov 12 06:23:50 EST 2007
#@# Mask BFS Mon Nov 12 06:32:52 EST 2007
#@# WM Segmentation Mon Nov 12 06:33:05 EST 2007
#@# Fill Mon Nov 12 06:41:56 EST 2007
#@# Tessellate rh Mon Nov 12 06:44:23 EST 2007
#@# Smooth1 rh Mon Nov 12 06:45:00 EST 2007
#@# Inflation1 rh Mon Nov 12 06:45:19 EST 2007
#@# QSphere rh Mon Nov 12 06:48:48 EST 2007
#@# Fix Topology rh Mon Nov 12 07:22:02 EST 2007
#@# Make Final Surf rh Mon Nov 12 15:26:46 EST 2007
#@# Smooth2 rh Mon Nov 12 19:18:37 EST 2007
#@# Inflation2 rh Mon Nov 12 19:18:55 EST 2007
#@# ASeg Stats Mon Nov 12 19:27:09 EST 2007
#@# Cortical ribbon mask rh Mon Nov 12 20:03:45 EST 2007
Finally, what is the best way to restart from the last memory-exceeded failure, which interrupted processing towards the end of autorecon2?
Thanks
Nic
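When deciding where to restart, it can help to pull the last recorded stage out of the status log rather than reading it by eye. The sketch below assumes the log has one `#@# <stage> <timestamp>` entry per line with timestamps in the form shown in the logs above; the function name `last_stage` and the temporary demo file are illustrative only.

```shell
# Hypothetical helper: print the last stage recorded in a recon-all status log.
last_stage() {
  # Keep only the stage lines, take the final one, then strip the "#@# "
  # marker and the trailing timestamp (weekday month day time zone year).
  grep '^#@# ' "$1" | tail -n 1 |
    sed -E 's/^#@# //; s/ [A-Z][a-z]{2} [A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [A-Z]+ [0-9]{4}$//'
}

# Demo on the last three entries of the first log quoted above.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
#@# Inflation1 rh Sat Nov 10 17:48:55 EST 2007
#@# QSphere rh Sat Nov 10 17:52:32 EST 2007
#@# Fix Topology rh Sat Nov 10 18:26:17 EST 2007
EOF
last_stage "$demo_log"
rm -f "$demo_log"
```

For the first log this prints `Fix Topology rh`, which is the stage that was running when the job was killed.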
(In reply to: Bruce Fischl, Sun 11/11/2007 4:30 PM, Subject: Re: [Freesurfer] reprocessing on FS4.0 takes much longer for some scans)
Do you know what process it was running when it failed? You could use -autorecon2-cp and -autorecon3. Something else seems to be wrong, though: you shouldn't need 3G of RAM.
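One reading of this suggestion is to pass both flags in a single call, so that the autorecon3 stages follow on automatically. The sketch below only prints the command (a dry run) rather than running it. `subject` is the placeholder name used earlier in the thread, `restart_cmd` is a made-up helper, and passing `-hemi rh` through as an extra argument is an assumption. Which stages `-autorecon2-cp` actually skips should be checked against `recon-all -help` for the installed version.

```shell
# Hypothetical dry-run helper: print the restart command instead of running it.
restart_cmd() {
  subj=$1
  shift
  # Extra arguments (for example "-hemi rh") are passed through unchanged.
  echo recon-all -s "$subj" -autorecon2-cp -autorecon3 "$@"
}

restart_cmd subject
restart_cmd subject -hemi rh
```

Printing the command first makes it easy to review what will be submitted to the queue before committing many hours of cluster time.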
Another thing you can do is run it with -debug and capture all of the terminal output (both stdout and stderr). It will produce huge mounds of output, but it might be the best way.
doug
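A minimal sketch of capturing both streams in one file from a POSIX-style shell follows. The wrapper name `run_logged` and the log file names are illustrative, and the demo uses a stand-in command so that it runs without FreeSurfer installed.

```shell
# Hypothetical wrapper: run a command with stdout and stderr sent to one log file.
run_logged() {
  logfile=$1
  shift
  "$@" > "$logfile" 2>&1
}

# The real call would look something like this (not run here):
#   run_logged recon-debug.log recon-all -s subject -autorecon2-cp -autorecon3 -debug

# Stand-in command that writes one line to each stream.
demo_log=$(mktemp)
run_logged "$demo_log" sh -c 'echo "to stdout"; echo "to stderr" >&2'
cat "$demo_log"
rm -f "$demo_log"
```

In csh or tcsh the equivalent redirection is `>&` rather than `> file 2>&1`. On a cluster it is worth writing the log somewhere with plenty of space, given how much output -debug produces.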