New subject: I/O performance mitigation

3 Sep 2010

    We don't, sorry. We are trying to replicate it here. The fact
    that it's in an MNI tool and not one of ours makes it harder
    to track down.

    The delay needed between jobs depends on how robust and fast
    your storage is. A minute or two should certainly be enough,
    I would think, although we typically launch much more rapidly
    than that.

    Bruce
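
A minimal sketch of the staggered launch discussed in this thread, not the
method either site actually uses. It assumes the jobs are started directly as
local processes; on a Sun Grid Engine cluster each command would instead be
whatever submission command the site uses. The function name
launch_staggered, the delay_seconds default (taken from the 15-second spacing
reported further down the thread), and the injectable sleep parameter are all
illustrative assumptions.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def launch_staggered(
    commands: Sequence[Sequence[str]],
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[subprocess.Popen]:
    """Start each command in turn, pausing between consecutive launches.

    The pause spreads out the initial burst of file-server I/O that occurs
    when many jobs start at the same moment.
    """
    processes: List[subprocess.Popen] = []
    for index, command in enumerate(commands):
        if index > 0:
            # Wait before every launch except the first one.
            sleep(delay_seconds)
        processes.append(subprocess.Popen(list(command)))
    return processes
```

With hypothetical subject IDs, a call might look like
launch_staggered([["recon-all", "-s", s, "-all"] for s in ["subj01", "subj02"]]).
The right delay depends on the storage behind the jobs, so the 15 seconds here
is a starting point rather than a recommendation.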


    On Thu, 26 May 2011, Mehul Sampat wrote:

            Hi Bruce and others,
            My question is related to the staggering of FS jobs
            (see thread below).
            Could you tell me by how much time you stagger
            FreeSurfer jobs on a cluster?


            I am using FS 5.1 on a cluster that uses Sun Grid
            Engine, and I get the following error when I submit a
            large number of jobs. I do not see it for small
            batches, so I think it might be related to staggering
            FS jobs, but I am not sure.

            the error message from recon-all.log:

            nu_estimate_np_and_em: crashed while running spline_smooth (termination status=11)
            nu_correct: crashed while running nu_estimate_np_and_em (termination status=65280)

            the message in recon-all.error

            PWD /work/01523/msampat/freesurfer-5.1/subjects/ms0880_01/mri
            CMD mri_nu_correct.mni --i orig.mgz --o nu.mgz --uchar transforms/talairach.xfm --proto-iters 1000 --distance 50 --n 1

            At first I thought it was a memory issue, but Sun
            Grid Engine is supposed to assign 4 GB per core, so I
            thought that was enough memory to run each case.
            I am investigating whether the memory is being
            allocated correctly.
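
The two termination statuses quoted in the log can be decoded with a few lines
of arithmetic. This is only a sketch and rests on an assumption the thread
does not confirm: that the numbers are raw POSIX wait statuses of the kind a
Perl wrapper reports from $?. Under that assumption, 11 means the process was
killed by signal 11 (conventionally SIGSEGV on Linux) and 65280 means a
normal exit with code 255. The helper name decode_wait_status is
illustrative.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw 16-bit POSIX wait status into its fields."""
    return {
        "signal": status & 0x7F,             # non-zero if killed by a signal
        "core_dumped": bool(status & 0x80),  # core-dump flag
        "exit_code": (status >> 8) & 0xFF,   # meaningful when signal == 0
    }


# The two values quoted from recon-all.log in this message.
for status in (11, 65280):
    print(status, decode_wait_status(status))
```

A segmentation fault in spline_smooth is consistent with, but does not prove,
either a memory shortfall or a truncated read from an overloaded file server.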

            If anyone knows what this error is related to, could
            you please let me know?
            Thanks
            Mehul

            On Fri, Sep 3, 2010 at 10:25 AM, Bruce Fischl
            <fischl@nmr.mgh.harvard.edu> wrote:

                    Hi David,

                    I'm surprised you didn't have to do this in
                    the past. We always space our jobs out. Glad
                    there's an easy workaround.

                    cheers
                    Bruce

                    On Fri, 3 Sep 2010, David Mischel wrote:

                            We took the suggestion of staggering
                            the launch of Freesurfer 5.0 recon-all
                            jobs. The attached Word doc (I don't
                            know how to contribute this information
                            other than attaching the image and
                            text using Word) shows a load graph on
                            our file server. When 20 FS jobs began
                            at once (all processing servers using
                            a single file server) the load on the
                            file server bulged up. When we spaced
                            out the launch of each job by 15
                            seconds the load hardly budged.


                            We have not had to do this in the past
                            with earlier versions of Freesurfer,
                            but this is an obvious workaround for
                            the problem we encountered.


                            < david


                            David Mischel

                            Manager of IT

                            Center for Imaging of
                            Neurodegenerative Diseases (CIND)

                            http://www.cind.research.va.gov/

                            VA Medical Center

                            4150 Clement Street, 114M

                            San Francisco, CA 94121

                            voice: 415-221-4810 x3864

                            fax: 415-668-2864


                    _______________________________________________
                    Freesurfer mailing list
                    Freesurfer@nmr.mgh.harvard.edu
                    https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer


