Actually, I believe it's related to the CUDA architecture itself.
You cannot run multiple CUDA processes at the same time on the same GPU; in a multi-core environment this may happen if you start multiple recon-all runs.
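If concurrent jobs on one GPU do turn out to be the trigger, one way to rule it out is to serialize the CUDA runs with a lock file. The following is only a minimal sketch under that assumption: the lock-file path and the names `gpu_lock` and `run_serialized` are hypothetical, nothing here is part of FreeSurfer, and it relies on POSIX `flock`, so it is Linux/Unix only.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock-file location; use one lock file per GPU.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gpu0.lock")


@contextmanager
def gpu_lock(path=LOCK_PATH):
    """Block until this process holds an exclusive advisory lock on `path`."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # waits while another job holds it
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_serialized(command, path=LOCK_PATH):
    """Run `command` (a list of arguments) while holding the GPU lock.

    Returns the exit code of the command.
    """
    with gpu_lock(path):
        return subprocess.call(command)
```

Each job would then launch its CUDA binary through `run_serialized([...])`, so that a second job waits instead of starting while the first one still holds the device.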
---------------------------------------------------------------------
Pedro Paulo de Magalhães Oliveira Junior
Director of Operations
Netfilter & SpeedComm Telecom
-- www.netfilter.com.br
-- For mobile: http://www.netfilter.com.br/mobile
On Thu, Aug 19, 2010 at 13:57, Nick Schmansky nicks@nmr.mgh.harvard.edu wrote:
hello cuda beta users! this problem, 'all CUDA-capable devices are busy or unavailable', seems to fall into the category of 'post-release curse', because i am seeing this problem locally as well, but haven't seen it in the months we've been using the gpu code. we have found that rebooting the machine seems to work, but that's not a real solution. i suspect our detection scheme is tripping a flag in the driver that's not getting untripped or cleared the next time around.
when we find a better solution, we'll post new _cuda libs on our site, which i'm expecting will be a regular occurrence over the next few months. glad to see so many willing gpu users though!
n.
On Thu, 2010-08-19 at 17:08 +0200, Daniel Guellmar wrote:
Hi folks,
I'm trying to use the new CUDA binaries that come with FreeSurfer version 5.0.0. However, when I try to execute a CUDA binary (e.g. mri_ca_register_cuda), I get the following output:
Acquiring CUDA device
Using default device
CUDA Error in file 'devicemanagement.cu' on line 46 : all CUDA-capable devices are busy or unavailable.
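For batch runs it may help to flag this specific failure automatically rather than spotting it in a log afterwards. Here is a small sketch, assuming the message text stays exactly as in the output above; the helper names `is_device_busy` and `run_and_check` are made up for illustration and are not FreeSurfer functions.

```python
import subprocess

# Message text copied from the error output quoted above.
BUSY_MESSAGE = "all CUDA-capable devices are busy or unavailable"


def is_device_busy(output):
    """Return True if `output` contains the CUDA 'busy or unavailable' error."""
    return BUSY_MESSAGE in output


def run_and_check(command):
    """Run `command` (a list of arguments) and inspect its combined output.

    Returns a tuple (exit_code, busy_flag).
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the message may go to either stream
        universal_newlines=True,
    )
    return result.returncode, is_device_busy(result.stdout)
```

A wrapper script could call `run_and_check` on the CUDA binary and, when the flag is set, fall back to the non-CUDA binary or stop the batch.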
This error occurs on two different CUDA-capable systems. Both run Ubuntu 9.10 and have the latest developer driver for Linux (256.40) and the latest CUDA toolkit (3.1) installed. The GPU Computing SDK code samples compile and work fine. The device query also works fine on both hosts; see the following output:
Host 1:
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "Tesla C2050"
  CUDA Driver Version: 3.10
  CUDA Runtime Version: 3.10
  CUDA Capability Major revision number: 2
  CUDA Capability Minor revision number: 0
  Total amount of global memory: 2817720320 bytes
  Number of multiprocessors: 14
  Number of cores: 448
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 32768
  Warp size: 32
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Clock rate: 1.15 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: Yes
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: Yes
  Device has ECC support enabled: Yes
Device 1: "Tesla C2050"
  CUDA Driver Version: 3.10
  CUDA Runtime Version: 3.10
  CUDA Capability Major revision number: 2
  CUDA Capability Minor revision number: 0
  Total amount of global memory: 2817982464 bytes
  Number of multiprocessors: 14
  Number of cores: 448
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 32768
  Warp size: 32
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Clock rate: 1.15 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: Yes
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: Yes
  Device has ECC support enabled: Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 2, Device = Tesla C2050, Device = Tesla C2050
Host 2:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Tesla C1060"
  CUDA Driv
  CUDA Runtime Version: 3.10
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 3
  Total amount of global memory: 4294770688 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: No
  Integrated: No
  Support host page-locked memory mapping: Yes
  Compute mode: Default (multiple host threads can use this device simultaneously)
  Concurrent kernel execution: No
  Device has ECC support enabled: No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.10, CUDA Runtime Version = 3.10, NumDevs = 1, Device = Tesla C1060
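Since all three devices above report "Compute mode: Default", that field is worth checking whenever this error shows up on another machine. Below is a rough sketch of pulling it out of saved deviceQuery text. It assumes one "field: value" pair per line, as deviceQuery normally prints it; `parse_device_query` and `allows_sharing` are hypothetical helper names, not part of the CUDA SDK.

```python
def parse_device_query(text):
    """Split deviceQuery text into one {field: value} dict per device."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Device ") and '"' in line:
            # Header line, e.g.: Device 0: "Tesla C2050"
            devices.append({"name": line.split('"')[1]})
        elif devices and ":" in line:
            # Field line, split at the first colon only.
            key, _, value = line.partition(":")
            devices[-1][key.strip()] = value.strip()
    return devices


def allows_sharing(device):
    """True if the reported compute mode starts with 'Default'."""
    return device.get("Compute mode", "").startswith("Default")
```

Running `parse_device_query` on the saved output and then `allows_sharing` on each entry shows at a glance whether any device reports something other than the default mode.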
Any comments on that?
Regards and thanks in advance, Daniel
--
Dr.-Ing. Daniel Güllmar
Medical Physics Group / IDIR I
Jena University Hospital
MRT-Gebäude am Steiger
Philosophenweg 3
07743 Jena
Tel: +49-3641-9-35373
Fax: +49-3641-9-35081
www: http://ww.mrt.uni-jena.de
_______________________________________________
Freesurfer mailing list
Freesurfer@nmr.mgh.harvard.edu
https://mail.nmr.mgh.harvard.edu/mailman/listinfo/freesurfer