Hi, I'm working on applying deep learning to image registration for a master's project. The overarching goal was to perform the image registration on real tissue-clearing lightsheet data, but I first created some synthetically deformed brains based on the Allen Brain Atlas and the lab2im algorithm (https://github.com/BBillot/lab2im). However, I'm having trouble running the command in a CentOS Linux 7 (Core) HPC environment: the process gets killed. Could you provide some more insight? Thank you!
I've set up my environment as follows:
module load mamba
mamba create -n freesurf  # I named the environment freesurf
conda activate freesurf
mamba install tensorflow-gpu cuda-version=11.8
I've attached the easyreg.sh shell script that I run. Here's the output:
(freesurfer) kanex161@agc02 [~/Documents/Thesis/code] % bash easyreg.sh
2023-05-09 11:19:32.374206: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-09 11:19:32.405672: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-09 11:19:32.405981: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-09 11:19:33.258226: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[W interface.cpp:47] Warning: Loading nvfuser library failed with: Error in dlopen: libnvfuser_codegen.so: cannot open shared object file: No such file or directory (function LoadingNvfuserLibrary)
using 1 thread
Segmenting reference image
Reading reference image
Setting up segmentation net
2023-05-09 11:19:38.489093: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-09 11:19:38.489166: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: agc02
2023-05-09 11:19:38.489202: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: agc02
2023-05-09 11:19:38.489460: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.103.1
2023-05-09 11:19:38.489502: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.103.1
2023-05-09 11:19:38.489530: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:309] kernel version seems to match DSO: 470.103.1
Inference / segmentation
easyreg.sh: line 8: 1108392 Killed
./mri_easyreg --ref /home/umii/kanex161/Documents/Thesis/data/outputs/atlas/atlas.nii.gz --flo /home/umii/kanex161/Documents/Thesis/data/outputs/lab2im/brains/brain_00.nii.gz --ref_seg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/ref_seg.nii.gz --flo_seg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/flo_seg.nii.gz --flo_reg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/flo_reg.nii.gz --fwd_field /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/fwd_field.nii.gz