Hi,
I'm working on applying deep learning to image registration for a master's project. The overarching goal was to use real tissue-clearing lightsheet data to perform the image registration, but as a first step I created some synthetically deformed brains based on the Allen Brain Atlas and the Lab2im algorithm.
However, I'm having trouble running mri_easyreg in a CentOS Linux 7 (Core) HPC environment: the command gets Killed during the segmentation step. Could you provide some insight into why? Thank you!
I've set up my environment as follows:
module load mamba
mamba create -n freesurf # I named the environment freesurf
conda activate freesurf
mamba install tensorflow-gpu cuda-version=11.8
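To check whether TensorFlow in this environment can see a GPU at all, a minimal sketch like the following could be run on the same node. It assumes Python 3 with the TensorFlow package installed above; `visible_gpus` is a hypothetical helper name, not part of TensorFlow or FreeSurfer.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable in this environment.
        return []
    # tf.config.list_physical_devices is TensorFlow's public call for
    # enumerating the devices visible to the runtime.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

An empty list would mean that TensorFlow falls back to the CPU on this node, for example because the session is running on a node without a GPU allocated. That is only one possible explanation and has not been confirmed here.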
I've attached the easyreg.sh shell script that I run. Here's the output:
(freesurfer) kanex161@agc02 [~/Documents/Thesis/code] % bash easyreg.sh
2023-05-09 11:19:32.374206: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-09 11:19:32.405672: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-05-09 11:19:32.405981: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-09 11:19:33.258226: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[W interface.cpp:47] Warning: Loading nvfuser library failed with: Error in dlopen: libnvfuser_codegen.so: cannot open shared object file: No such file or directory (function LoadingNvfuserLibrary)
using 1 thread
Segmenting reference image
Reading reference image
Setting up segmentation net
2023-05-09 11:19:38.489093: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2023-05-09 11:19:38.489166: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: agc02
2023-05-09 11:19:38.489202: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: agc02
2023-05-09 11:19:38.489460: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 470.103.1
2023-05-09 11:19:38.489502: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 470.103.1
2023-05-09 11:19:38.489530: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:309] kernel version seems to match DSO: 470.103.1
Inference / segmentation
easyreg.sh: line 8: 1108392 Killed ./mri_easyreg --ref /home/umii/kanex161/Documents/Thesis/data/outputs/atlas/atlas.nii.gz --flo /home/umii/kanex161/Documents/Thesis/data/outputs/lab2im/brains/brain_00.nii.gz --ref_seg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/ref_seg.nii.gz --flo_seg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/flo_seg.nii.gz --flo_reg /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/flo_reg.nii.gz --fwd_field /home/umii/kanex161/Documents/Thesis/data/outputs/easyreg/fwd_field.nii.gz
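A bare "Killed" from bash means the process received SIGKILL. On a shared cluster that is often the kernel out-of-memory killer or a scheduler memory limit, although nothing in the output above confirms this. A minimal sketch for checking it, assuming Python 3 on Linux: rerun the command through a wrapper that reports the terminating signal and the peak memory of the child process. `run_and_report` is a hypothetical helper, not part of FreeSurfer.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (return_code, signal_name_or_None, peak_rss_mib)."""
    proc = subprocess.run(cmd)
    # On POSIX, a negative return code -N means the child was ended by signal N.
    sig_name = None
    if proc.returncode < 0:
        sig_name = signal.Signals(-proc.returncode).name
    # ru_maxrss for RUSAGE_CHILDREN is the peak resident set size of the
    # largest waited-for child; on Linux the unit is KiB.
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, sig_name, peak_kib / 1024


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python run_and_report.py ./mri_easyreg --ref ...
    # with the same arguments as in easyreg.sh.
    code, sig, peak_mib = run_and_report(sys.argv[1:])
    print(f"return code: {code}, signal: {sig}, peak memory: {peak_mib:.0f} MiB")
```

If the reported signal is SIGKILL and the peak memory is close to the memory available to the session or job, running out of memory becomes the more likely explanation. The peak value only covers what the process had allocated up to the moment it was killed.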