diff --git a/cv/semantic_segmentation/unet3d/pytorch/README.md b/cv/semantic_segmentation/unet3d/pytorch/README.md
index 5ecbccff27eda4510648dee6e0337d7359c79d3a..0cad8285eb69362ca7718534fee12624e69f841c 100644
--- a/cv/semantic_segmentation/unet3d/pytorch/README.md
+++ b/cv/semantic_segmentation/unet3d/pytorch/README.md
@@ -6,16 +6,15 @@ A network for volumetric segmentation that learns from sparsely annotated volume
 
 Two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images.
 
-The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts.
+The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training.
+
 ## Step 1: Installing
 ### Install packages
 
 ```shell
-
 pip3 install 'scipy' 'tqdm'
-
 ```
 
 ### Prepare datasets
@@ -23,48 +22,39 @@ pip3 install 'scipy' 'tqdm'
 if there is local 'kits19' dataset:
 
 ```shell
-
 ln -s /path/to/kits19/ data
-
 ```
+
 else:
 
 ```shell
-
 bash prepare.sh
-
 ```
-
 ## Step 2: Training
 
 ### Single GPU
 
 ```shell
-
 bash train.sh
-
 ```
 
 ### Multi GPU
 
 ```shell
-
-bash train_dist.sh
-
-
+export NUM_GPUS=8
+bash train_dist.sh
 ```
 
 ## Results on BI-V100
 
-| GPUs | FP16 | FPS | Mean dice |
-|------|-------| ---- |-----------|
-| 1x8 | False | 11.52| 0.908 |
+| GPUs   | FP16    | FPS     | Mean dice   |
+| ------ | ------- | ------- | ----------- |
+| 1x8    | False   | 11.52   | 0.908       |
 
 | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability |
-|----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------|
-| 0.908 | SDK V2.2,bs:4,8x,fp32 | 12 | 0.908 | 152\*8 | 0.85 | 19.6\*8 | 1 |
-
+| -------------------- | ----------------------------------------- | ----------- | -------- | -------- | ----------- | --------------------- | --------- |
+| 0.908                | SDK V2.2, bs:4, 8x, fp32                   | 12          | 0.908    | 152\*8   | 0.85        | 19.6\*8               | 1         |
 
 ## Reference
diff --git a/cv/semantic_segmentation/unet3d/pytorch/train_dist.sh b/cv/semantic_segmentation/unet3d/pytorch/train_dist.sh
index b474a065baea359f80d31a40a857b28a062b181f..220c579b0c84d9ab4fd2fcd9bc46adc9d4e59995 100644
--- a/cv/semantic_segmentation/unet3d/pytorch/train_dist.sh
+++ b/cv/semantic_segmentation/unet3d/pytorch/train_dist.sh
@@ -12,7 +12,11 @@
 # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 # License for the specific language governing permissions and limitations
 # under the License.
-NUM_GPUS=$1
+# export NUM_GPUS=8
+[ -z "${NUM_GPUS}" ] && {
+  echo "Please export NUM_GPUS=X before running the script!"
+  exit 1
+}
 
 SEED=1234
 MAX_EPOCHS=5000
@@ -27,30 +31,29 @@ SAVE_CKPT="./ckpt_full"
 LOG_NAME='full_train_log.json'
 
 if [ ! -d ${SAVE_CKPT} ]; then
-  mkdir ${SAVE_CKPT};
+  mkdir ${SAVE_CKPT}
 fi
 
 python3 -u -m torch.distributed.launch --nproc_per_node=${NUM_GPUS} \
---use_env main.py --data_dir data/kits19/train \
---epochs ${MAX_EPOCHS} \
---evaluate_every ${EVALUATE_EVERY} \
---start_eval_at ${START_EVAL_AT} \
---quality_threshold ${QUALITY_THRESHOLD} \
---batch_size ${BATCH_SIZE} \
---optimizer sgd \
---ga_steps ${GRADIENT_ACCUMULATION_STEPS} \
---learning_rate ${LEARNING_RATE} \
---seed ${SEED} \
---lr_warmup_epochs ${LR_WARMUP_EPOCHS} \
---output-dir ${SAVE_CKPT} \
---log_name ${LOG_NAME}
-"$@"
+  --use_env main.py --data_dir data/kits19/train \
+  --epochs ${MAX_EPOCHS} \
+  --evaluate_every ${EVALUATE_EVERY} \
+  --start_eval_at ${START_EVAL_AT} \
+  --quality_threshold ${QUALITY_THRESHOLD} \
+  --batch_size ${BATCH_SIZE} \
+  --optimizer sgd \
+  --ga_steps ${GRADIENT_ACCUMULATION_STEPS} \
+  --learning_rate ${LEARNING_RATE} \
+  --seed ${SEED} \
+  --lr_warmup_epochs ${LR_WARMUP_EPOCHS} \
+  --output-dir ${SAVE_CKPT} \
+  --log_name ${LOG_NAME} \
+  "$@"
 
-if [ $? -eq 0 ];then
+if [ $? -eq 0 ]; then
   echo 'converged to the target value 0.908 of epoch 3820 in full train, stage-wise training succeed'
   exit 0
 else
   echo 'not converged to the target value, training fail'
   exit 1
 fi
-
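For context, a minimal usage sketch of the patched entry point (values are illustrative; it assumes the kits19 data has already been linked or prepared under `data/kits19` as described in the README above):

```shell
# Without NUM_GPUS, the new guard prints the "Please export NUM_GPUS=..."
# hint and exits with status 1 instead of launching torch.distributed.launch
# with an empty --nproc_per_node value.
bash train_dist.sh

# The documented multi-GPU invocation from the README's "Multi GPU" section:
export NUM_GPUS=8
bash train_dist.sh
```

Any extra arguments given to train_dist.sh are still forwarded to main.py through the trailing "$@", which now stays on the same command line thanks to the added backslash continuation after --log_name.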