From 970980fcc8eb1e1d2f491a589d9a70721dc145c0 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 7 Mar 2025 15:13:28 +0800 Subject: [PATCH 1/4] unify model readme format - audio --- .../conformer/paddlepaddle/README.md | 29 +++--- .../conformer_wenet/pytorch/README.md | 24 ++--- .../pytorch/README.md | 14 +-- .../speech_recognition/rnnt/pytorch/README.md | 55 +++++------- .../transformer_wenet/pytorch/README.md | 14 +-- .../u2++_conformer_wenet/pytorch/README.md | 14 +-- .../unified_conformer_wenet/pytorch/README.md | 14 +-- audio/speech_synthesis/README.md | 1 - .../fastspeech2/paddlepaddle/README.md | 88 +++++++++---------- .../hifigan/paddlepaddle/README.md | 42 ++++----- .../tacotron2/pytorch/README.md | 43 +++++---- .../speech_synthesis/vqmivc/pytorch/README.md | 20 +++-- .../waveglow/pytorch/README.md | 33 ++++--- 13 files changed, 197 insertions(+), 194 deletions(-) delete mode 100644 audio/speech_synthesis/README.md diff --git a/audio/speech_recognition/conformer/paddlepaddle/README.md b/audio/speech_recognition/conformer/paddlepaddle/README.md index 6e0618373..05356b0f8 100644 --- a/audio/speech_recognition/conformer/paddlepaddle/README.md +++ b/audio/speech_recognition/conformer/paddlepaddle/README.md @@ -1,6 +1,6 @@ # Conformer -## Model description +## Model Description Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing @@ -12,33 +12,36 @@ CNN based models achieving state-of-the-art accuracies. On the widely used Libri of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/testother. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters. -## Step 1:Installation +## Model Preparation -```sh -git clone --recursive -b r1.4 https://github.com/PaddlePaddle/PaddleSpeech.git -cd PaddleSpeech -pip3 install . -``` - -## Step 2:Preparing datasets +### Prepare Resources ```sh -cd examples/aishell/asr1/ +git clone --recursive -b r1.4 https://github.com/PaddlePaddle/PaddleSpeech.git +pushd PaddleSpeech/examples/aishell/asr1/ bash run.sh --stage 0 --stop_stage 0 +popd ``` "run.sh" will download and process the datasets, The download process may be slow, you can download the data_aishell.tgz from [wenet](http://openslr.magicdatatech.com/resources/33/data_aishell.tgz) and put it in the /path/to/PaddleSpeech/dataset/aishell/, then return to execute the above command. -## Step 3:Training +### Install Dependencies + +```sh +(cd PaddleSpeech/ && pip3 install .) +``` + +## Model Training ```sh +cd PaddleSpeech/examples/aishell/asr1/ bash run.sh --stage 1 --stop_stage 3 ``` -## Results +## Model Results -| GPUs | IPS | CER | +| GPU | IPS | CER | |-------------|------|-----------------------| | BI-V100 x 4 | 48.5 | 0.0495(checkpoint 81) | diff --git a/audio/speech_recognition/conformer_wenet/pytorch/README.md b/audio/speech_recognition/conformer_wenet/pytorch/README.md index b05608f49..6d39b05d4 100755 --- a/audio/speech_recognition/conformer_wenet/pytorch/README.md +++ b/audio/speech_recognition/conformer_wenet/pytorch/README.md @@ -1,20 +1,22 @@ -# Conformer +# Conformer (WeNet) -## Model description +## Model Description The Conformer neural network is a hybrid model designed for tasks like speech recognition, merging the strengths of convolutional neural networks (CNNs) and transformers. 
It employs CNNs for local feature extraction and transformers to capture long-range dependencies in data. This combination allows the Conformer to efficiently handle both local patterns and global relationships, making it particularly effective for audio and speech tasks. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```sh cd ../../../../toolbox/WeNet/ bash install_toolbox_wenet.sh ``` -## Step 2: Training +## Model Training Dataset is data_aishell.tgz and resource_aishell.tgz from wenet. You could just run the whole script, which will download the dataset automatically. @@ -54,17 +56,17 @@ bash run.sh --stage 5 --stop-stage 5 bash run.sh --stage 6 --stop-stage 6 ``` -## Results on BI-V100 +## Model Results -| GPUs | FP16 | QPS | -|------|-------|-----| -| 1x8 | False | 67 | -| 1x8 | True | 288 | +| GPU | FP16 | QPS | +|-----------|-------|-----| +| BI-v100x8 | False | 67 | +| BI-v100x8 | True | 288 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| | 3.72 | SDK V2.2,bs:32,8x,fp32 | 380 | 4.79@cer | 113\*8 | 0.82 | 21.5\*8 | 1 | -## Reference +## References - +- [wenet](https://github.com/wenet-e2e/wenet) diff --git a/audio/speech_recognition/efficient_conformer_v2_wenet/pytorch/README.md b/audio/speech_recognition/efficient_conformer_v2_wenet/pytorch/README.md index d35cb89b4..1ac4b0f62 100644 --- a/audio/speech_recognition/efficient_conformer_v2_wenet/pytorch/README.md +++ b/audio/speech_recognition/efficient_conformer_v2_wenet/pytorch/README.md @@ -1,13 +1,15 @@ -# Efficient Conformer V2 +# Efficient Conformer V2 (WeNet) -## Model description +## Model Description EfficientFormerV2 mimics MobileNet with its convolutional structure, offering transformers a series of designs and optimizations for mobile acceleration. The number of parameters and latency of the model are critical for resource-constrained hardware, so EfficientFormerV2 combines a fine-grained joint search strategy to propose an efficient network with low latency and size. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```sh cd ../../../../toolbox/WeNet/ @@ -18,7 +20,7 @@ pip3 install -r requirements.txt ``` -## Step 2: Training +## Model Training Dataset is data_aishell.tgz and resource_aishell.tgz from wenet. You could just run the whole script, which will download the dataset automatically. 
@@ -66,12 +68,12 @@ bash run.sh --stage 5 --stop-stage 5 bash run.sh --stage 6 --stop-stage 6 ``` -## Results +## Model Results | GPUs | QPS | WER(ctc_greedy_search) | WER(ctc_prefix_beam_search) | WER(attention) | WER(attention_rescoring) | |------------|-----|------------------------|-----------------------------|----------------|--------------------------| | BI-V100 x8 | 234 | 5.00% | 4.99% | 4.89% | 4.58% | -## Reference +## References - [WeNet](https://github.com/wenet-e2e/wenet) diff --git a/audio/speech_recognition/rnnt/pytorch/README.md b/audio/speech_recognition/rnnt/pytorch/README.md index 66d044631..17dff16a0 100644 --- a/audio/speech_recognition/rnnt/pytorch/README.md +++ b/audio/speech_recognition/rnnt/pytorch/README.md @@ -1,46 +1,39 @@ # RNN-T -## Model description +## Model Description -Many machine learning tasks can be expressed as the transformation---or \emph{transduction}---of input sequences into -output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to -name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output -sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent -neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such -representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to -perform transduction. This is a severe limitation since \emph{finding} the alignment is the most difficult aspect of -many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. -This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in -principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for -phoneme recognition are provided on the TIMIT speech corpus. +RNN-T (Recurrent Neural Network Transducer) is an end-to-end deep learning model designed for sequence-to-sequence +tasks, particularly in speech recognition. It combines RNNs with a transducer architecture to directly map input +sequences (like audio) to output sequences (like text) without requiring explicit alignment. The model consists of an +encoder network that processes the input, a prediction network that models the output sequence, and a joint network that +combines these representations. RNN-T handles variable-length input/output sequences and learns alignments automatically +during training. It's particularly effective for speech recognition as it can process continuous audio streams and +output text in real-time, achieving state-of-the-art performance on various benchmarks. 
-## Step 1: Installing packages +## Model Preparation -Install required libraries: +### Prepare Resources ```sh -bash install.sh -``` - -## Step 2: Preparing datasets +# Download LibriSpeech [http://www.openslr.org/12](http://www.openslr.org/12) +bash scripts/download_librispeech.sh ${DATA_ROOT_DIR} -download LibriSpeech [http://www.openslr.org/12](http://www.openslr.org/12) +# Preprocess LibriSpeech ```sh -bash scripts/download_librispeech.sh ${DATA_ROOT_DIR} +bash scripts/preprocess_librispeech.sh ${DATA_ROOT_DIR} ``` -preprocess LibriSpeech +### Install Dependencies ```sh -bash scripts/preprocess_librispeech.sh ${DATA_ROOT_DIR} +bash install.sh ``` -## Step 3: Training - -### Setup config yaml +## Model Training ```sh +# Setup config yaml sed -i "s#MODIFY_DATASET_DIR#${DATA_ROOT_DIR}/LibriSpeech#g" configs/baseline_v3-1023sp.yaml ``` @@ -59,12 +52,12 @@ Following conditions were tested, you can run any of them below: | 4 cards | `train_rnnt_1x4.sh` | | 8 cards | `train_rnnt_1x8.sh` | -## Results on BI-V100 +## Model Results -| GPUs | FP16 | FPS | WER | -|------|-------|-----|-------| -| 1x8 | False | 20 | 0.058 | +| Model | GPUs | FP16 | FPS | WER | +|-------|------------|-------|-----|-------| +| RNN-T | BI-V100 x8 | False | 20 | 0.058 | -## Reference +## References - +- [mlcommons](https://github.com/mlcommons/training/tree/master/rnn_speech_recognition/pytorch) diff --git a/audio/speech_recognition/transformer_wenet/pytorch/README.md b/audio/speech_recognition/transformer_wenet/pytorch/README.md index 62a77dede..3138a0244 100755 --- a/audio/speech_recognition/transformer_wenet/pytorch/README.md +++ b/audio/speech_recognition/transformer_wenet/pytorch/README.md @@ -1,6 +1,6 @@ -# Transformer +# Transformer (WeNet) -## Model description +## Model Description The Transformer architecture, introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017, revolutionized the field of deep learning. It relies on a mechanism called self-attention to process input data in @@ -8,14 +8,16 @@ parallel (as opposed to sequentially) and capture complex dependencies in data, sequence. Transformers have since become the foundation for state-of-the-art models in various tasks, especially in natural language processing, such as the BERT and GPT series. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```sh cd ../../../../toolbox/WeNet/ bash install_toolbox_wenet.sh ``` -## Step 2: Training +## Model Training Dataset is data_aishell.tgz and resource_aishell.tgz from wenet. You could just run the whole script, which will download the dataset automatically. 
@@ -55,12 +57,12 @@ bash run.sh --stage 5 --stop-stage 5 bash run.sh --stage 6 --stop-stage 6 ``` -## Results +## Model Results | GPUs | FP16 | QPS | WER (ctc_greedy_search) | WER (ctc_prefix_beam_search) | WER (attention) | WER (attention_rescoring) | |------------|-------|-----|-------------------------|------------------------------|-----------------|---------------------------| | BI-V100 x8 | False | 394 | 5.78% | 5.78% | 5.59% | 5.17% | -## Reference +## References - [WeNet](https://github.com/wenet-e2e/wenet) diff --git a/audio/speech_recognition/u2++_conformer_wenet/pytorch/README.md b/audio/speech_recognition/u2++_conformer_wenet/pytorch/README.md index 7e3d89dad..555bc945f 100755 --- a/audio/speech_recognition/u2++_conformer_wenet/pytorch/README.md +++ b/audio/speech_recognition/u2++_conformer_wenet/pytorch/README.md @@ -1,19 +1,21 @@ -# U2++ Conformer +# U2++ Conformer (WeNet) -## Model description +## Model Description U2++, an enhanced version of U2 to further improve the accuracy. The core idea of U2++ is to use the forward and the backward information of the labeling sequences at the same time at training to learn richer information, and combine the forward and backward prediction at decoding to give more accurate recognition results. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```sh cd ../../../../toolbox/WeNet/ bash install_toolbox_wenet.sh ``` -## Step 2: Training +## Model Training Dataset is data_aishell.tgz and resource_aishell.tgz from wenet. You could just run the whole script, which will download the dataset automatically. @@ -53,12 +55,12 @@ bash run.sh --stage 5 --stop-stage 5 bash run.sh --stage 6 --stop-stage 6 ``` -## Results +## Model Results | GPUs | FP16 | QPS | WER(ctc_greedy_search) | WER(ctc_prefix_beam_search) | WER(attention) | WER(attention_rescoring) | |------------|-------|-----|------------------------|-----------------------------|----------------|--------------------------| | BI-V100 x8 | False | 272 | 5.21% | 5.21% | 5.13% | 4.82% | -## Reference +## References - [WeNet](https://github.com/wenet-e2e/wenet) diff --git a/audio/speech_recognition/unified_conformer_wenet/pytorch/README.md b/audio/speech_recognition/unified_conformer_wenet/pytorch/README.md index 18b58ea0b..6a2eb85f0 100755 --- a/audio/speech_recognition/unified_conformer_wenet/pytorch/README.md +++ b/audio/speech_recognition/unified_conformer_wenet/pytorch/README.md @@ -1,20 +1,22 @@ -# Unified Conformer +# Unified Conformer (WeNet) -## Model description +## Model Description Unified Conformer is an architecture that has become state-of-the-art in the field of Automatic Speech Recognition (ASR). It is a variant of the Transformer architecture which has achieved extraordinary performance in Natural Language Processing and Computer Vision tasks thanks to its powerful self-attention mechanism¹. The Conformer architecture has been modified from ASR to Automatic Speaker Verification (ASV) with very minor changes. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```sh cd ../../../../toolbox/WeNet/ bash install_toolbox_wenet.sh ``` -## Step 2: Training +## Model Training Dataset is data_aishell.tgz and resource_aishell.tgz from wenet. You could just run the whole script, which will download the dataset automatically. 
@@ -54,12 +56,12 @@ bash run.sh --stage 5 --stop-stage 5 bash run.sh --stage 6 --stop-stage 6 ``` -## Results +## Model Results | GPUs | FP16 | QPS | WER(ctc_greedy_search) | WER(ctc_prefix_beam_search) | WER(attention) | WER(attention_rescoring) | |------------|-------|-----|------------------------|-----------------------------|----------------|--------------------------| | BI-V100 x8 | False | 257 | 5.60% | 5.60% | 5.46% | 4.98% | -## Reference +## References - [WeNet](https://github.com/wenet-e2e/wenet) diff --git a/audio/speech_synthesis/README.md b/audio/speech_synthesis/README.md deleted file mode 100644 index d672e3a88..000000000 --- a/audio/speech_synthesis/README.md +++ /dev/null @@ -1 +0,0 @@ -# Speech Synthesis diff --git a/audio/speech_synthesis/fastspeech2/paddlepaddle/README.md b/audio/speech_synthesis/fastspeech2/paddlepaddle/README.md index a5b33637f..bce73c30e 100644 --- a/audio/speech_synthesis/fastspeech2/paddlepaddle/README.md +++ b/audio/speech_synthesis/fastspeech2/paddlepaddle/README.md @@ -1,6 +1,6 @@ # PP-TTS-FastSpeech2 -## Model description +## Model Description Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. FastSpeech 2s is the first attempt to directly generate speech @@ -8,7 +8,39 @@ waveform from text in parallel, enjoying the benefit of fully end-to-end inferen FastSpeech 2 achieves a 3x training speed-up over FastSpeech, and FastSpeech 2s enjoys even faster inference speed; 2) FastSpeech 2 and 2s outperform FastSpeech in voice quality, and FastSpeech 2 can even surpass autoregressive models. -## Step 1:Installation +## Model Preparation + +### Prepare Resources + +### Download and Extract + +Download CSMSC(BZNSYP) from this [Website.](https://aistudio.baidu.com/datasetdetail/36741) and extract it to +./datasets. Then the dataset is in the directory ./datasets/BZNSYP. + +### Get MFA Result and Extract + +We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2. You can +download from here +[baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz). + +Put the data directory structure like this: + +```sh +tts3 +├── baker_alignment_tone +├── conf +├── datasets +│ └── BZNSYP +│ ├── PhoneLabeling +│ ├── ProsodyLabeling +│ └── Wave +├── local +└── ... +``` + +Change the rootdir of dataset in ./local/preprocess.sh to the dataset path. Like this: `--rootdir=./datasets/BZNSYP` + +### Install Dependencies ```sh # Pip the requirements @@ -56,37 +88,7 @@ rm -rf libstdc++.so.6 ln -s libstdc++.so.6.0.25 libstdc++.so.6 ``` -## Step 2:Preparing datasets - -### Download and Extract - -Download CSMSC(BZNSYP) from this [Website.](https://aistudio.baidu.com/datasetdetail/36741) and extract it to -./datasets. Then the dataset is in the directory ./datasets/BZNSYP. - -### Get MFA Result and Extract - -We use [MFA](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for fastspeech2. You can -download from here -[baker_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/BZNSYP/with_tone/baker_alignment_tone.tar.gz). - -Put the data directory structure like this: - -```sh -tts3 -├── baker_alignment_tone -├── conf -├── datasets -│ └── BZNSYP -│ ├── PhoneLabeling -│ ├── ProsodyLabeling -│ └── Wave -├── local -└── ... -``` - -Change the rootdir of dataset in ./local/preprocess.sh to the dataset path. 
Like this: `--rootdir=./datasets/BZNSYP` - -### Data preprocessing +### Preprocess Data ```sh PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 0 --stop-stage 0 @@ -112,17 +114,15 @@ dump └── speech_stats.npy ``` -## Step 3:Training - ### Model Training You can choose use how many gpus for training by changing gups parameter in run.sh file and ngpu parameter in ./local/train.sh file. -``` +```bash PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' ./run.sh --stage 1 --stop-stage 1 ``` -### Synthesizing +#### Synthesizing We use [parallel wavegan](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc1) as the neural vocoder. Download pretrained parallel wavegan model from @@ -156,18 +156,18 @@ self.session_g2pW = onnxruntime.InferenceSession( ./run.sh --stage 2 --stop-stage 3 ``` -### Inferencing +### Model Inference ```sh ./run.sh --stage 4 --stop-stage 4 ``` -## Results +## Model Results -| GPUS | avg_ips | l1 loss | duration loss | pitch loss | energy loss | loss | -|------------|--------------------|---------|---------------|------------|-------------|-------| -| BI 100 × 8 | 71.19sequences/sec | 0.603 | 0.037 | 0.327 | 0.151 | 1.118 | +| GPUS | avg_ips | l1 loss | duration loss | pitch loss | energy loss | loss | +|------------|---------------------|---------|---------------|------------|-------------|-------| +| BI 100 × 8 | 71.19 sequences/sec | 0.603 | 0.037 | 0.327 | 0.151 | 1.118 | -## Reference +## References -[FastSpeech2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3) +- [FastSpeech2](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3) diff --git a/audio/speech_synthesis/hifigan/paddlepaddle/README.md b/audio/speech_synthesis/hifigan/paddlepaddle/README.md index cd8e2cff8..d9e0063f4 100644 --- a/audio/speech_synthesis/hifigan/paddlepaddle/README.md +++ b/audio/speech_synthesis/hifigan/paddlepaddle/README.md @@ -1,24 +1,14 @@ # PP-TTS-HiFiGAN -## Model description +## Model Description HiFiGAN is a commonly used vocoder in academia and industry in recent years, which can convert the frequency spectrum generated by acoustic models into high-quality audio. This vocoder uses generative adversarial networks as the basis for generating models. -## Step 1:Installation +## Model Preparation -```sh -# Pip the requirements -pip3 install requirements.txt - -# Clone the repo -git clone https://github.com/PaddlePaddle/PaddleSpeech.git -cd PaddleSpeech/examples/csmsc/voc5 - -``` - -## Step 2:Preparing datasets +### Prepare Resources ### Download and Extract @@ -48,7 +38,19 @@ voc5 Change the rootdir of dataset in ./local/preprocess.sh to the dataset path. Like this: `--rootdir=./datasets/BZNSYP` -### Data preprocessing +### Install Dependencies + +```sh +# Pip the requirements +pip3 install requirements.txt + +# Clone the repo +git clone https://github.com/PaddlePaddle/PaddleSpeech.git +cd PaddleSpeech/examples/csmsc/voc5 + +``` + +### Preprocess Data ```sh ./run.sh --stage 0 --stop-stage 0 @@ -70,8 +72,6 @@ dump └── feats_stats.npy ``` -## Step 3:Training - ### Model Training You can choose use how many gpus for training by changing `gups` parameter in run.sh file and `ngpu` parameter in ./local/train.sh file. @@ -82,7 +82,9 @@ Modify `./local/train.sh `file to use python3 run. sed -i 's/python /python3 /g' ./local/train.sh ``` -Full training may cost much time, you can modify the `train_max_steps` parameter in ./conf/default.yaml file to reduce training time. 
But in order to get the weight file you should make the `train_max_steps` parameter bigger than `save_interval_steps ` parameter. +Full training may cost much time, you can modify the `train_max_steps` parameter in ./conf/default.yaml file to reduce +training time. But in order to get the weight file you should make the `train_max_steps` parameter bigger than +`save_interval_steps ` parameter. ```sh ./run.sh --stage 1 --stop-stage 1 @@ -96,7 +98,7 @@ Modify the parameter of `ckpt_name` in run.sh file to the weight name after trai ./run.sh --stage 2 --stop-stage 2 ``` -## Results +## Model Results Main results after 1000 step train. @@ -104,6 +106,6 @@ Main results after 1000 step train. |-------------|---------------------|------------------|-----------------------|----------|----------------|-----------|-----------|--------------------| | BI V100 × 1 | 15.42 sequences/sec | 6.276 | 0.845 | 0.531 | 31.858 | 0.513 | 0.6289 | 1.142 | -## Reference +## References -[HiFiGAN](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5) +- [HiFiGAN](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/voc5) diff --git a/audio/speech_synthesis/tacotron2/pytorch/README.md b/audio/speech_synthesis/tacotron2/pytorch/README.md index 9bd8eeb05..f7f113ce6 100644 --- a/audio/speech_synthesis/tacotron2/pytorch/README.md +++ b/audio/speech_synthesis/tacotron2/pytorch/README.md @@ -1,6 +1,6 @@ # Tacotron2 -## Model description +## Model Description This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale @@ -11,55 +11,52 @@ evaluate the impact of using mel spectrograms as the input to WaveNet instead of We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture. -## Step 1: Installing packages +## Model Preparation -```sh -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources 1.Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/) in the current directory; - wget -c ; - tar -jxvf LJSpeech-1.1.tar.bz2; -## Step 3: Training - -First, create a directory to save output and logs. +### Install Dependencies ```sh -mkdir outdir logdir +pip3 install -r requirements.txt ``` -### On single GPU +## Model Training + +First, create a directory to save output and logs. 
```sh -python3 train.py --output_directory=outdir --log_directory=logdir --target_val_loss=0.5 +mkdir outdir logdir ``` -### Multiple GPUs on one machine +## ```sh +# On single GPU +python3 train.py --output_directory=outdir --log_directory=logdir --target_val_loss=0.5 + +# Multiple GPUs on one machine python3 -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True --target_val_loss=0.5 -``` -### Multiple GPUs on one machine (AMP) - -```sh +# Multiple GPUs on one machine (AMP) python3 -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True --target_val_loss=0.5 ``` -## Results on BI-V100 +## Model Results -| GPUs | FP16 | FPS | Score(MOS) | -|------|------|-----|------------| -| 1x8 | True | 9.2 | 4.460 | +| GPU | FP16 | FPS | Score(MOS) | +|------------|------|-----|------------| +| BI-V100 x8 | True | 9.2 | 4.460 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| | score(MOS):4.460 | SDK V2.2,bs:128,8x,AMP | 77 | 4.46 | 128\*8 | 0.96 | 18.4\*8 | 1 | -## Reference +## References - [tacotron2](https://github.com/NVIDIA/tacotron2) diff --git a/audio/speech_synthesis/vqmivc/pytorch/README.md b/audio/speech_synthesis/vqmivc/pytorch/README.md index bac78a9df..b7e062d39 100644 --- a/audio/speech_synthesis/vqmivc/pytorch/README.md +++ b/audio/speech_synthesis/vqmivc/pytorch/README.md @@ -1,6 +1,6 @@ # VQMIVC -## Model description +## Model Description One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally @@ -14,7 +14,9 @@ target speaker characteristics. In doing so, the proposed approach achieves high similarity than current state-of-the-art one-shot VC systems. Our code, pre-trained models and demo are available at . -## Step 1: Preparing datasets +## Model Preparation + +### Prepare Resources ```sh mkdir -p /home/data/vqmivc/ @@ -23,7 +25,7 @@ wget https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip unzip VCTK-Corpus-0.92.zip ``` -## Step 2: Preprocess +## Preprocess Data ```sh cd ${DEEPSPARKHUB_ROOT}/speech/speech_synthesis/vqmivc/pytorch/ @@ -33,28 +35,28 @@ python3 preprocess.py ln -s vqmivc/data . 
``` -## Step 3: Training +## Model Training -* Training with mutual information minimization (MIM): +- Training with mutual information minimization (MIM): ```sh export HYDRA_FULL_ERROR=1 python3 train.py use_CSMI=True use_CPMI=True use_PSMI=True ``` -* Training without MIM: +- Training without MIM: ```sh python3 train.py use_CSMI=False use_CPMI=False use_PSMI=False ``` -## Results on BI-V100 +## Model Results | Card Type | recon loss | cps loss | vq loss | perpexlity | lld cs loss | mi cs loss | lld ps loss | mi ps loss | lld cp loss | mi cp loss | used time(s) | |-----------|------------|----------|---------|------------|-------------|------------|-------------|------------|-------------|------------|--------------| -| BI | 0.635 | 1.062 | 0.453 | 401.693 | 110.958 | 2.653E-4 | 0.052 | 0.001 | 219.895 | 0.021 | 4.315 | +| BI-V100 | 0.635 | 1.062 | 0.453 | 401.693 | 110.958 | 2.653E-4 | 0.052 | 0.001 | 219.895 | 0.021 | 4.315 | | | -## Reference +## References - [VQMIVC](https://github.com/Wendison/VQMIVC) diff --git a/audio/speech_synthesis/waveglow/pytorch/README.md b/audio/speech_synthesis/waveglow/pytorch/README.md index e50f14f36..00058b66e 100644 --- a/audio/speech_synthesis/waveglow/pytorch/README.md +++ b/audio/speech_synthesis/waveglow/pytorch/README.md @@ -1,6 +1,6 @@ # WaveGlow -## Model description +## Model Description In our recent paper, we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality @@ -8,42 +8,39 @@ audio synthesis, without the need for auto-regression. WaveGlow is implemented u using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. -## Step 1: Installing packages +## Model Preparation -```sh -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources Download and extract the [LJ Speech dataset](https://keithito.com/LJ-Speech-Dataset/) in the current directory. - wget -c ; - tar -jxvf LJSpeech-1.1.tar.bz2 -## Step 3: Training - -### On single GPU +### Install Dependencies ```sh -python3 train.py -c config.json +pip3 install -r requirements.txt ``` -### Multiple GPUs on one machine - -> Warning: DDP & AMP(mixed precision training set `"fp16_run": true` on `config.json`) +## Model Training ```sh +# On single GPU +python3 train.py -c config.json + +# Multiple GPUs on one machine +## Warning: DDP & AMP(mixed precision training set `"fp16_run": true` on `config.json`) python3 distributed.py -c config.json ``` -## Results +## Model Results | Card Type | Prec. 
| Single Card | 8 Cards | |-----------|-------|------------:|:--------:| -| BI | FP32 | 196.845 | 1440.233 | -| BI | AMP | 351.040 | 2400.745 | +| BI-V100 | FP32 | 196.845 | 1440.233 | +| BI-V100 | AMP | 351.040 | 2400.745 | -## Reference +## References - [waveglow](https://github.com/NVIDIA/waveglow) -- Gitee From f4e6400cdddd5d55337bb4234ed848f3821d1940 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 7 Mar 2025 16:51:10 +0800 Subject: [PATCH 2/4] unify model readme format - cv/resolution --- audio/speech_recognition/README.md | 0 cv/super_resolution/README.md | 1 - .../basicvsr++/pytorch/README.md | 46 +++++++++------ .../basicvsr/pytorch/README.md | 45 ++++++++------ cv/super_resolution/esrgan/pytorch/README.md | 40 +++++++------ cv/super_resolution/liif/pytorch/README.md | 49 +++++++-------- .../real_basicvsr/pytorch/README.md | 48 ++++++++------- cv/super_resolution/ttsr/pytorch/README.md | 59 ++++++++++--------- cv/super_resolution/ttvsr/pytorch/README.md | 47 +++++++-------- docs/MODEL_TEMPLATE.md | 6 +- 10 files changed, 186 insertions(+), 155 deletions(-) delete mode 100644 audio/speech_recognition/README.md delete mode 100644 cv/super_resolution/README.md diff --git a/audio/speech_recognition/README.md b/audio/speech_recognition/README.md deleted file mode 100644 index e69de29bb..000000000 diff --git a/cv/super_resolution/README.md b/cv/super_resolution/README.md deleted file mode 100644 index 97f616adf..000000000 --- a/cv/super_resolution/README.md +++ /dev/null @@ -1 +0,0 @@ -# Super Resolution diff --git a/cv/super_resolution/basicvsr++/pytorch/README.md b/cv/super_resolution/basicvsr++/pytorch/README.md index 675b3a870..2674fbf04 100755 --- a/cv/super_resolution/basicvsr++/pytorch/README.md +++ b/cv/super_resolution/basicvsr++/pytorch/README.md @@ -1,10 +1,27 @@ -# basicVSR++ (CVPR2022, Oral) +# BasicVSR++ -## Model description +## Model Description -A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with the enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to an improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with similar number of parameters. In addition to video super-resolution, BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement. In NTIRE 2021, BasicVSR++ obtains three champions and one runner-up in the Video Super-Resolution and Compressed Video Enhancement Challenges. Codes and models will be released to MMEditing. +BasicVSR++ is an advanced video super-resolution model that enhances BasicVSR with second-order grid propagation and +flow-guided deformable alignment. These innovations improve spatiotemporal information utilization across misaligned +frames, boosting performance by 0.82 dB PSNR while maintaining efficiency. It excels in video restoration tasks, +including compressed video enhancement, and achieved top results in NTIRE 2021 challenges. 
The model's recurrent +structure effectively processes entire video sequences, making it a state-of-the-art solution for high-quality video +upscaling and restoration. -## Step 1: Installing packages +## Model Preparation + +### Prepare Resources + +Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow +tools/dataset_converters/reds/README.md + +```shell +mkdir -p data/ +ln -s ${REDS_DATASET_PATH} data/REDS +``` + +### Install Dependencies ```shell # Install libGL @@ -21,26 +38,17 @@ sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_cond pip install albumentations ``` -## Step 2: Preparing datasets +## Model Training -Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow tools/dataset_converters/reds/README.md -```shell -mkdir -p data/ -ln -s ${REDS_DATASET_PATH} data/REDS -``` - -## Step 3: Training - -### One single GPU ```shell +# One single GPU python3 tools/train.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py -``` -### Mutiple GPUs on one machine -```shell +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py 8 ``` -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) +## References + +- [mmagic](https://github.com/open-mmlab/mmagic) diff --git a/cv/super_resolution/basicvsr/pytorch/README.md b/cv/super_resolution/basicvsr/pytorch/README.md index 9fbad252c..5c30134bf 100755 --- a/cv/super_resolution/basicvsr/pytorch/README.md +++ b/cv/super_resolution/basicvsr/pytorch/README.md @@ -1,10 +1,26 @@ -# basicVSR (CVPR2022, Oral) +# BasicVSR -## Model description +## Model Description -BasicVSR is a video super-resolution pipeline including optical flow and residual blocks. It adopts a typical bidirectional recurrent network. The upsampling module U contains multiple pixel-shuffle and convolutions. In the Figure, red and blue colors represent the backward and forward propagations, respectively. The propagation branches contain only generic components. S, W and R refer to the flow estimation module, spatial warping module, and residual blocks, respectively. +BasicVSR is a video super-resolution pipeline including optical flow and residual blocks. It adopts a typical +bidirectional recurrent network. The upsampling module U contains multiple pixel-shuffle and convolutions. In the +Figure, red and blue colors represent the backward and forward propagations, respectively. The propagation branches +contain only generic components. S, W and R refer to the flow estimation module, spatial warping module, and residual +blocks, respectively. 
-## Step 1: Installing packages +## Model Preparation + +### Prepare Resources + +Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow +tools/dataset_converters/reds/README.md + +```shell +mkdir -p data/ +ln -s ${REDS_DATASET_PATH} data/REDS +``` + +### Install Dependencies ```shell # Install libGL @@ -21,26 +37,17 @@ sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_cond pip install albumentations ``` -## Step 2: Preparing datasets +## Model Training -Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow tools/dataset_converters/reds/README.md -```shell -mkdir -p data/ -ln -s ${REDS_DATASET_PATH} data/REDS -``` - -## Step 3: Training - -### One single GPU ```shell +# One single GPU python3 tools/train.py configs/basicvsr/basicvsr_2xb4_reds4.py -``` -### Mutiple GPUs on one machine -```shell +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/basicvsr/basicvsr_2xb4_reds4.py 8 ``` -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) +## References + +- [mmagic](https://github.com/open-mmlab/mmagic) diff --git a/cv/super_resolution/esrgan/pytorch/README.md b/cv/super_resolution/esrgan/pytorch/README.md index 7f9028993..957c5b687 100755 --- a/cv/super_resolution/esrgan/pytorch/README.md +++ b/cv/super_resolution/esrgan/pytorch/README.md @@ -1,10 +1,24 @@ # ESRGAN -## Model description +## Model Description -The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at https://github.com/xinntao/ESRGAN . +ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) is an advanced deep learning model for single image +super-resolution. It improves upon SRGAN by introducing Residual-in-Residual Dense Blocks (RRDB) without batch +normalization, relativistic GAN for the discriminator, and enhanced perceptual loss using pre-activation features. These +innovations enable ESRGAN to generate more realistic textures with fewer artifacts, producing higher-quality upscaled +images. It achieved first place in the PIRM2018-SR Challenge, demonstrating superior visual quality and more natural +textures compared to its predecessor. 
-## Step 1: Installing packages +## Model Preparation + +### Prepare Resources + +```shell +# Download DIV2K: https://data.vision.ee.ethz.ch/cvl/DIV2K/ or you can follow this tools/dataset_converters/div2k/README.md +$ mkdir -p data/DIV2K +``` + +### Install Dependencies ```shell # Install libGL @@ -21,25 +35,17 @@ sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_cond pip install albumentations ``` -## Step 2: Preparing datasets - -```shell -# Download DIV2K: https://data.vision.ee.ethz.ch/cvl/DIV2K/ or you can follow this tools/dataset_converters/div2k/README.md -$ mkdir -p data/DIV2K -``` - -## Step 3: Training +## Model Training -### One single GPU ```shell +# One single GPU python3 tools/train.py configs/esrgan/esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py -``` -### Mutiple GPUs on one machine -```shell +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/esrgan/esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py 8 ``` -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) +## References + +- [mmagic](https://github.com/open-mmlab/mmagic) diff --git a/cv/super_resolution/liif/pytorch/README.md b/cv/super_resolution/liif/pytorch/README.md index 638092db6..f0106f01a 100755 --- a/cv/super_resolution/liif/pytorch/README.md +++ b/cv/super_resolution/liif/pytorch/README.md @@ -1,10 +1,23 @@ -# LIIF[Learning Continuous Image Representation with Local Implicit Image Function] +# LIIF -## Model description +## Model Description -How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs, predicts the RGB value at a given coordinate as an output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution. The learned continuous representation can be presented in arbitrary resolution even extrapolate to x30 higher resolution, where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D, it naturally supports the learning tasks with size-varied image ground-truths and significantly outperforms the method with resizing the ground-truths. +LIIF (Local Implicit Image Function) is an innovative deep learning model for continuous image representation. It uses +neural networks to learn a function that can predict RGB values at any continuous coordinate within an image, enabling +arbitrary resolution representation. LIIF combines 2D deep features with coordinate inputs to generate high-quality +images, even at resolutions 30x higher than training data. This approach bridges discrete and continuous image +representations, outperforming traditional resizing methods and supporting tasks with varying image sizes. 
-## Step 1: Installing packages +## Model Preparation + +### Prepare Resources + +```shell +# Download DIV2K: https://data.vision.ee.ethz.ch/cvl/DIV2K/ or you can follow this tools/dataset_converters/div2k/README.md +mkdir -p data/DIV2K +``` + +### Install Dependencies ```shell # Install libGL @@ -21,33 +34,23 @@ sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_cond pip install albumentations ``` -## Step 2: Preparing datasets - -```shell -# Download DIV2K: https://data.vision.ee.ethz.ch/cvl/DIV2K/ or you can follow this tools/dataset_converters/div2k/README.md -mkdir -p data/DIV2K -``` - -## Step 3: Training +## Model Training -### One single GPU ```shell +# One single GPU python3 tools/train.py configs/liif/liif-edsr-norm_c64b16_1xb16-1000k_div2k.py -``` -### Mutiple GPUs on one machine -```shell +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/liif/liif-edsr-norm_c64b16_1xb16-1000k_div2k.py 8 ``` -## Results on BI-V100 - -| GPUs | FP16 | FPS | PSNR | -|------|-------| ---- | ------------ | -| 1x8 | False | 684 | 26.87 | +## Model Results +| GPUs | FP16 | FPS | PSNR | +|------------|-------|-----|-------| +| BI-V100 x8 | False | 684 | 26.87 | -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) +## References +- [mmagic](https://github.com/open-mmlab/mmagic) diff --git a/cv/super_resolution/real_basicvsr/pytorch/README.md b/cv/super_resolution/real_basicvsr/pytorch/README.md index b97c6ddbe..d45dab40b 100755 --- a/cv/super_resolution/real_basicvsr/pytorch/README.md +++ b/cv/super_resolution/real_basicvsr/pytorch/README.md @@ -1,11 +1,29 @@ # RealBasicVSR -## Model description +## Model Description -The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image pre-cleaning stage indispensable to reduce noises and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency. Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces up to 40% of training time without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective uses of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset will be made publicly available. +RealBasicVSR is an advanced video super-resolution model designed for real-world applications. 
It addresses challenges +in handling diverse and complex degradations through a novel image pre-cleaning module that balances detail synthesis +and artifact suppression. The model introduces a stochastic degradation scheme to reduce training time while maintaining +performance, and emphasizes the use of longer sequences over larger batches for more effective temporal information +utilization. RealBasicVSR demonstrates superior quality and efficiency in video enhancement tasks. +## Model Preparation -## Step 1: Installing packages +### Prepare Resources + +Download UDM10 to data/UDM10 + +Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow +tools/dataset_converters/reds/README.md + +```shell +mkdir -p data/ +ln -s ${REDS_DATASET_PATH} data/REDS +python tools/dataset_converters/reds/crop_sub_images.py --data-root ./data/REDS # cut REDS images into patches for fas +``` + +### Install Dependencies ```shell # Install libGL @@ -22,29 +40,17 @@ sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_cond pip install albumentations av==12.0.0 ``` -## Step 2: Preparing datasets - -Download UDM10 https://www.terabox.com/web/share/link?surl=LMuQCVntRegfZSxn7s3hXw&path=%2Fproject%2Fpfnl to data/UDM10 - -Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) or you can follow tools/dataset_converters/reds/README.md -```shell -mkdir -p data/ -ln -s ${REDS_DATASET_PATH} data/REDS -python tools/dataset_converters/reds/crop_sub_images.py --data-root ./data/REDS # cut REDS images into patches for fas -``` - -## Step 3: Training +## Model Training -### Training on single card ```shell +# Training on single card python3 tools/train.py configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py -``` -### Mutiple GPUs on one machine -```shell +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/real_basicvsr/realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds.py 8 ``` -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) +## References + +- [mmagic](https://github.com/open-mmlab/mmagic) diff --git a/cv/super_resolution/ttsr/pytorch/README.md b/cv/super_resolution/ttsr/pytorch/README.md index bbcbb0ef6..9bc1e608e 100755 --- a/cv/super_resolution/ttsr/pytorch/README.md +++ b/cv/super_resolution/ttsr/pytorch/README.md @@ -1,28 +1,17 @@ # TTSR -## Model description +## Model Description -We study on image super-resolution (SR), which aims to recover realistic textures from a low-resolution (LR) image. Recent progress has been made by taking high-resolution images as references (Ref), so that relevant textures can be transferred to LR images. However, existing SR approaches neglect to use attention mechanisms to transfer high-resolution (HR) textures from Ref images, which limits these approaches in challenging cases. In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively. TTSR consists of four closely-related modules optimized for image generation tasks, including a learnable texture extractor by DNN, a relevance embedding module, a hard-attention module for texture transfer, and a soft-attention module for texture synthesis. 
Such a design encourages joint feature learning across LR and Ref images, in which deep feature correspondences can be discovered by attention, and thus accurate texture features can be transferred. The proposed texture transformer can be further stacked in a cross-scale way, which enables texture recovery from different levels (e.g., from 1x to 4x magnification). Extensive experiments show that TTSR achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations. +TTSR (Texture Transformer Network for Image Super-Resolution) is an innovative deep learning model that enhances image +super-resolution using reference images. It employs a transformer architecture where low-resolution and reference images +are formulated as queries and keys. TTSR features four key modules: texture extractor, relevance embedding, +hard-attention for texture transfer, and soft-attention for texture synthesis. This design enables effective texture +transfer through attention mechanisms, allowing for high-quality image reconstruction at various magnification levels +(1x to 4x). +## Model Preparation -## Step 1: Installing packages - -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -git clone https://github.com/open-mmlab/mmagic.git -b v1.2.0 --depth=1 -cd mmagic/ -pip3 install -e . -v - -sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_condition/g' mmagic/models/editors/vico/vico_utils.py -pip install albumentations -``` - -## Step 2: Preparing datasets +### Prepare Resources ```bash mkdir -p data/ @@ -39,18 +28,34 @@ mkdir -p /root/.cache/torch/hub/checkpoints/ wget https://download.pytorch.org/models/vgg19-dcbb9e9d.pth -O /root/.cache/torch/hub/checkpoints/vgg19-dcbb9e9d.pth ``` -## Step 3: Training +### Install Dependencies -### Training on single card -```shell -python3 tools/train.py configs/ttsr/ttsr-gan_x4c64b16_1xb9-500k_CUFED.py +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +git clone https://github.com/open-mmlab/mmagic.git -b v1.2.0 --depth=1 +cd mmagic/ +pip3 install -e . -v + +sed -i 's/diffusers.models.unet_2d_condition/diffusers.models.unets.unet_2d_condition/g' mmagic/models/editors/vico/vico_utils.py +pip install albumentations ``` -### Mutiple GPUs on one machine +## Model Training + ```shell +# Training on single card +python3 tools/train.py configs/ttsr/ttsr-gan_x4c64b16_1xb9-500k_CUFED.py + +# Mutiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/ttsr/ttsr-gan_x4c64b16_1xb9-500k_CUFED.py 8 ``` -## Reference -[mmagic](https://github.com/open-mmlab/mmagic) \ No newline at end of file +## References + +- [mmagic](https://github.com/open-mmlab/mmagic) \ No newline at end of file diff --git a/cv/super_resolution/ttvsr/pytorch/README.md b/cv/super_resolution/ttvsr/pytorch/README.md index 673ce02c6..bafacc6c3 100755 --- a/cv/super_resolution/ttvsr/pytorch/README.md +++ b/cv/super_resolution/ttvsr/pytorch/README.md @@ -1,47 +1,48 @@ # TTVSR -## Model description +## Model Description -We proposed an approach named TTVSR to study video super-resolution by leveraging long-range frame dependencies. TTVSR introduces Transformer architectures in video super-resolution tasks and formulates video frames into pre-aligned trajectories of visual tokens to calculate attention along trajectories. 
+TTVSR (Transformer-based Temporal Video Super-Resolution) is an advanced deep learning model for video enhancement that +leverages transformer architectures to capture long-range frame dependencies. It formulates video frames into +pre-aligned trajectories of visual tokens, enabling effective attention calculation along temporal sequences. This +approach improves video super-resolution by better utilizing temporal information across frames, resulting in higher +quality upscaled videos. TTVSR demonstrates superior performance in handling complex video sequences while maintaining +efficient processing capabilities. +## Model Preparation -## Step 1: Installing packages - -```shell -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources Download REDS dataset from [homepage](https://seungjunnah.github.io/Datasets/reds.html) + ```shell mkdir -p data/ ln -s ${REDS_DATASET_PATH} data/REDS ``` -## Step 3: Training +### Install Dependencies -### One single GPU ```shell -python3 train.py [training args] # config file can be found in the configs directory +pip3 install -r requirements.txt ``` -### Mutiple GPUs on one machine -```shell -bash dist_train.sh [training args] # config file can be found in the configs directory -``` -### Example +## Model Training ```shell +# One single GPU +python3 train.py [training args] # config file can be found in the configs directory + +# Mutiple GPUs on one machine +## bash dist_train.sh [training args] # config file can be found in the configs directory bash dist_train.sh configs/TTVSR_reds4.py 8 ``` -## Results on BI-V100 +## Model Results -| GPUs | FP16 | FPS | PSNR | -|------|-------| ---- | ---- | -| 1x8 | False | 93.9 | 32.12 | +| GPUs | FP16 | FPS | PSNR | +|------------|-------|------|-------| +| BI-V100 x8 | False | 93.9 | 32.12 | +## References -## Reference -https://github.com/researchmm/TTVSR +- [TTVSR](https://github.com/researchmm/TTVSR) diff --git a/docs/MODEL_TEMPLATE.md b/docs/MODEL_TEMPLATE.md index d3c2c4103..a97bd7d65 100644 --- a/docs/MODEL_TEMPLATE.md +++ b/docs/MODEL_TEMPLATE.md @@ -17,17 +17,13 @@ A brief introduction about this model. ### Prepare Resources -### Prepare Datasets - ```bash python3 dataset/coco/download_coco.py ``` -#### Prepare Pre-trained Weights (for LLM) - Go to huggingface. 
-### Install Dependencies +## Install Dependencies ```bash pip install -r requirements.txt -- Gitee From 2781af9d1418fece8acf33178c9b255290ce1f89 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Fri, 7 Mar 2025 18:05:40 +0800 Subject: [PATCH 3/4] unify model readme format - semantic_segmentation --- cv/semantic_segmentation/README.md | 1 - .../apcnet/pytorch/README.md | 75 +++++++++--------- .../att_unet/pytorch/README.md | 76 ++++++++++--------- .../bisenet/pytorch/README.md | 38 +++++----- .../bisenetv2/paddlepaddle/README.md | 65 +++++++--------- .../bisenetv2/pytorch/README.md | 46 ++++++----- .../cgnet/pytorch/README.md | 30 ++++---- .../contextnet/pytorch/README.md | 36 +++++---- .../dabnet/pytorch/README.md | 40 +++++----- .../danet/pytorch/README.md | 43 +++++------ .../ddrnet/pytorch/README.md | 74 +++++++++--------- .../deeplabv3/mindspore/README.md | 65 +++++++++------- .../deeplabv3/paddlepaddle/README.md | 50 +++++------- .../deeplabv3/pytorch/README.md | 39 +++++----- .../deeplabv3plus/paddlepaddle/README.md | 63 +++++++-------- .../deeplabv3plus/tensorflow/README.md | 42 ++++++---- .../deeplabv3plus/tensorflow/README_ORIGIN.md | 2 +- .../denseaspp/pytorch/README.md | 41 +++++----- .../dfanet/pytorch/README.md | 45 ++++++----- .../dnlnet/paddlepaddle/README.md | 61 +++++++-------- .../dunet/pytorch/README.md | 38 +++++----- .../encnet/pytorch/README.md | 36 +++++---- .../enet/pytorch/README.md | 50 ++++++------ .../erfnet/pytorch/README.md | 41 +++++----- .../espnet/pytorch/README.md | 39 +++++----- .../fastfcn/paddlepaddle/README.md | 13 ++-- .../fastscnn/pytorch/README.md | 8 +- .../fcn/pytorch/README.md | 8 +- .../fpenet/pytorch/README.md | 8 +- .../gcnet/pytorch/README.md | 4 +- .../hardnet/pytorch/README.md | 8 +- .../icnet/pytorch/README.md | 4 +- .../lednet/pytorch/README.md | 8 +- .../linknet/pytorch/README.md | 8 +- .../mask2former/pytorch/README.md | 8 +- .../mobileseg/paddlepaddle/README.md | 8 +- .../ocnet/pytorch/README.md | 8 +- .../ocrnet/paddlepaddle/README.md | 8 +- .../ocrnet/pytorch/README.md | 4 +- .../pp_humansegv1/paddlepaddle/README.md | 8 +- .../pp_humansegv2/paddlepaddle/README.md | 8 +- .../pp_liteseg/paddlepaddle/README.md | 8 +- .../psanet/pytorch/README.md | 8 +- .../pspnet/pytorch/README.md | 8 +- .../refinenet/pytorch/README.md | 8 +- .../segnet/pytorch/README.md | 8 +- .../stdc/paddlepaddle/README.md | 8 +- .../stdc/pytorch/README.md | 8 +- .../torchvision/pytorch/README.md | 20 +++-- .../unet++/pytorch/README.md | 8 +- .../unet/paddlepaddle/README.md | 8 +- .../unet/pytorch/README.md | 8 +- .../unet3d/pytorch/README.md | 8 +- .../vnet/tensorflow/README.md | 2 +- 54 files changed, 679 insertions(+), 688 deletions(-) delete mode 100644 cv/semantic_segmentation/README.md diff --git a/cv/semantic_segmentation/README.md b/cv/semantic_segmentation/README.md deleted file mode 100644 index b4625c13a..000000000 --- a/cv/semantic_segmentation/README.md +++ /dev/null @@ -1 +0,0 @@ -# Semantic Segmentation diff --git a/cv/semantic_segmentation/apcnet/pytorch/README.md b/cv/semantic_segmentation/apcnet/pytorch/README.md index f8c71a7ed..d139ec332 100644 --- a/cv/semantic_segmentation/apcnet/pytorch/README.md +++ b/cv/semantic_segmentation/apcnet/pytorch/README.md @@ -1,36 +1,21 @@ # APCNet -## Model descripstion +## Model Description -Adaptive Pyramid Context Network (APCNet) for semantic segmentation. -APCNet adaptively constructs multi-scale contextual representations with multiple well-designed Adaptive Context Modules (ACMs). 
-Specifically, each ACM leverages a global image representation as a guidance to estimate the local affinity coefficients for each sub-region. -And then calculates a context vector with these affinities. +Adaptive Pyramid Context Network (APCNet) for semantic segmentation. APCNet adaptively constructs multi-scale contextual +representations with multiple well-designed Adaptive Context Modules (ACMs). Specifically, each ACM leverages a global +image representation as a guidance to estimate the local affinity coefficients for each sub-region. And then calculates +a context vector with these affinities. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -# install mmsegmentation -git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 -cd mmsegmentation/ -pip install -v -e . - -pip install ftfy -``` - -## Step 2: Prepare Datasets - -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. - -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -58,28 +43,40 @@ mkdir -p data/ ln -s /path/to/cityscapes data/cityscapes ``` -## Step 3: Training +### Install Dependencies -### Training on single card ```shell -python3 tools/train.py configs/apcnet/apcnet_r50-d8_4xb2-80k_cityscapes-512x1024.py +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . 
+ +pip install ftfy ``` -### Training on mutil-cards +## Model Training + ```shell +# Training on single card +python3 tools/train.py configs/apcnet/apcnet_r50-d8_4xb2-80k_cityscapes-512x1024.py + +# Training on mutil-cards sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/apcnet/apcnet_r50-d8_4xb2-80k_cityscapes-512x1024.py 8 ``` -## Results - -### Cityscapes +## Model Results -#### Accuracy +| Model | Backbone | Crop Size | Lr schd | Mem (GB) | mIoU (BI x 4) | +|--------|----------|-----------|---------|----------|---------------| +| APCNet | R-50-D8 | 512x1024 | 40000 | 7.7 | 77.53 | -| Method | Backbone | Crop Size | Lr schd | Mem (GB) | mIoU (BI x 4) | -| ------ | -------- | --------- | ------: | -------- |--------------:| -| APCNet | R-50-D8 | 512x1024 | 40000 | 7.7 | 77.53 | +## References -## Reference -[mmsegmentation](https://github.com/open-mmlab/mmsegmentation) +- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/att_unet/pytorch/README.md b/cv/semantic_segmentation/att_unet/pytorch/README.md index 36ec1e825..2469abaf1 100644 --- a/cv/semantic_segmentation/att_unet/pytorch/README.md +++ b/cv/semantic_segmentation/att_unet/pytorch/README.md @@ -1,33 +1,23 @@ -# Attention U-Net: Learning Where to Look for the Pancreas +# Attention U-Net -## Model description +## Model Description -We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task. This enables us to eliminate the necessity of using explicit external tissue/organ localisation modules of cascaded convolutional neural networks (CNNs). AGs can be easily integrated into standard CNN architectures such as the U-Net model with minimal computational overhead while increasing the model sensitivity and prediction accuracy. +Attention U-Net is an advanced deep learning model for medical image segmentation that integrates attention gates into +the U-Net architecture. These gates automatically learn to focus on target structures of varying shapes and sizes, +suppressing irrelevant regions while highlighting salient features. This eliminates the need for explicit localization +modules, improving model sensitivity and accuracy. Attention U-Net efficiently processes medical images with minimal +computational overhead, making it particularly effective for tasks requiring precise segmentation of complex anatomical +structures. -## Step 1: Installation +## Model Preparation -### Install packages +### Prepare Resources -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install mmsegmentation -git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 -cd mmsegmentation/ -pip install -v -e . - -pip install ftfy -``` +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -## Step 2: Preparing datasets - -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. 
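The Attention U-Net description above refers to attention gates that reweight skip-connection features. As a rough illustration of that additive gating idea (a simplified PyTorch-style sketch, not the mmsegmentation code actually used by this README; the real gate downsamples the skip features before gating):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate: a gating signal g reweights skip features x."""
    def __init__(self, in_channels, gating_channels, inter_channels):
        super().__init__()
        self.theta_x = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.phi_g = nn.Conv2d(gating_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, x, g):
        # x: encoder skip features; g: coarser decoder features (same spatial size here)
        att = torch.relu(self.theta_x(x) + self.phi_g(g))
        att = torch.sigmoid(self.psi(att))   # (N, 1, H, W) attention coefficients
        return x * att                       # suppress irrelevant regions of x

x = torch.randn(1, 64, 32, 32)    # skip connection
g = torch.randn(1, 128, 32, 32)   # gating signal, already resized to match x
print(AttentionGate(64, 128, 32)(x, g).shape)  # torch.Size([1, 64, 32, 32])
```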
- -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure should look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +should look like: ```bash cityscapes/ @@ -55,24 +45,40 @@ mkdir -p data/ ln -s /path/to/cityscapes data/ ``` -## Step 3: Training +### Install Dependencies -### Training on single card -```shell -python3 tools/train.py configs/unet/unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . + +pip install ftfy ``` -### Training on mutil-cards +## Model Training + ```shell +# Training on single card +python3 tools/train.py configs/unet/unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py + +# Training on mutil-cards sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/unet/unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py 8 ``` -## Results +## Model Results + +| GPUs | Crop Size | Lr schd | FPS | mIoU | +|------------|-----------|---------|---------|-------| +| BI-V100 x8 | 512x1024 | 160000 | 54.5180 | 69.39 | -| GPUs | Crop Size | Lr schd | FPS | mIoU | -| ------ | --------- | ------: | -------- |--------------:| -| BI-V100 x8 | 512x1024 | 160000 | 54.5180 | 69.39 | +## References -## Reference -[mmsegmentation](https://github.com/open-mmlab/mmsegmentation) +- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/bisenet/pytorch/README.md b/cv/semantic_segmentation/bisenet/pytorch/README.md index b2e48c18c..0641e9fad 100644 --- a/cv/semantic_segmentation/bisenet/pytorch/README.md +++ b/cv/semantic_segmentation/bisenet/pytorch/README.md @@ -1,25 +1,15 @@ # BiSeNet -## Model description +## Model Description -A novel Bilateral Segmentation Network (BiSeNet). -First design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. -Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. -On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. +A novel Bilateral Segmentation Network (BiSeNet). First design a Spatial Path with a small stride to preserve the +spatial information and generate high-resolution features. Meanwhile, a Context Path with a fast downsampling strategy +is employed to obtain sufficient receptive field. On top of the two paths, we introduce a new Feature Fusion Module to +combine features efficiently. -## Step 1: Installing +## Model Preparation -### Install packages - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -44,13 +34,19 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_bisenet_r18_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/bisenetv2/paddlepaddle/README.md b/cv/semantic_segmentation/bisenetv2/paddlepaddle/README.md index f48873f75..aa8feb46c 100644 --- a/cv/semantic_segmentation/bisenetv2/paddlepaddle/README.md +++ b/cv/semantic_segmentation/bisenetv2/paddlepaddle/README.md @@ -1,25 +1,18 @@ # BiSeNetV2 -## Model description +## Model Description -A novel Bilateral Segmentation Network (BiSeNet). -First design a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features. -Meanwhile, a Context Path with a fast downsampling strategy is employed to obtain sufficient receptive field. -On top of the two paths, we introduce a new Feature Fusion Module to combine features efficiently. +BiSeNet V2 is a two-pathway architecture for real-time semantic segmentation. One pathway is designed to capture the +spatial details with wide channels and shallow layers, called Detail Branch. In contrast, the other pathway is +introduced to extract the categorical semantics with narrow channels and deep layers, called Semantic Branch. The +Semantic Branch simply requires a large receptive field to capture semantic context, while the detail information can be +supplied by the Detail Branch. Therefore, the Semantic Branch can be made very lightweight with fewer channels and a +fast-downsampling strategy. Both types of feature representation are merged to construct a stronger and more +comprehensive feature representation. -## Step 1: Installing +## Model Preparation -```bash -git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -python3 setup.py install -``` - -## Step 2: Download data +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. @@ -46,30 +39,29 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +python3 setup.py install +``` + +### Preprocess Data + ```bash # Datasets preprocessing pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," - -# CityScapes PATH as follow: -ls -al /path/to/cityscapes -total 11567948 -drwxr-xr-x 4 root root 227 Jul 18 03:32 . -drwxr-xr-x 6 root root 179 Jul 18 06:48 .. 
--rw-r--r-- 1 root root 298 Feb 20 2016 README -drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine --rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip -drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit --rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip --rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt --rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt --rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt --rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` -## Step 3: Run BiSeNetV2 +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path @@ -90,7 +82,6 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py \ --use_vdl ``` - -| GPU | FP32 | -| --------- | ------------- | -| 8 cards | mIoU=73.45% | +| Model | GPU | FP32 | +|-----------|------------|-------------| +| BiSeNetV2 | BI-V100 x8 | mIoU=73.45% | diff --git a/cv/semantic_segmentation/bisenetv2/pytorch/README.md b/cv/semantic_segmentation/bisenetv2/pytorch/README.md index ab16f443f..2b316072f 100644 --- a/cv/semantic_segmentation/bisenetv2/pytorch/README.md +++ b/cv/semantic_segmentation/bisenetv2/pytorch/README.md @@ -1,20 +1,18 @@ # BiSeNetV2 -## Model description +## Model Description -BiSeNet V2 is a two-pathway architecture for real-time semantic segmentation. One pathway is designed to capture the spatial details with wide channels and shallow layers, called Detail Branch. In contrast, the other pathway is introduced to extract the categorical semantics with narrow channels and deep layers, called Semantic Branch. The Semantic Branch simply requires a large receptive field to capture semantic context, while the detail information can be supplied by the Detail Branch. Therefore, the Semantic Branch can be made very lightweight with fewer channels and a fast-downsampling strategy. Both types of feature representation are merged to construct a stronger and more comprehensive feature representation. +BiSeNet V2 is a two-pathway architecture for real-time semantic segmentation. One pathway is designed to capture the +spatial details with wide channels and shallow layers, called Detail Branch. In contrast, the other pathway is +introduced to extract the categorical semantics with narrow channels and deep layers, called Semantic Branch. The +Semantic Branch simply requires a large receptive field to capture semantic context, while the detail information can be +supplied by the Detail Branch. Therefore, the Semantic Branch can be made very lightweight with fewer channels and a +fast-downsampling strategy. Both types of feature representation are merged to construct a stronger and more +comprehensive feature representation. 
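To make the two-pathway idea described above more concrete, here is a heavily simplified PyTorch-style sketch (illustrative only; the PaddleSeg and CoinCheung implementations referenced in these READMEs are far more elaborate): a wide-and-shallow detail branch, a narrow-and-deep semantic branch, and a simple fusion step standing in for the paper's guided aggregation layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride, 1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class TinyBiSeNetV2(nn.Module):
    """Toy two-branch network: wide/shallow detail path + narrow/deep semantic path."""
    def __init__(self, num_classes=19):
        super().__init__()
        # Detail branch: few layers, wide channels, 1/8 resolution output
        self.detail = nn.Sequential(
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2), conv_bn_relu(64, 128, 2)
        )
        # Semantic branch: deeper, narrower, 1/32 resolution output
        self.semantic = nn.Sequential(
            conv_bn_relu(3, 16, 2), conv_bn_relu(16, 32, 2), conv_bn_relu(32, 64, 2),
            conv_bn_relu(64, 128, 2), conv_bn_relu(128, 128, 2)
        )
        self.head = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        detail = self.detail(x)                      # (N, 128, H/8, W/8)
        semantic = self.semantic(x)                  # (N, 128, H/32, W/32)
        semantic = F.interpolate(semantic, size=detail.shape[2:],
                                 mode="bilinear", align_corners=False)
        fused = detail + semantic                    # stand-in for guided aggregation
        logits = self.head(fused)
        return F.interpolate(logits, scale_factor=8, mode="bilinear", align_corners=False)

print(TinyBiSeNetV2()(torch.randn(1, 3, 256, 512)).shape)  # torch.Size([1, 19, 256, 512])
```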
+## Model Preparation -## Step 1: Installation - -```bash -pip3 install -r requirement - -# if libGL.so.1 errors.txt -yum install mesa-libGL -y -``` - -## Step 2: Preparing datasets +### Prepare Resources Download cityscapes from [website](https://www.cityscapes-dataset.com/) @@ -26,18 +24,30 @@ unzip leftImg8bit_trainvaltest.zip unzip gtFine_trainvaltest.zip ``` -## Step 3: Training +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +pip3 install -r requirement +``` + +## Model Training ```bash bash train.sh 8 configs/bisenetv2_city.py ``` -## Results +## Model Results -| GPUs | FPS | ss | ssc | msf | mscf | -| --------- | --- | ----- | ----- | ----- | ----- | +| Model | FPS | ss | ssc | msf | mscf | +|-----------|-------------|-------|-------|-------|-------| | BiSeNetV2 | 156.81 s/it | 73.42 | 74.67 | 74.99 | 75.71 | -## Reference -[BiSeNet](https://github.com/CoinCheung/BiSeNet/tree/master) +## References +- [BiSeNet](https://github.com/CoinCheung/BiSeNet/tree/master) diff --git a/cv/semantic_segmentation/cgnet/pytorch/README.md b/cv/semantic_segmentation/cgnet/pytorch/README.md index 4bd0c8bf7..afceb7958 100644 --- a/cv/semantic_segmentation/cgnet/pytorch/README.md +++ b/cv/semantic_segmentation/cgnet/pytorch/README.md @@ -1,24 +1,14 @@ # CGNet -## Model description +## Model Description A novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing segmentation networks. Specifically, without any post-processing and multi-scale testing, the proposed CGNet achieves 64.8% mean IoU on Cityscapes with less than 0.5 M parameters. -## Step 1: Installing +## Model Preparation -### Install packages - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -43,13 +33,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_cgnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/contextnet/pytorch/README.md b/cv/semantic_segmentation/contextnet/pytorch/README.md index 6a46897fe..07f315f23 100644 --- a/cv/semantic_segmentation/contextnet/pytorch/README.md +++ b/cv/semantic_segmentation/contextnet/pytorch/README.md @@ -1,23 +1,15 @@ # ContextNet -## Model description +## Model Description -ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. -ContextNet combines a deep network branch at low resolution that captures global context information efficiently with a shallow branch that focuses on high-resolution segmentation details. 
+ContextNet, a new deep neural network architecture which builds on factorized convolution, network compression and +pyramid representation to produce competitive semantic segmentation in real-time with low memory requirement. ContextNet +combines a deep network branch at low resolution that captures global context information efficiently with a shallow +branch that focuses on high-resolution segmentation details. -## Step 1: Installing +## Model Preparation -### Install packages - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -42,13 +34,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_contextnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/dabnet/pytorch/README.md b/cv/semantic_segmentation/dabnet/pytorch/README.md index cad4fec36..40235ce2a 100644 --- a/cv/semantic_segmentation/dabnet/pytorch/README.md +++ b/cv/semantic_segmentation/dabnet/pytorch/README.md @@ -1,28 +1,20 @@ # DabNet -## Model description +## Model Description -A novel Depthwise Asymmetric Bottleneck (DAB) module, which efficiently adopts depth-wise asymmetric convolution and dilated convolution to build a bottleneck structure. +A novel Depthwise Asymmetric Bottleneck (DAB) module, which efficiently adopts depth-wise asymmetric convolution and dilated convolution to build a bottleneck structure. Based on the DAB module, design a Depth-wise Asymmetric Bottleneck Network (DABNet) especially for real-time semantic segmentation. -It creates sufficient receptive field and densely utilizes the contextual information. +It creates sufficient receptive field and densely utilizes the contextual information. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,13 +35,19 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_dabnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/danet/pytorch/README.md b/cv/semantic_segmentation/danet/pytorch/README.md index ecc156cbb..c0badffef 100644 --- a/cv/semantic_segmentation/danet/pytorch/README.md +++ b/cv/semantic_segmentation/danet/pytorch/README.md @@ -1,28 +1,21 @@ # DANet -## Model description +## Model Description -A novel framework, the dual attention network (DANet), for natural scene image segmentation. -It adopts a self-attention mechanism instead of simply stacking convolutions to compute the spatial attention map, which enables the network to capture global information directly. -DANet uses in parallel a position attention module and a channel attention module to capture feature dependencies in spatial and channel domains. +A novel framework, the dual attention network (DANet), for natural scene image segmentation. It adopts a self-attention +mechanism instead of simply stacking convolutions to compute the spatial attention map, which enables the network to +capture global information directly. DANet uses in parallel a position attention module and a channel attention module +to capture feature dependencies in spatial and channel domains. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,13 +36,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_danet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/ddrnet/pytorch/README.md b/cv/semantic_segmentation/ddrnet/pytorch/README.md index ca42fc9ce..cee8e72ab 100644 --- a/cv/semantic_segmentation/ddrnet/pytorch/README.md +++ b/cv/semantic_segmentation/ddrnet/pytorch/README.md @@ -1,33 +1,22 @@ # DDRNet -## Model description +## Model Description -we proposed a family of efficient backbones specially designed for real-time semantic segmentation. 
The proposed deep dual-resolution networks (DDRNets) are composed of two deep branches between which multiple bilateral fusions are performed. Additionally, we design a new contextual information extractor named Deep Aggregation Pyramid Pooling Module (DAPPM) to enlarge effective receptive fields and fuse multi-scale context based on low-resolution feature maps. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both Cityscapes and CamVid dataset. +we proposed a family of efficient backbones specially designed for real-time semantic segmentation. The proposed deep +dual-resolution networks (DDRNets) are composed of two deep branches between which multiple bilateral fusions are +performed. Additionally, we design a new contextual information extractor named Deep Aggregation Pyramid Pooling Module +(DAPPM) to enlarge effective receptive fields and fuse multi-scale context based on low-resolution feature maps. Our +method achieves a new state-of-the-art trade-off between accuracy and speed on both Cityscapes and CamVid dataset. -## Step 1: Installation +## Model Preparation -### Install packages +### Prepare Resources -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install mmsegmentation -git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 -cd mmsegmentation/ -pip install -v -e . - -pip install ftfy -``` +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -## Step 2: Preparing datasets - -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. - -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure should look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +should look like: ```bash cityscapes/ @@ -55,23 +44,40 @@ mkdir -p data/ ln -s /path/to/cityscapes data/ ``` -## Step 3: Training -### Training on single card -```shell -python3 tools/train.py configs/ddrnet/ddrnet_23-slim_in1k-pre_2xb6-120k_cityscapes-1024x1024.py +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . 
+ +pip install ftfy ``` -### Training on mutil-cards +## Model Training + ```shell +# Training on single card +python3 tools/train.py configs/ddrnet/ddrnet_23-slim_in1k-pre_2xb6-120k_cityscapes-1024x1024.py + +# Training on mutil-cards sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/ddrnet/ddrnet_23-slim_in1k-pre_2xb6-120k_cityscapes-1024x1024.py 8 ``` -## Results +## Model Results + +| GPU | Crop Size | Lr schd | FPS | mIoU | +|------------|-----------|--------:|--------|-----:| +| BI-V100 x8 | 1024x1024 | 12000 | 33.085 | 74.8 | -| GPUs | Crop Size | Lr schd | FPS | mIoU| -| ------ | --------- | ------: | -------- |--------------:| -| BI-V100 x8 | 1024x1024 | 12000 | 33.085 | 74.8 | +## References -## Reference -[mmsegmentation](https://github.com/open-mmlab/mmsegmentation) \ No newline at end of file +- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/deeplabv3/mindspore/README.md b/cv/semantic_segmentation/deeplabv3/mindspore/README.md index e1e194f4d..c6c406a91 100755 --- a/cv/semantic_segmentation/deeplabv3/mindspore/README.md +++ b/cv/semantic_segmentation/deeplabv3/mindspore/README.md @@ -1,18 +1,15 @@ # DeepLabV3 -## Model description -DeepLab is a series of image semantic segmentation models, DeepLabV3 improves significantly over previous versions. Two keypoints of DeepLabV3: Its multi-grid atrous convolution makes it better to deal with segmenting objects at multiple scales, and augmented ASPP makes image-level features available to capture long range information. -This repository provides a script and recipe to DeepLabV3 model and achieve state-of-the-art performance. +## Model Description -Refer to [this paper][1] for network details. -`Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.` +DeepLab is a series of image semantic segmentation models, DeepLabV3 improves significantly over previous versions. Two +keypoints of DeepLabV3: Its multi-grid atrous convolution makes it better to deal with segmenting objects at multiple +scales, and augmented ASPP makes image-level features available to capture long range information. This repository +provides a script and recipe to DeepLabV3 model and achieve state-of-the-art performance. -[1]: https://arxiv.org/abs/1706.05587 -## Step 1: Installing -``` -pip3 install -r requirements.txt -``` -## Step 2: Prepare Datasets +## Model Preparation + +### Prepare Resources Pascal VOC datasets [link](https://pjreddie.com/projects/pascal-voc-dataset-mirror), and Semantic Boundaries Dataset: [link](https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz) @@ -40,26 +37,40 @@ You can also generate the list file automatically by run script: `python3 ./src/ --shuffle shuffle or not ``` -# [Pretrained models](#contents) -Please [resnet101](https://download.mindspore.cn/model_zoo/r1.2/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78/) download resnet101 here +Please download resnet101 from [here](https://download.mindspore.cn/model_zoo/r1.2/resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78/). 
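The DeepLabV3 description above mentions multi-grid atrous convolution and an augmented ASPP head that injects image-level context. Purely as an illustration of that head (a PyTorch-style sketch, not the MindSpore code shipped in this repository), ASPP runs parallel dilated convolutions plus global pooling and projects the concatenation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convs + image-level pooling."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates]
        )
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1, bias=False)
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=x.shape[2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

print(ASPP(2048)(torch.randn(1, 2048, 33, 33)).shape)  # torch.Size([1, 256, 33, 33])
```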
-## Step 3: Training -``` -python3 train.py --data_file=/home/dataset/deeplabv3/vocaug_train.mindrecord0 --train_dir=./ckpt --train_epochs=200 --batch_size=32 --crop_size=513 --base_lr=0.015 --lr_type=cos --min_scale=0.5 --max_scale=2.0 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s16 --ckpt_pre_trained=./resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt --save_steps=1500 --keep_checkpoint_max=200 --device_target=GPU +### Install Dependencies + +```bash +pip3 install -r requirements.txt ``` -### [Evaluation] + +## Model Training + +```bash +python3 train.py --data_file=/home/dataset/deeplabv3/vocaug_train.mindrecord0 --train_dir=./ckpt \ + --train_epochs=200 --batch_size=32 --crop_size=513 --base_lr=0.015 --lr_type=cos \ + --min_scale=0.5 --max_scale=2.0 --ignore_label=255 --num_classes=21 \ + --model=deeplab_v3_s16 \ + --ckpt_pre_trained=./resnet101_ascend_v120_imagenet2012_official_cv_bs32_acc78.ckpt \ + --save_steps=1500 --keep_checkpoint_max=200 --device_target=GPU ``` -python3 eval.py --data_root=deeplabv3/ --data_lst=voc_val_lst.txt --batch_size=32 --crop_size=513 --ignore_label=255 --num_classes=21 --model=deeplab_v3_s16 --scales_type=0 --freeze_bn=True --device_target=GPU --ckpt_path=deeplab_v3_s16-200_45.ckpt + +### Evaluation + +```bash +python3 eval.py --data_root=deeplabv3/ --data_lst=voc_val_lst.txt --batch_size=32 --crop_size=513 \ + --ignore_label=255 --num_classes=21 --model=deeplab_v3_s16 --scales_type=0 \ + --freeze_bn=True --device_target=GPU --ckpt_path=deeplab_v3_s16-200_45.ckpt ``` -### [Evaluation result] -## Results on BI-V100 -| GPUs | per step time | Miou | -|------|-------------- |--------| -| 1 | 1.465s | 0.7386 | +## Model Results + +| GPU | per step time | Miou | +|-------------|---------------|--------| +| BI-V100 x1 | 1.465s | 0.7386 | +| NV-V100s x1 | 1.716s | 0.7453 | -## Results on NV-V100s +## References -| GPUs | per step time | MAP | -|------|-------------- |-------| -| 1 | 1.716s | 0.7453| +- [Paper](https://arxiv.org/abs/1706.05587) diff --git a/cv/semantic_segmentation/deeplabv3/paddlepaddle/README.md b/cv/semantic_segmentation/deeplabv3/paddlepaddle/README.md index cf5620537..6fa1239d1 100644 --- a/cv/semantic_segmentation/deeplabv3/paddlepaddle/README.md +++ b/cv/semantic_segmentation/deeplabv3/paddlepaddle/README.md @@ -1,22 +1,14 @@ # DeepLabV3 -## Model description +## Model Description -DeepLabV3 is a semantic segmentation architecture that improves upon DeepLabV2 with several modifications. -To handle the problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. +DeepLabV3 is a semantic segmentation architecture that improves upon DeepLabV2 with several modifications. To handle the +problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in +parallel to capture multi-scale context by adopting multiple atrous rates. -## Step 1: Installing +## Model Preparation -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -``` - -## Step 2: Download data +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. 
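Both DeepLabV3 variants above rely on atrous (dilated) convolution to enlarge the receptive field without downsampling the feature map. A tiny self-contained check of that property (illustrative only, independent of either framework used in these READMEs):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 65, 65)
for rate in (1, 2, 4, 8):
    # padding = rate keeps the 3x3 output "same"-sized while the kernel's reach grows
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=rate, dilation=rate)
    effective = 2 * rate + 1  # effective extent of a dilated 3x3 kernel
    print(rate, tuple(conv(x).shape[-2:]), f"effective kernel {effective}x{effective}")
# The spatial size stays 65x65 for every rate; only the receptive field changes.
```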
@@ -43,30 +35,28 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +``` + +### Preprocess Data + ```bash # Datasets preprocessing pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," -# CityScapes PATH as follow: -ls -al /path/to/cityscapes -total 11567948 -drwxr-xr-x 4 root root 227 Jul 18 03:32 . -drwxr-xr-x 6 root root 179 Jul 18 06:48 .. --rw-r--r-- 1 root root 298 Feb 20 2016 README -drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine --rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip -drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit --rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip --rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt --rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt --rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt --rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` -## Step 3: Run DeepLabV3 +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path diff --git a/cv/semantic_segmentation/deeplabv3/pytorch/README.md b/cv/semantic_segmentation/deeplabv3/pytorch/README.md index 9d6fbdd36..0774e1920 100644 --- a/cv/semantic_segmentation/deeplabv3/pytorch/README.md +++ b/cv/semantic_segmentation/deeplabv3/pytorch/README.md @@ -1,27 +1,20 @@ # DeepLabV3 -## Model description +## Model Description -DeepLabV3 is a semantic segmentation architecture that improves upon DeepLabV2 with several modifications. -To handle the problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. +DeepLabV3 is a semantic segmentation architecture that improves upon DeepLabV2 with several modifications. To handle the +problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in +parallel to capture multi-scale context by adopting multiple atrous rates. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,12 +35,18 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_deeplabv3_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/deeplabv3plus/paddlepaddle/README.md b/cv/semantic_segmentation/deeplabv3plus/paddlepaddle/README.md index 486c4039d..fe11612e9 100644 --- a/cv/semantic_segmentation/deeplabv3plus/paddlepaddle/README.md +++ b/cv/semantic_segmentation/deeplabv3plus/paddlepaddle/README.md @@ -1,26 +1,20 @@ # DeepLabV3+ -## Model description +## Model Description -DeepLabv3 is a semantic segmentation architecture that improves upon DeepLabv2 with several modifications. -To handle the problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates. +DeepLabv3 is a semantic segmentation architecture that improves upon DeepLabv2 with several modifications. To handle the +problem of segmenting objects at multiple scales, modules are designed which employ atrous convolution in cascade or in +parallel to capture multi-scale context by adopting multiple atrous rates. -## Step 1: Installing +## Model Preparation -```bash -git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -``` - -## Step 2: Download data +### Prepare Resources -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -43,30 +37,28 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +``` + +### Preprocess Data + ```bash # Datasets preprocessing pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," - -# CityScapes PATH as follow: -ls -al /path/to/cityscapes -total 11567948 -drwxr-xr-x 4 root root 227 Jul 18 03:32 . -drwxr-xr-x 6 root root 179 Jul 18 06:48 .. 
--rw-r--r-- 1 root root 298 Feb 20 2016 README -drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine --rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip -drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit --rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip --rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt --rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt --rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt --rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` -## Step 3: Run DeepLabV3+ +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path @@ -91,7 +83,8 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py \ --use_vdl ``` +## Model Results -| GPU | FP32 | -| --------- | -------------- | -| 8 cards | mIoU =80.42% | +| Model | GPU | FP32 | +|------------|------------|--------------| +| DeepLabV3+ | BI-V100 x8 | mIoU =80.42% | diff --git a/cv/semantic_segmentation/deeplabv3plus/tensorflow/README.md b/cv/semantic_segmentation/deeplabv3plus/tensorflow/README.md index 4d767ccaf..afaa6bf65 100644 --- a/cv/semantic_segmentation/deeplabv3plus/tensorflow/README.md +++ b/cv/semantic_segmentation/deeplabv3plus/tensorflow/README.md @@ -1,18 +1,18 @@ # DeepLabV3+ -## Model description -DeepLabV3+ is a state-of-the-art semantic segmentation network. It combines the strengths of DeepLabV3 and a powerful encoder-decoder architecture. The network employs atrous convolution to capture multi-scale contextual information effectively. It introduces a novel feature called the "ASPP" module, which utilizes parallel atrous convolutions to capture fine-grained details and global context simultaneously. +## Model Description -## Step 1: Installation +DeepLabV3+ is a state-of-the-art semantic segmentation network. It combines the strengths of DeepLabV3 and a powerful +encoder-decoder architecture. The network employs atrous convolution to capture multi-scale contextual information +effectively. It introduces a novel feature called the "ASPP" module, which utilizes parallel atrous convolutions to +capture fine-grained details and global context simultaneously. -```bash -pip3 install wandb -pip3 install urllib3==1.26.6 -``` +## Model Preparation -## Step 2:Preparing datasets +### Prepare Resources -Sign up and login in [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the cityscapes dataset. Specify `/path/to/cityscapes` to your Cityscapes path in later training process. +Sign up and login in [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to +download the cityscapes dataset. Specify `/path/to/cityscapes` to your Cityscapes path in later training process. The Cityscapes dataset path structure should look like: @@ -39,19 +39,29 @@ Cityscapes └── val.txt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install wandb +pip3 install urllib3==1.26.6 +``` + +## Model Training + Open config folder and set `/path/to/cityscapes` in ./config/cityscapes_resnet50.py. 
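The DeepLabV3+ description above highlights its encoder-decoder design. Before the launch commands below, a rough PyTorch-style sketch of the decoder idea (illustrative only; the TensorFlow code in this repository differs in detail): the ASPP output is upsampled and concatenated with projected low-level encoder features before the final classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepLabV3PlusDecoder(nn.Module):
    """Fuse upsampled ASPP output with projected low-level encoder features."""
    def __init__(self, low_level_ch=256, aspp_ch=256, num_classes=19):
        super().__init__()
        self.reduce = nn.Conv2d(low_level_ch, 48, 1, bias=False)  # shrink low-level channels
        self.refine = nn.Sequential(
            nn.Conv2d(aspp_ch + 48, 256, 3, padding=1, bias=False), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1),
        )

    def forward(self, aspp_out, low_level):
        aspp_up = F.interpolate(aspp_out, size=low_level.shape[2:],
                                mode="bilinear", align_corners=False)
        fused = torch.cat([aspp_up, self.reduce(low_level)], dim=1)
        return self.refine(fused)

decoder = DeepLabV3PlusDecoder()
out = decoder(torch.randn(1, 256, 33, 33), torch.randn(1, 256, 129, 129))
print(out.shape)  # torch.Size([1, 19, 129, 129])
```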
single gpu: + ```bash export CUDA_VISIBLE_DEVICES=0 nohup python3 trainer.py cityscapes_resnet50 1> train_deeplabv3.log 2> train_deeplabv3_error.log & tail -f train_deeplabv3.log ``` -## Results +## Model Results + +| Model | GPU | FPS | ACC | +|------------|---------|------|--------| +| DeepLabV3+ | BI-V100 | 6.14 | 77.35% | -| GPUs | FPS | ACC | -|-------------|-----------|--------------| -| BI-V100 | 6.14 | 77.35% | +## References -## Reference -https://github.com/lattice-ai/DeepLabV3-Plus +- [DeepLabV3-Plus](https://github.com/lattice-ai/DeepLabV3-Plus) diff --git a/cv/semantic_segmentation/deeplabv3plus/tensorflow/README_ORIGIN.md b/cv/semantic_segmentation/deeplabv3plus/tensorflow/README_ORIGIN.md index 94407d5bf..420b852cc 100644 --- a/cv/semantic_segmentation/deeplabv3plus/tensorflow/README_ORIGIN.md +++ b/cv/semantic_segmentation/deeplabv3plus/tensorflow/README_ORIGIN.md @@ -171,7 +171,7 @@ def plot_predictions(images_list, size): plot_predictions(train_images[:4], (512, 512)) ``` -## Results +## Model Results ### Multi-Person Human Parsing diff --git a/cv/semantic_segmentation/denseaspp/pytorch/README.md b/cv/semantic_segmentation/denseaspp/pytorch/README.md index 328f00e58..a17fa3df1 100644 --- a/cv/semantic_segmentation/denseaspp/pytorch/README.md +++ b/cv/semantic_segmentation/denseaspp/pytorch/README.md @@ -1,27 +1,20 @@ # DenseASPP -## Model description +## Model Description -Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way. -Such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. +Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a +dense way. Such that it generates multi-scale features that not only cover a larger scale range, but also cover that +scale range densely, without significantly increasing the model size. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,13 +35,19 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +### Model Training ```shell bash train_denseaspp_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/dfanet/pytorch/README.md b/cv/semantic_segmentation/dfanet/pytorch/README.md index 925dba9c3..c898bd2b7 100644 --- a/cv/semantic_segmentation/dfanet/pytorch/README.md +++ b/cv/semantic_segmentation/dfanet/pytorch/README.md @@ -1,29 +1,22 @@ # DFANet -## Model description +## Model Description -An extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. -It starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. -Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters. -But it still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. +An extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. It starts +from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade +respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters. But +it still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the +speed and segmentation performance. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,13 +37,19 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +### Model Training ```shell bash train_dfanet_xceptiona_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md b/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md index ed5c92493..4722e6a23 100644 --- a/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md +++ b/cv/semantic_segmentation/dnlnet/paddlepaddle/README.md @@ -1,20 +1,17 @@ # DNLNet -## Model description -[Disentangled Non-Local Neural Networks](https://arxiv.org/abs/2006.06668) Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu +## Model Description +DNLNet (Disentangled Non-Local Neural Networks) is an advanced deep learning model for semantic segmentation that +enhances feature representation through disentangled non-local operations. It separates spatial and channel-wise +relationships in feature maps, enabling more effective long-range dependency modeling. This approach improves the +network's ability to capture contextual information while maintaining computational efficiency. DNLNet demonstrates +superior performance in tasks requiring precise spatial understanding, such as urban scene segmentation, by effectively +aggregating both local and global features. -## Step 1: Installing -``` -git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -``` +## Model Preparation -## Step 2: Prepare Datasets +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. @@ -41,35 +38,31 @@ cityscapes/ └── munster ``` -Datasets preprocessing: +### Install Dependencies + +```bash +git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +``` + +### Preprocess Data + ```bash pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," ``` -then the cityscapes path as follows: -``` -root@5574247e63f8:~# ls -al /path/to/cityscapes -total 11567948 -drwxr-xr-x 4 root root 227 Jul 18 03:32 . -drwxr-xr-x 6 root root 179 Jul 18 06:48 .. 
--rw-r--r-- 1 root root 298 Feb 20 2016 README -drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine --rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip -drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit --rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip --rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt --rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt --rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt --rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt -``` +## Model Training -## Step 3: Training Notice: modify configs/dnlnet/dnlnet_resnet50_os8_cityscapes_1024x512_80k.yml file, modify the datasets path as yours. -``` + +```bash cd PaddleSeg export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True @@ -81,5 +74,7 @@ python3 -u -m paddle.distributed.launch train.py \ --amp_level O1 ``` -## Reference +## References + - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) +- [Paper](https://arxiv.org/abs/2006.06668) diff --git a/cv/semantic_segmentation/dunet/pytorch/README.md b/cv/semantic_segmentation/dunet/pytorch/README.md index 7f33c47ce..730fc4af0 100644 --- a/cv/semantic_segmentation/dunet/pytorch/README.md +++ b/cv/semantic_segmentation/dunet/pytorch/README.md @@ -1,24 +1,16 @@ # DUNet -## Model description +## Model Description -Deformable U-Net (DUNet), which exploits the retinal vessels' local features with a U-shape architecture, in an end to end manner for retinal vessel segmentation. -The DUNet, with upsampling operators to increase the output resolution, is designed to extract context information and enable precise localization by combining low-level feature maps with high-level ones. -Furthermore, DUNet captures the retinal vessels at various shapes and scales by adaptively adjusting the receptive fields according to vessels' scales and shapes. +Deformable U-Net (DUNet), which exploits the retinal vessels' local features with a U-shape architecture, in an end to +end manner for retinal vessel segmentation. The DUNet, with upsampling operators to increase the output resolution, is +designed to extract context information and enable precise localization by combining low-level feature maps with +high-level ones. Furthermore, DUNet captures the retinal vessels at various shapes and scales by adaptively adjusting +the receptive fields according to vessels' scales and shapes. -## Step 1: Installing +## Model Preparation -### Install packages - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -43,13 +35,19 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_dunet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/encnet/pytorch/README.md b/cv/semantic_segmentation/encnet/pytorch/README.md index f3b3b4bfc..3e59f3b55 100644 --- a/cv/semantic_segmentation/encnet/pytorch/README.md +++ b/cv/semantic_segmentation/encnet/pytorch/README.md @@ -1,27 +1,19 @@ # EncNet -## Model description +## Model Description The Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,13 +34,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_encnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/enet/pytorch/README.md b/cv/semantic_segmentation/enet/pytorch/README.md index 0ccec8d41..3ca986693 100644 --- a/cv/semantic_segmentation/enet/pytorch/README.md +++ b/cv/semantic_segmentation/enet/pytorch/README.md @@ -1,33 +1,23 @@ # ENet -## Model description +## Model Description -ENet is a semantic segmentation architecture which utilises a compact encoder-decoder architecture. +ENet is an efficient deep learning model for real-time semantic segmentation, designed with a compact encoder-decoder +architecture. It optimizes performance through early downsampling, reducing computational costs while maintaining +accuracy. Key features include SegNet-style max pooling indices for sparse upsampling, PReLU activation functions, +dilated convolutions, and spatial dropout. 
ENet's lightweight design makes it particularly suitable for applications +requiring fast inference on resource-constrained devices, such as mobile platforms or real-time video processing +systems, without compromising segmentation quality. -Some design choices include: -1. Using the SegNet approach to downsampling y saving indices of elements chosen in max pooling layers, and using them to produce sparse upsampled maps in the decoder. -2. Early downsampling to optimize the early stages of the network and reduce the cost of processing large input frames. The first two blocks of ENet heavily reduce the input size, and use only a small set of feature maps. -3. Using PReLUs as an activation function. -4. Using dilated convolutions. -5. Using Spatial Dropout. +## Model Preparation -## Step 1: Installing +### Prepare Resources -### Install packages +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -48,13 +38,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_enet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/erfnet/pytorch/README.md b/cv/semantic_segmentation/erfnet/pytorch/README.md index 08bbcb974..a8dc93345 100644 --- a/cv/semantic_segmentation/erfnet/pytorch/README.md +++ b/cv/semantic_segmentation/erfnet/pytorch/README.md @@ -1,27 +1,20 @@ # ERFNet -## Model description +## Model Description -A deep architecture that is able to run in real-time while providing accurate semantic segmentation. -The core of the architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy. +A deep architecture that is able to run in real-time while providing accurate semantic segmentation. The core of the +architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient +while retaining remarkable accuracy. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
- -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,13 +35,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_erfnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/Eromera/erfnet_pytorch -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/espnet/pytorch/README.md b/cv/semantic_segmentation/espnet/pytorch/README.md index 2aea30b65..d8b603aea 100644 --- a/cv/semantic_segmentation/espnet/pytorch/README.md +++ b/cv/semantic_segmentation/espnet/pytorch/README.md @@ -1,27 +1,20 @@ # ESPNet -## Model description +## Model Description ESPNet is a convolutional neural network for semantic segmentation of high resolution images under resource constraints. -ESPNet is based on a convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power. +ESPNet is based on a convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, +memory, and power. -## Step 1: Installing +## Model Preparation -### Install packages +### Prepare Resources -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,13 +35,19 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_espnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md b/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md index faf9ca58a..4a7c53ec3 100644 --- a/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md +++ b/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md @@ -1,8 +1,11 @@ # FastFCN -## Model description +## Model Description -FastFCN is a fast, lightweight semantic segmentation model that achieves real-time speeds with competitive accuracy. 
It uses an efficient encoder-decoder architecture and depthwise separable convolutions to reduce computations. The simplified design allows FastFCN to run much faster than prior FCNs while maintaining good segmentation quality. FastFCN demonstrates real-time segmentation is possible with a carefully designed lightweight CNN architecture. +FastFCN is a fast, lightweight semantic segmentation model that achieves real-time speeds with competitive accuracy. It +uses an efficient encoder-decoder architecture and depthwise separable convolutions to reduce computations. The +simplified design allows FastFCN to run much faster than prior FCNs while maintaining good segmentation quality. FastFCN +demonstrates real-time segmentation is possible with a carefully designed lightweight CNN architecture. ## Step 1: Installation @@ -34,7 +37,7 @@ ADEChallengeData2016 └── sceneCategories.txt ``` -## Step 3: Training +## Model Training ```bash # Make sure your dataset path is the same as above @@ -52,12 +55,12 @@ python3 -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py --con python3 tools/val.py --config configs/fastfcn/fastfcn_resnet50_os8_ade20k_480x480_120k.yml --model_path output/path/to/model.pdparams ``` -## Results +## Model Results | GPUs | mIoU | Acc |Kappa | Dice | ips | |:-----------:|:-----------:|:-----------:|:------------:|:------------:|:-----------:| | BI-V100 x 8 |0.4312 | 0.8083 | 0.7935 | 0.570 | 33.68 | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/fastscnn/pytorch/README.md b/cv/semantic_segmentation/fastscnn/pytorch/README.md index a0219a40d..1d90a9424 100644 --- a/cv/semantic_segmentation/fastscnn/pytorch/README.md +++ b/cv/semantic_segmentation/fastscnn/pytorch/README.md @@ -1,14 +1,14 @@ # FastSCNN -## Model description +## Model Description The fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. The 'learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. The network combines spatial detail at high resolution with deep features extracted at lower resolution. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -49,7 +49,7 @@ coco2017 bash train_fastscnn_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/fcn/pytorch/README.md b/cv/semantic_segmentation/fcn/pytorch/README.md index 94989f9c2..2555cb0d5 100644 --- a/cv/semantic_segmentation/fcn/pytorch/README.md +++ b/cv/semantic_segmentation/fcn/pytorch/README.md @@ -1,6 +1,6 @@ # FCN -## Model description +## Model Description Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. They employ solely locally connected layers, such as convolution, pooling and upsampling. @@ -9,9 +9,9 @@ It also means an FCN can work for variable image sizes given all connections are The network consists of a downsampling path, used to extract and interpret the context, and an upsampling path, which allows for localization. FCNs also employ skip connections to recover the fine-grained spatial information lost in the downsampling path. 
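As a concrete illustration of those skip connections, the sketch below (illustrative only, not the training code in this repository; the module name and channel sizes are assumptions) upsamples coarse class scores from a deep stage and sums them with scores computed from an earlier, higher-resolution stage, FCN-8s style.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCNHead(nn.Module):
    """Toy FCN-style head: score coarse features, upsample, fuse with a skip branch."""
    def __init__(self, c_low=256, c_high=512, num_classes=21):
        super().__init__()
        self.score_high = nn.Conv2d(c_high, num_classes, kernel_size=1)  # coarse prediction
        self.score_low = nn.Conv2d(c_low, num_classes, kernel_size=1)    # skip branch

    def forward(self, feat_low, feat_high):
        coarse = self.score_high(feat_high)
        # upsample the coarse scores to the skip feature's resolution and fuse by summation
        coarse = F.interpolate(coarse, size=feat_low.shape[2:], mode="bilinear", align_corners=False)
        return coarse + self.score_low(feat_low)

# toy usage: feat_low is a 1/8-resolution stage, feat_high a 1/16-resolution stage
feat_low, feat_high = torch.randn(1, 256, 28, 28), torch.randn(1, 512, 14, 14)
print(TinyFCNHead()(feat_low, feat_high).shape)  # torch.Size([1, 21, 28, 28])
```

In the full model this fusion is repeated across stages before a final upsampling back to the input resolution.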
-## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -52,6 +52,6 @@ coco2017 bash train_fcn_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/fpenet/pytorch/README.md b/cv/semantic_segmentation/fpenet/pytorch/README.md index 4771be4c4..06d1c83c8 100644 --- a/cv/semantic_segmentation/fpenet/pytorch/README.md +++ b/cv/semantic_segmentation/fpenet/pytorch/README.md @@ -1,14 +1,14 @@ # FPENet -## Model description +## Model Description A lightweight feature pyramid encoding network (FPENet) to make a good trade-off between accuracy and speed. Specifically, use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated convolutions in all stages of the encoder. A mutual embedding upsample module is introduced in the decoder to aggregate the high-level semantic features and low-level spatial details efficiently. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -49,7 +49,7 @@ coco2017 bash train_fpenet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/gcnet/pytorch/README.md b/cv/semantic_segmentation/gcnet/pytorch/README.md index 4b9aac50f..0824b2c67 100755 --- a/cv/semantic_segmentation/gcnet/pytorch/README.md +++ b/cv/semantic_segmentation/gcnet/pytorch/README.md @@ -1,6 +1,6 @@ # GCNet -## Model description +## Model Description A Global Context Network, or GCNet, utilises global context blocks to model long-range dependencies in images. It is based on the Non-Local Network, but it modifies the architecture so less computation is required. @@ -88,5 +88,5 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/gcnet/gcnet_r50-d8_4xb2-40k_cityscapes-769x769.py 8 ``` -## Reference +## References [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) \ No newline at end of file diff --git a/cv/semantic_segmentation/hardnet/pytorch/README.md b/cv/semantic_segmentation/hardnet/pytorch/README.md index 23f23be8e..70dfa2651 100644 --- a/cv/semantic_segmentation/hardnet/pytorch/README.md +++ b/cv/semantic_segmentation/hardnet/pytorch/README.md @@ -1,14 +1,14 @@ # HardNet -## Model description +## Model Description The Harmonic Densely Connected Network to achieve high efficiency in terms of both low MACs and memory traffic. The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. 
-## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -49,7 +49,7 @@ coco2017 bash train_hardnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/icnet/pytorch/README.md b/cv/semantic_segmentation/icnet/pytorch/README.md index 492d6c1ff..ea230afd8 100644 --- a/cv/semantic_segmentation/icnet/pytorch/README.md +++ b/cv/semantic_segmentation/icnet/pytorch/README.md @@ -1,6 +1,6 @@ # ICNet -## Model description +## Model Description An image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance ICNet provide in-depth analysis of the framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. @@ -47,7 +47,7 @@ coco2017 bash train_icnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/lednet/pytorch/README.md b/cv/semantic_segmentation/lednet/pytorch/README.md index 2055c36bf..8b491b2e8 100644 --- a/cv/semantic_segmentation/lednet/pytorch/README.md +++ b/cv/semantic_segmentation/lednet/pytorch/README.md @@ -1,14 +1,14 @@ # LedNet -## Model description +## Model Description A lightweight network to address this problem, namely LEDNet, which employs an asymmetric encoder-decoder architecture for the task of real-time semantic segmentation. More specifically, the encoder adopts a ResNet as backbone network, where two new operations, channel split and shuffle, are utilized in each residual block to greatly reduce computation cost while maintaining higher segmentation accuracy. On the other hand, an attention pyramid network (APN) is employed in the decoder to further lighten the entire network complexity. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -49,7 +49,7 @@ coco2017 bash train_lednet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/linknet/pytorch/README.md b/cv/semantic_segmentation/linknet/pytorch/README.md index a632731d7..f63846b56 100644 --- a/cv/semantic_segmentation/linknet/pytorch/README.md +++ b/cv/semantic_segmentation/linknet/pytorch/README.md @@ -1,13 +1,13 @@ # LinkNet -## Model description +## Model Description A novel deep neural network architecture which allows it to learn without any significant increase in number of parameters. The network uses only 11.5 million parameters and 21.2 GFLOPs for processing an image of resolution 3x640x360. 
-## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -48,7 +48,7 @@ coco2017 bash train_linknet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/mask2former/pytorch/README.md b/cv/semantic_segmentation/mask2former/pytorch/README.md index 7f1827a37..b0ba1b3ed 100644 --- a/cv/semantic_segmentation/mask2former/pytorch/README.md +++ b/cv/semantic_segmentation/mask2former/pytorch/README.md @@ -1,6 +1,6 @@ # Mask2Former -## Model description +## Model Description Mask2Former adopts the same meta architecture as MaskFormer, with our proposed Transformer decoder replacing the standard one. The key components of our Transformer decoder include a masked attention operator, which extracts localized features by constraining cross-attention to within the foreground region of the predicted mask for each query, instead of attending to the full feature map. To handle small objects, we propose an efficient multi-scale strategy to utilize high-resolution features. It feeds successive feature maps from the pixel decoder’s feature pyramid into successive Transformer decoder layers in a round-robin fashion. Finally, we incorporate optimization improvements that boost model performance without introducing additional computation. @@ -49,18 +49,18 @@ cityscapes/ └── munster ``` -## Step 3: Training +## Model Training ```bash DETECTRON2_DATASETS=/path/to/cityscapes/ python3 train_net.py --num-gpus 8 --config-file configs/cityscapes/semantic-segmentation/maskformer2_R50_bs16_90k.yaml 1> train_mask2former.log 2> train_mask2former_error.log & tail -f train_mask2former.log ``` -## Results +## Model Results |GPUs| fps | IoU Score Average | nIoU Score Average | | --- | --- | --- | --- | | BI-V100×8 | 11.52 | 0.795 | 0.624 | -## Reference +## References - [Mask2Former](https://github.com/facebookresearch/Mask2Former) diff --git a/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md b/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md index 67fa39498..7d2eca49b 100644 --- a/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md +++ b/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md @@ -1,6 +1,6 @@ # MobileSeg -## Model description +## Model Description MobileSeg models adopt encoder-decoder architecture and use lightweight models as encoder. These semantic segmentation models are designed for mobile and edge devices. 
@@ -67,7 +67,7 @@ drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit -rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` -## Step 3: Training +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path @@ -86,7 +86,7 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 train.py \ --use_vdl ``` -## Results +## Model Results ### Cityscapes @@ -96,5 +96,5 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 train.py \ | ------ | --------- | ------: | -------- |--------------:| | MobileSeg | 512x1024 | 80000 | 28.68 | 0.726 | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file diff --git a/cv/semantic_segmentation/ocnet/pytorch/README.md b/cv/semantic_segmentation/ocnet/pytorch/README.md index e525aed02..4704d03b3 100644 --- a/cv/semantic_segmentation/ocnet/pytorch/README.md +++ b/cv/semantic_segmentation/ocnet/pytorch/README.md @@ -1,6 +1,6 @@ # OCNet -## Model description +## Model Description OCNet: Object Context Network , which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. @@ -9,9 +9,9 @@ We propose to use a dense relation matrix to serve as a surrogate for the binary The dense relation matrix is capable to emphasize the contribution of object information as the relation scores tend to be larger on the object pixels than the other pixels. We propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -52,7 +52,7 @@ coco2017 bash train_ocnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md b/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md index 93f269b25..066c7ad61 100644 --- a/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md +++ b/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md @@ -1,6 +1,6 @@ # OCRNet -## Model description +## Model Description We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, % the representation similarity we compute the relation between each pixel and each object region and augment the representation of each pixel with the object-contextual representation which is a weighted aggregation of all the object region representations according to their relations with the pixel. 
@@ -68,7 +68,7 @@ tree -L 2 /path/to/cityscapes └── val.txt ``` -## Step 3: Training +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path @@ -92,10 +92,10 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py \ --use_vdl ``` -## Results +## Model Results | GPU | IPS | ACC | |:-----------:|:----------------:|:-----------:| | BI-V100 x 4 | 2.12 samples/sec | mIoU=0.8120 | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/ocrnet/pytorch/README.md b/cv/semantic_segmentation/ocrnet/pytorch/README.md index ad6cdaca1..92bc6e887 100644 --- a/cv/semantic_segmentation/ocrnet/pytorch/README.md +++ b/cv/semantic_segmentation/ocrnet/pytorch/README.md @@ -1,6 +1,6 @@ # OCRNet -## Model description +## Model Description Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation It presents a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. @@ -63,6 +63,6 @@ when using 8 x GPU bash train_distx8.sh ``` -## Reference +## References Ref: https://github.com/HRNet/HRNet-Semantic-Segmentation diff --git a/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md b/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md index 2eb02cca4..f33c35911 100644 --- a/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md @@ -1,6 +1,6 @@ # PP-HumanSegV1 -## Model description +## Model Description Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segmentation can be classified as portrait segmentation and general human segmentation. @@ -47,7 +47,7 @@ PP-HumanSeg14K/ ``` -## Step 3: Training +## Model Training ```bash # Change ./contrib/PP-HumanSeg/configs/portrait_pp_humansegv1_lite.yml dataset path as your dataset path @@ -63,11 +63,11 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \ --save_interval 500 ``` -## Results +## Model Results | GPUS | mIoU | Acc | Kappa |Dice | FPS | | ---------- | ------ | ------ | ------ | ------ | ----- | | BI-V100 x8 | 0.9591 | 0.9836 | 0.9581 | 0.9790 | 24.54 | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file diff --git a/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md b/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md index 997c21107..c36eac62c 100644 --- a/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md @@ -1,6 +1,6 @@ # PP-HumanSegV2 -## Model description +## Model Description Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segentation can be classified as portrait segmentation and general human segmentation. 
@@ -46,7 +46,7 @@ PP-HumanSeg14K/ ``` -## Step 3: Training +## Model Training ```bash # Change ./contrib/PP-HumanSeg/configs/portrait_pp_humansegv2_lite.yml dataset path as your dateset path @@ -62,7 +62,7 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \ --save_interval 500 ``` -## Results +## Model Results | MODEL | mIoU |Acc | Kappa |Dice | | ---------- | ------ |------ |--------|----- | @@ -72,5 +72,5 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \ | ---------- | ------ | | BI-V100x 8 | 34.0294 | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file diff --git a/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md b/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md index 023634d0e..2efe73901 100644 --- a/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md @@ -1,6 +1,6 @@ # PP-LiteSeg -## Model description +## Model Description PP-LiteSeg is a novel lightweight model for the real-time semantic segmentation task. Specifically, the model presents a Flexible and Lightweight Decoder (FLD) to reduce computation overhead of previous decoder. To strengthen feature representations, this model proposes a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and channel attention to produce a weight and then fuses the input features with the weight. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context with low computation cost. @@ -53,7 +53,7 @@ PaddleSeg/data │   └── val.txt ``` -## Step 3: Training +## Model Training ``` cd .. @@ -81,11 +81,11 @@ python3 tools/train.py \ --use_vdl ``` -## Results +## Model Results | Method | Backbone | Training Iters | FPS (BI x 8) | mIOU | | ------ | --------- | ------ | -------- |--------------:| | PP-LiteSeg-T | STDC1 | 160000 | 28.8 | 73.19% | -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file diff --git a/cv/semantic_segmentation/psanet/pytorch/README.md b/cv/semantic_segmentation/psanet/pytorch/README.md index 67b9c7c01..41c5c74b3 100644 --- a/cv/semantic_segmentation/psanet/pytorch/README.md +++ b/cv/semantic_segmentation/psanet/pytorch/README.md @@ -1,15 +1,15 @@ # PSANet -## Model description +## Model Description The point-wise spatial attention network (PSANet) to relax the local neighborhood constraint. Each position on the feature map is connected to all the other ones through a self-adaptively learned attention mask. Moreover, information propagation in bi-direction for scene parsing is enabled. Information at other positions can be collected to help the prediction of the current position and vice versa, information at the current position can be distributed to assist the prediction of other ones. 
-## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -50,7 +50,7 @@ coco2017 bash train_psanet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/ycszen/TorchSeg Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/pspnet/pytorch/README.md b/cv/semantic_segmentation/pspnet/pytorch/README.md index c17af4366..a60120902 100644 --- a/cv/semantic_segmentation/pspnet/pytorch/README.md +++ b/cv/semantic_segmentation/pspnet/pytorch/README.md @@ -1,13 +1,13 @@ # PSPNet -## Model descripstion +## Model Description An effective pyramid scene parsing network for complex scene understanding. The global pyramid pooling feature provides additional contextual information. PSPNet provides a superior framework for pixellevel prediction. The proposed approach achieves state-ofthe-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell # Install libGL @@ -67,5 +67,5 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py 8 ``` -## Reference +## References [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/refinenet/pytorch/README.md b/cv/semantic_segmentation/refinenet/pytorch/README.md index fc493a4ab..0b9eb4ae8 100644 --- a/cv/semantic_segmentation/refinenet/pytorch/README.md +++ b/cv/semantic_segmentation/refinenet/pytorch/README.md @@ -1,15 +1,15 @@ # RefineNet -## Model description +## Model Description RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -50,7 +50,7 @@ coco2017 bash train_refinenet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/segnet/pytorch/README.md b/cv/semantic_segmentation/segnet/pytorch/README.md index 90d73df51..12f58d364 100644 --- a/cv/semantic_segmentation/segnet/pytorch/README.md +++ b/cv/semantic_segmentation/segnet/pytorch/README.md @@ -1,6 +1,6 @@ # SegNet -## Model description +## Model Description SegNet is a semantic segmentation model. This core trainable segmentation architecture consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. 
@@ -9,9 +9,9 @@ The role of the decoder network is to map the low resolution encoder feature map The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature maps. Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -52,6 +52,6 @@ coco2017 bash train_segnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/stdc/paddlepaddle/README.md b/cv/semantic_segmentation/stdc/paddlepaddle/README.md index 7ca0eff3f..5851beede 100644 --- a/cv/semantic_segmentation/stdc/paddlepaddle/README.md +++ b/cv/semantic_segmentation/stdc/paddlepaddle/README.md @@ -1,6 +1,6 @@ # STDC -## Model description +## Model Description A novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC @@ -54,7 +54,7 @@ python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," ``` -## Step 3: Training +## Model Training ```bash # Change '/path/to/cityscapes' as your local Cityscapes dataset path @@ -75,12 +75,12 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 train.py \ --use_vdl ``` -## Results +## Model Results | GPUs | Crop Size | Lr schd | FPS | mIoU | | ------ | --------- | ------: | -------- |--------------:| | BI-V100 x8 | 512x1024 | 80000 | 14.36 | 0.7466 | -## Reference +## References - [cityscapes](https://mmsegmentation.readthedocs.io/en/latest/dataset_prepare.html#cityscapes) - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/stdc/pytorch/README.md b/cv/semantic_segmentation/stdc/pytorch/README.md index 260b68495..ba05e33a9 100644 --- a/cv/semantic_segmentation/stdc/pytorch/README.md +++ b/cv/semantic_segmentation/stdc/pytorch/README.md @@ -1,6 +1,6 @@ # STDC -## Model description +## Model Description We propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. 
Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC @@ -58,7 +58,7 @@ mkdir -p data/ ln -s /path/to/cityscapes data/ ``` -## Step 3: Training +## Model Training ### Training on single card ```shell @@ -71,11 +71,11 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/stdc/stdc1_4xb12-80k_cityscapes-512x1024.py 8 ``` -## Results +## Model Results | GPUs | Crop Size | Lr schd | FPS | mIoU | | ------ | --------- | ------: | -------- |--------------:| | BI-V100 x8 | 512x1024 | 20000 | 39.38 | 70.74 | -## Reference +## References [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/torchvision/pytorch/README.md b/cv/semantic_segmentation/torchvision/pytorch/README.md index bbb81d671..ee0ae9cd7 100644 --- a/cv/semantic_segmentation/torchvision/pytorch/README.md +++ b/cv/semantic_segmentation/torchvision/pytorch/README.md @@ -13,40 +13,46 @@ You must modify the following flags: ## One Card ### deeplabv3_resnet50 -``` + +```bash python3 train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet50 --aux-loss ``` ### deeplabv3_resnet101 -``` + +```bash python3 train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet101 --aux-loss ``` ### deeplabv3_mobilenet_v3_large -``` + +```bash python3 train.py --dataset coco -b 4 --model deeplabv3_mobilenet_v3_large --aux-loss --wd 0.000001 ``` ## DDP ### deeplabv3_resnet50 -``` + +```bash python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet50 --aux-loss ``` ### deeplabv3_resnet101 -``` + +```bash python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --lr 0.02 --dataset coco -b 4 --model deeplabv3_resnet101 --aux-loss ``` ### deeplabv3_mobilenet_v3_large -``` + +```bash python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --dataset coco -b 4 --model deeplabv3_mobilenet_v3_large --aux-loss --wd 0.000001 ``` ## Arguments -``` +```bash --data-path DATA_PATH dataset path --dataset DATASET dataset name, [coco | camvid] diff --git a/cv/semantic_segmentation/unet++/pytorch/README.md b/cv/semantic_segmentation/unet++/pytorch/README.md index 4cdfeae15..4cebd7b99 100644 --- a/cv/semantic_segmentation/unet++/pytorch/README.md +++ b/cv/semantic_segmentation/unet++/pytorch/README.md @@ -1,6 +1,6 @@ # UNet++: A Nested U-Net Architecture for Medical Image Segmentation -## Model description +## Model Description We present UNet++, a new, more powerful architecture for medical image segmentation. 
Our architecture is essentially a deeply supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip @@ -45,7 +45,7 @@ If there is no `DRIVE` dataset locally, you can download `DRIVE` from a file ser python3 tools/convert_datasets/drive.py /path/to/training.zip /path/to/test.zip ``` -## Step 3: Training +## Model Training ### Training on single card ```shell python3 tools/train.py configs/unet/unet-s5-d16_pspnet_4xb4-40k_drive-64x64.py @@ -57,11 +57,11 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/unet/unet-s5-d16_pspnet_4xb4-40k_drive-64x64.py 8 ``` -## Results +## Model Results | GPUs| Crop Size | Lr schd | FPS | mDice | | ------ | --------- | ------: | -------- |--------------:| | BI-V100 x8 | 64x64 | 40000 | 238.9 | 87.52 | -## Reference +## References [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/unet/paddlepaddle/README.md b/cv/semantic_segmentation/unet/paddlepaddle/README.md index 379914767..b2644e928 100644 --- a/cv/semantic_segmentation/unet/paddlepaddle/README.md +++ b/cv/semantic_segmentation/unet/paddlepaddle/README.md @@ -1,5 +1,5 @@ # UNet -## Model description +## Model Description A network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. @@ -13,7 +13,7 @@ pip3 install urllib3==1.26.6 yum install mesa-libGL ``` -## Step 2: Prepare Datasets +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. @@ -68,7 +68,7 @@ drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit -rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` -## Step 3: Training +## Model Training Notice: modify configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml file, modify the datasets path as yours. The training is use AMP model. ``` @@ -85,5 +85,5 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 train.py \ --amp_level O1 ``` -## Reference +## References - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file diff --git a/cv/semantic_segmentation/unet/pytorch/README.md b/cv/semantic_segmentation/unet/pytorch/README.md index a137e48a1..386028af1 100644 --- a/cv/semantic_segmentation/unet/pytorch/README.md +++ b/cv/semantic_segmentation/unet/pytorch/README.md @@ -1,14 +1,14 @@ # UNet -## Model description +## Model Description A network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. 
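The contracting/expanding structure can be summarized in a few lines of PyTorch. This is a one-level sketch under assumed toy sizes, not the model defined by the training script below.

```python
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    # two 3x3 convolutions, as used at every U-Net level
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: contract, process, expand, and fuse via concatenation."""
    def __init__(self, c_in=3, c_base=32, num_classes=2):
        super().__init__()
        self.enc = double_conv(c_in, c_base)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = double_conv(c_base, c_base * 2)
        self.up = nn.ConvTranspose2d(c_base * 2, c_base, 2, stride=2)
        self.dec = double_conv(c_base * 2, c_base)   # concatenation doubles the channels
        self.head = nn.Conv2d(c_base, num_classes, 1)

    def forward(self, x):
        skip = self.enc(x)                            # contracting path
        x = self.bottleneck(self.down(skip))
        x = self.up(x)                                # expanding path
        x = self.dec(torch.cat([x, skip], dim=1))     # skip connection carries fine detail
        return self.head(x)

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```

Stacking more such levels gives the familiar U shape; the concatenation skips are what carry fine spatial detail from the contracting path into the expanding path.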
-## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell @@ -49,7 +49,7 @@ coco2017 bash train_unet_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References Ref: https://github.com/LikeLy-Journey/SegmenTron Ref: [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/unet3d/pytorch/README.md b/cv/semantic_segmentation/unet3d/pytorch/README.md index 0cad8285e..a5e180cce 100644 --- a/cv/semantic_segmentation/unet3d/pytorch/README.md +++ b/cv/semantic_segmentation/unet3d/pytorch/README.md @@ -1,6 +1,6 @@ # 3D-UNet -## Model description +## Model Description A network for volumetric segmentation that learns from sparsely annotated volumetric images. Two attractive use cases of this method: @@ -9,9 +9,9 @@ Two attractive use cases of this method: The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. -## Step 1: Installing +## Model Preparation -### Install packages +### Install Dependencies ```shell pip3 install 'scipy' 'tqdm' @@ -56,6 +56,6 @@ bash train_dist.sh | ---------------------- | ------------------------------------------ | ------------- | ---------- | ------------ | ------------- | ------------------------- | ----------- | | 0.908 | SDK V2.2, bs:4, 8x, fp32 | 12 | 0.908 | 152\*8 | 0.85 | 19.6\*8 | 1 | -## Reference +## References Reference: https://github.com/mlcommons/training/tree/master/image_segmentation/pytorch diff --git a/cv/semantic_segmentation/vnet/tensorflow/README.md b/cv/semantic_segmentation/vnet/tensorflow/README.md index 56c2d8cc5..e198ed0b8 100644 --- a/cv/semantic_segmentation/vnet/tensorflow/README.md +++ b/cv/semantic_segmentation/vnet/tensorflow/README.md @@ -1,6 +1,6 @@ # VNet -## Model description +## Model Description [V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation](https://arxiv.org/abs/1606.04797) Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi ## Prepare -- Gitee From 9070da97704a5c72bdf163b3469b9d28678bac6d Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Mon, 10 Mar 2025 14:07:36 +0800 Subject: [PATCH 4/4] unify model readme format - semantic_segmentation 2nd part Signed-off-by: mingjiang.li --- .../speech_synthesis/vqmivc/pytorch/README.md | 20 ++---- .../dunet/pytorch/README.md | 6 +- .../fastfcn/paddlepaddle/README.md | 33 +++++----- .../fastscnn/pytorch/README.md | 39 +++++------ .../fcn/pytorch/README.md | 42 ++++++------ .../fpenet/pytorch/README.md | 39 +++++------ .../gcnet/pytorch/README.md | 65 ++++++++++-------- .../hardnet/pytorch/README.md | 38 +++++------ .../icnet/pytorch/README.md | 37 +++++------ .../lednet/pytorch/README.md | 38 +++++------ .../linknet/pytorch/README.md | 35 +++++----- .../mask2former/pytorch/README.md | 55 +++++++++------- .../mobileseg/paddlepaddle/README.md | 65 ++++++++---------- .../ocnet/pytorch/README.md | 42 ++++++------ .../ocrnet/paddlepaddle/README.md | 52 +++++++++------ .../ocrnet/pytorch/README.md | 43 ++++++------ .../pp_humansegv1/paddlepaddle/README.md | 53 +++++++-------- .../pp_humansegv2/paddlepaddle/README.md | 60 ++++++++--------- .../pp_liteseg/paddlepaddle/README.md | 50 +++++++------- .../psanet/pytorch/README.md | 39 ++++++----- .../pspnet/pytorch/README.md | 59 +++++++++-------- .../refinenet/pytorch/README.md | 40 +++++------ 
.../segnet/pytorch/README.md | 41 ++++++------ .../stdc/paddlepaddle/README.md | 47 +++++++------ .../stdc/pytorch/README.md | 63 +++++++++--------- .../unet++/pytorch/README.md | 66 +++++++++---------- .../unet/paddlepaddle/README.md | 56 +++++++--------- .../unet/pytorch/README.md | 36 +++++----- .../unet3d/pytorch/README.md | 45 ++++++------- .../vnet/tensorflow/README.md | 40 ++++++----- docs/MODEL_TEMPLATE.md | 2 +- 31 files changed, 682 insertions(+), 664 deletions(-) diff --git a/audio/speech_synthesis/vqmivc/pytorch/README.md b/audio/speech_synthesis/vqmivc/pytorch/README.md index b7e062d39..4afac17dc 100644 --- a/audio/speech_synthesis/vqmivc/pytorch/README.md +++ b/audio/speech_synthesis/vqmivc/pytorch/README.md @@ -2,17 +2,12 @@ ## Model Description -One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker -utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally -ignores the correlation between different speech representations during training, which causes leakage of content -information into the speaker representation and thus degrades VC performance. To alleviate this issue, we employ vector -quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training, -to achieve proper disentanglement of content, speaker and pitch representations, by reducing their inter-dependencies in -an unsupervised manner. Experimental results reflect the superiority of the proposed method in learning effective -disentangled speech representations for retaining source linguistic content and intonation variations, while capturing -target speaker characteristics. In doing so, the proposed approach achieves higher speech naturalness and speaker -similarity than current state-of-the-art one-shot VC systems. Our code, pre-trained models and demo are available at -. +VQMIVC is an advanced deep learning model for one-shot voice conversion that employs vector quantization and mutual +information minimization to disentangle speech representations. It effectively separates content, speaker, and pitch +information, enabling high-quality voice conversion with just a single target-speaker utterance. By reducing +inter-dependencies between speech components, VQMIVC achieves superior naturalness and speaker similarity compared to +traditional methods. This unsupervised approach is particularly effective for retaining source linguistic content while +accurately capturing target speaker characteristics. ## Model Preparation @@ -25,10 +20,9 @@ wget https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip unzip VCTK-Corpus-0.92.zip ``` -## Preprocess Data +### Install Dependencies ```sh -cd ${DEEPSPARKHUB_ROOT}/speech/speech_synthesis/vqmivc/pytorch/ pip3 install -r requirements_bi.txt ln -s /home/data/vqmivc . python3 preprocess.py diff --git a/cv/semantic_segmentation/dunet/pytorch/README.md b/cv/semantic_segmentation/dunet/pytorch/README.md index 730fc4af0..42ec2f9cf 100644 --- a/cv/semantic_segmentation/dunet/pytorch/README.md +++ b/cv/semantic_segmentation/dunet/pytorch/README.md @@ -12,9 +12,11 @@ the receptive fields according to vessels' scales and shapes. ### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
+Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 diff --git a/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md b/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md index 4a7c53ec3..0e8b399ba 100644 --- a/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md +++ b/cv/semantic_segmentation/fastfcn/paddlepaddle/README.md @@ -7,19 +7,13 @@ uses an efficient encoder-decoder architecture and depthwise separable convoluti simplified design allows FastFCN to run much faster than prior FCNs while maintaining good segmentation quality. FastFCN demonstrates real-time segmentation is possible with a carefully designed lightweight CNN architecture. -## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.9 --recursive https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -yum install -y mesa-libGL -pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 -pip3 install -v -e . -``` +### Prepare Resources -## Step 2: Preparing datasets - -Download the ADEChallengeData2016 from [Scene Parsing Challenge 2016](http://sceneparsing.csail.mit.edu/index_challenge.html). Alternatively, you can skip this step because the PaddleSeg framework will automatically download it for you. +Download the ADEChallengeData2016 from [Scene Parsing Challenge +2016](http://sceneparsing.csail.mit.edu/index_challenge.html). Alternatively, you can skip this step because the +PaddleSeg framework will automatically download it for you. The ADEChallengeData2016 dataset path structure should look like: @@ -37,6 +31,16 @@ ADEChallengeData2016 └── sceneCategories.txt ``` +### Install Dependencies + +```bash +git clone -b release/2.9 --recursive https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +yum install -y mesa-libGL +pip3 install scikit-learn easydict visualdl==2.2.0 urllib3==1.26.6 +pip3 install -v -e . 
+``` + ## Model Training ```bash @@ -47,7 +51,6 @@ cd PaddleClas/ ln -s /path/to/ADEChallengeData2016 ./data/ADEChallengeData2016 # 8 GPUs - export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m paddle.distributed.launch --gpus=0,1,2,3,4,5,6,7 tools/train.py --config configs/fastfcn/fastfcn_resnet50_os8_ade20k_480x480_120k.yml @@ -57,9 +60,9 @@ python3 tools/val.py --config configs/fastfcn/fastfcn_resnet50_os8_ade20k_480x4 ## Model Results -| GPUs | mIoU | Acc |Kappa | Dice | ips | -|:-----------:|:-----------:|:-----------:|:------------:|:------------:|:-----------:| -| BI-V100 x 8 |0.4312 | 0.8083 | 0.7935 | 0.570 | 33.68 | +| GPUs | mIoU | Acc | Kappa | Dice | ips | +|-------------|--------|--------|--------|-------|-------| +| BI-V100 x 8 | 0.4312 | 0.8083 | 0.7935 | 0.570 | 33.68 | ## References diff --git a/cv/semantic_segmentation/fastscnn/pytorch/README.md b/cv/semantic_segmentation/fastscnn/pytorch/README.md index 1d90a9424..b8b5c4a63 100644 --- a/cv/semantic_segmentation/fastscnn/pytorch/README.md +++ b/cv/semantic_segmentation/fastscnn/pytorch/README.md @@ -2,27 +2,20 @@ ## Model Description -The fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. -The 'learning to downsample' module which computes low-level features for multiple resolution branches simultaneously. -The network combines spatial detail at high resolution with deep features extracted at lower resolution. +The fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high +resolution image data (1024x2048px) suited to efficient computation on embedded devices with low memory. The 'learning +to downsample' module which computes low-level features for multiple resolution branches simultaneously. The network +combines spatial detail at high resolution with deep features extracted at lower resolution. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing dataset +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,7 +36,15 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell + +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' + +``` + +### Model Training ```shell bash train_fastscnn_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -51,5 +52,5 @@ bash train_fastscnn_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +-[torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/fcn/pytorch/README.md b/cv/semantic_segmentation/fcn/pytorch/README.md index 2555cb0d5..4ae8295ca 100644 --- a/cv/semantic_segmentation/fcn/pytorch/README.md +++ b/cv/semantic_segmentation/fcn/pytorch/README.md @@ -2,30 +2,22 @@ ## Model Description -Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. -They employ solely locally connected layers, such as convolution, pooling and upsampling. -Avoiding the use of dense layers means less parameters (making the networks faster to train). -It also means an FCN can work for variable image sizes given all connections are local. -The network consists of a downsampling path, used to extract and interpret the context, and an upsampling path, which allows for localization. -FCNs also employ skip connections to recover the fine-grained spatial information lost in the downsampling path. +Fully Convolutional Networks, or FCNs, are an architecture used mainly for semantic segmentation. They employ solely +locally connected layers, such as convolution, pooling and upsampling. Avoiding the use of dense layers means less +parameters (making the networks faster to train). It also means an FCN can work for variable image sizes given all +connections are local. The network consists of a downsampling path, used to extract and interpret the context, and an +upsampling path, which allows for localization. FCNs also employ skip connections to recover the fine-grained spatial +information lost in the downsampling path. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -46,7 +38,15 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell + +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' + +``` + +## Model Training ```shell bash train_fcn_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -54,4 +54,4 @@ bash train_fcn_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/fpenet/pytorch/README.md b/cv/semantic_segmentation/fpenet/pytorch/README.md index 06d1c83c8..b6c5c7795 100644 --- a/cv/semantic_segmentation/fpenet/pytorch/README.md +++ b/cv/semantic_segmentation/fpenet/pytorch/README.md @@ -2,27 +2,20 @@ ## Model Description -A lightweight feature pyramid encoding network (FPENet) to make a good trade-off between accuracy and speed. -Specifically, use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated convolutions in all stages of the encoder. -A mutual embedding upsample module is introduced in the decoder to aggregate the high-level semantic features and low-level spatial details efficiently. +A lightweight feature pyramid encoding network (FPENet) to make a good trade-off between accuracy and speed. +Specifically, use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated +convolutions in all stages of the encoder. A mutual embedding upsample module is introduced in the decoder to aggregate +the high-level semantic features and low-level spatial details efficiently. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training - -### Preparing datasets +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,7 +36,15 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell + +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' + +``` + +## Model Training ```shell bash train_fpenet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -51,5 +52,5 @@ bash train_fpenet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron]https://github.com/LikeLy-Journey/SegmenTron +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/gcnet/pytorch/README.md b/cv/semantic_segmentation/gcnet/pytorch/README.md index 0824b2c67..4e2ba8162 100755 --- a/cv/semantic_segmentation/gcnet/pytorch/README.md +++ b/cv/semantic_segmentation/gcnet/pytorch/README.md @@ -2,33 +2,19 @@ ## Model Description -A Global Context Network, or GCNet, utilises global context blocks to model long-range dependencies in images. 
-It is based on the Non-Local Network, but it modifies the architecture so less computation is required. -Global context blocks are applied to multiple layers in a backbone network to construct the GCNet. +A Global Context Network, or GCNet, utilises global context blocks to model long-range dependencies in images. It is +based on the Non-Local Network, but it modifies the architecture so less computation is required. Global context blocks +are applied to multiple layers in a backbone network to construct the GCNet. -## Step 1: Installing -### Install packages +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install mmsegmentation -git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 -cd mmsegmentation/ -pip install -v -e . - -pip install ftfy -``` - -### Datasets +### Prepare Resources -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -55,11 +41,15 @@ cityscapes/ mkdir data/ ln -s /path/to/cityscapes data/cityscapes ``` + - convert_datasets + ```bash python3 tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8 ``` + - when done data folder looks like + ```bash data/ ├── cityscapes @@ -76,17 +66,34 @@ data/    └── val.lst ``` -## Step 2: Training -### Training on single card -```shell -python3 tools/train.py configs/gcnet/gcnet_r50-d8_4xb2-40k_cityscapes-769x769.py +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . + +pip install ftfy ``` -### Training on mutil-cards +## Model Training + ```shell +# Training on single card +python3 tools/train.py configs/gcnet/gcnet_r50-d8_4xb2-40k_cityscapes-769x769.py + +# Training on mutil-cards sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/gcnet/gcnet_r50-d8_4xb2-40k_cityscapes-769x769.py 8 ``` ## References -[mmsegmentation](https://github.com/open-mmlab/mmsegmentation) \ No newline at end of file + +- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/hardnet/pytorch/README.md b/cv/semantic_segmentation/hardnet/pytorch/README.md index 70dfa2651..687683692 100644 --- a/cv/semantic_segmentation/hardnet/pytorch/README.md +++ b/cv/semantic_segmentation/hardnet/pytorch/README.md @@ -2,27 +2,19 @@ ## Model Description -The Harmonic Densely Connected Network to achieve high efficiency in terms of both low MACs and memory traffic. -The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. - +The Harmonic Densely Connected Network to achieve high efficiency in terms of both low MACs and memory traffic. 
The new +network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, +ResNet-50, ResNet-152, and SSD-VGG, respectively. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,7 +35,15 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell + +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' + +``` + +## Model Training ```shell bash train_hardnet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -51,5 +51,5 @@ bash train_hardnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/icnet/pytorch/README.md b/cv/semantic_segmentation/icnet/pytorch/README.md index ea230afd8..fb48a24a6 100644 --- a/cv/semantic_segmentation/icnet/pytorch/README.md +++ b/cv/semantic_segmentation/icnet/pytorch/README.md @@ -2,25 +2,19 @@ ## Model Description -An image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance -ICNet provide in-depth analysis of the framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. +An image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance ICNet provide +in-depth analysis of the framework and introduce the cascade feature fusion unit to quickly achieve high-quality +segmentation. -## Step 1: Installing -### Install packages +## Model Preparation -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -41,7 +35,13 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_icnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -49,6 +49,5 @@ bash train_icnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) -~ +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/lednet/pytorch/README.md b/cv/semantic_segmentation/lednet/pytorch/README.md index 8b491b2e8..06b5817ca 100644 --- a/cv/semantic_segmentation/lednet/pytorch/README.md +++ b/cv/semantic_segmentation/lednet/pytorch/README.md @@ -2,27 +2,21 @@ ## Model Description -A lightweight network to address this problem, namely LEDNet, which employs an asymmetric encoder-decoder architecture for the task of real-time semantic segmentation. -More specifically, the encoder adopts a ResNet as backbone network, where two new operations, channel split and shuffle, are utilized in each residual block to greatly reduce computation cost while maintaining higher segmentation accuracy. -On the other hand, an attention pyramid network (APN) is employed in the decoder to further lighten the entire network complexity. +A lightweight network to address this problem, namely LEDNet, which employs an asymmetric encoder-decoder architecture +for the task of real-time semantic segmentation. More specifically, the encoder adopts a ResNet as backbone network, +where two new operations, channel split and shuffle, are utilized in each residual block to greatly reduce computation +cost while maintaining higher segmentation accuracy. On the other hand, an attention pyramid network (APN) is employed +in the decoder to further lighten the entire network complexity. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -43,7 +37,13 @@ coco2017 └── ... 
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_lednet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -51,5 +51,5 @@ bash train_lednet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/linknet/pytorch/README.md b/cv/semantic_segmentation/linknet/pytorch/README.md index f63846b56..73bc80fec 100644 --- a/cv/semantic_segmentation/linknet/pytorch/README.md +++ b/cv/semantic_segmentation/linknet/pytorch/README.md @@ -2,26 +2,19 @@ ## Model Description -A novel deep neural network architecture which allows it to learn without any significant increase in number of parameters. -The network uses only 11.5 million parameters and 21.2 GFLOPs for processing an image of resolution 3x640x360. +A novel deep neural network architecture which allows it to learn without any significant increase in number of +parameters. The network uses only 11.5 million parameters and 21.2 GFLOPs for processing an image of resolution +3x640x360. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -42,7 +35,13 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_linknet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -50,5 +49,5 @@ bash train_linknet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/mask2former/pytorch/README.md b/cv/semantic_segmentation/mask2former/pytorch/README.md index b0ba1b3ed..e3493e747 100644 --- a/cv/semantic_segmentation/mask2former/pytorch/README.md +++ b/cv/semantic_segmentation/mask2former/pytorch/README.md @@ -2,29 +2,20 @@ ## Model Description -Mask2Former adopts the same meta architecture as MaskFormer, with our proposed Transformer decoder replacing the standard one. The key components of our Transformer decoder include a masked attention operator, which extracts localized features by constraining cross-attention to within the foreground region of the predicted mask for each query, instead of attending to the full feature map. 
To handle small objects, we propose an efficient multi-scale strategy to utilize high-resolution features. It feeds successive feature maps from the pixel decoder’s feature pyramid into successive Transformer decoder layers in a round-robin fashion. Finally, we incorporate optimization improvements that boost model performance without introducing additional computation. +Mask2Former adopts the same meta architecture as MaskFormer, with our proposed Transformer decoder replacing the +standard one. The key components of our Transformer decoder include a masked attention operator, which extracts +localized features by constraining cross-attention to within the foreground region of the predicted mask for each query, +instead of attending to the full feature map. To handle small objects, we propose an efficient multi-scale strategy to +utilize high-resolution features. It feeds successive feature maps from the pixel decoder’s feature pyramid into +successive Transformer decoder layers in a round-robin fashion. Finally, we incorporate optimization improvements that +boost model performance without introducing additional computation. -## Step 1: Installation +## Model Preparation -```bash -# Install mesa-libGL -yum install mesa-libGL -y +### Prepare Resources -# Install requirements -pip3 install urllib3==1.26.6 -pip3 install 'git+https://github.com/facebookresearch/detectron2.git@d779ea63faa54fe42b9b4c280365eaafccb280d6' -pip3 install cityscapesscripts - -pip3 install -r requirements.txt - -cd mask2former/modeling/pixel_decoder/ops -sh make.sh -cd - -``` - -## Step 2: Preparing datasets - -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure should look like: @@ -49,6 +40,24 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +# Install mesa-libGL +yum install mesa-libGL -y + +# Install requirements +pip3 install urllib3==1.26.6 +pip3 install 'git+https://github.com/facebookresearch/detectron2.git@d779ea63faa54fe42b9b4c280365eaafccb280d6' +pip3 install cityscapesscripts + +pip3 install -r requirements.txt + +cd mask2former/modeling/pixel_decoder/ops +sh make.sh +cd - +``` + ## Model Training ```bash @@ -57,9 +66,9 @@ DETECTRON2_DATASETS=/path/to/cityscapes/ python3 train_net.py --num-gpus 8 --con ## Model Results -|GPUs| fps | IoU Score Average | nIoU Score Average | -| --- | --- | --- | --- | -| BI-V100×8 | 11.52 | 0.795 | 0.624 | +| GPU | fps | IoU Score Average | nIoU Score Average | +|------------|-------|-------------------|--------------------| +| BI-V100 ×8 | 11.52 | 0.795 | 0.624 | ## References diff --git a/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md b/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md index 7d2eca49b..6074932be 100644 --- a/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md +++ b/cv/semantic_segmentation/mobileseg/paddlepaddle/README.md @@ -2,25 +2,18 @@ ## Model Description -MobileSeg models adopt encoder-decoder architecture and use lightweight models as encoder. +MobileSeg models adopt encoder-decoder architecture and use lightweight models as encoder. These semantic segmentation models are designed for mobile and edge devices. 
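+
+In practice this means a small, heavily downsampling encoder followed by a cheap decoder/upsampling head. The sketch
+below is purely illustrative and is not the PaddleSeg implementation (it is written with PyTorch for brevity, and the
+class and layer names are made up); it only shows the lightweight encoder-plus-upsampling-head idea described above.
+
+```python
+# Illustrative only: a minimal lightweight encoder-decoder segmentation net.
+# Not the MobileSeg/PaddleSeg code; names and channel sizes are hypothetical.
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class DepthwiseSeparableConv(nn.Module):
+    """3x3 depthwise conv followed by a 1x1 pointwise conv (MobileNet-style block)."""
+    def __init__(self, in_ch, out_ch, stride=1):
+        super().__init__()
+        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
+        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
+        self.bn = nn.BatchNorm2d(out_ch)
+
+    def forward(self, x):
+        return F.relu(self.bn(self.pointwise(self.depthwise(x))))
+
+class TinyMobileSeg(nn.Module):
+    """Lightweight encoder + 1x1 classifier + bilinear upsampling as the decoder."""
+    def __init__(self, num_classes=19):
+        super().__init__()
+        self.encoder = nn.Sequential(
+            DepthwiseSeparableConv(3, 32, stride=2),
+            DepthwiseSeparableConv(32, 64, stride=2),
+            DepthwiseSeparableConv(64, 128, stride=2),
+        )
+        self.classifier = nn.Conv2d(128, num_classes, 1)
+
+    def forward(self, x):
+        h, w = x.shape[-2:]
+        logits = self.classifier(self.encoder(x))
+        # Recover the input resolution with a simple bilinear upsample.
+        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
+
+if __name__ == "__main__":
+    out = TinyMobileSeg()(torch.randn(1, 3, 512, 1024))
+    print(out.shape)  # expected: torch.Size([1, 19, 512, 1024])
+```
+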
-## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -``` - -## Step 2: Preparing datasets +### Prepare Resources -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -43,28 +36,25 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +``` + +### Preprocess Data + ```bash -# Datasets preprocessing + pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," - -# CityScapes PATH as follow: -ls -al /path/to/cityscapes -total 11567948 -drwxr-xr-x 4 root root 227 Jul 18 03:32 . -drwxr-xr-x 6 root root 179 Jul 18 06:48 .. --rw-r--r-- 1 root root 298 Feb 20 2016 README -drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine --rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip -drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit --rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip --rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt --rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt --rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt --rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt ``` ## Model Training @@ -88,13 +78,10 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 train.py \ ## Model Results -### Cityscapes - -#### Accuracy - -| Method | Crop Size | Lr schd | FPS (BI x 8) | mIOU | -| ------ | --------- | ------: | -------- |--------------:| -| MobileSeg | 512x1024 | 80000 | 28.68 | 0.726 | +| Method | Crop Size | Lr schd | FPS (BI x 8) | mIOU | +|-----------|-----------|---------|--------------|-------| +| MobileSeg | 512x1024 | 80000 | 28.68 | 0.726 | ## References -- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file + +- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/ocnet/pytorch/README.md b/cv/semantic_segmentation/ocnet/pytorch/README.md index 4704d03b3..564782d98 100644 --- a/cv/semantic_segmentation/ocnet/pytorch/README.md +++ b/cv/semantic_segmentation/ocnet/pytorch/README.md @@ -2,30 +2,22 @@ ## Model Description -OCNet: Object Context Network , which focuses on enhancing the role of object information. -Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context for each pixel as the set of pixels that belong to the same category as the given pixel in the image. 
-We use a binary relation matrix to represent the relationship between all pixels, where the value one indicates the two selected pixels belong to the same category and zero otherwise. -We propose to use a dense relation matrix to serve as a surrogate for the binary relation matrix. -The dense relation matrix is capable to emphasize the contribution of object information as the relation scores tend to be larger on the object pixels than the other pixels. -We propose an efficient interlaced sparse self-attention scheme to model the dense relations between any two of all pixels via the combination of two sparse relation matrices. +OCNet (Object Context Network) is an advanced deep learning model for semantic segmentation that emphasizes object-level +context information. It introduces a dense relation matrix to capture pixel relationships within the same object +category, enhancing segmentation accuracy. The model employs an efficient interlaced sparse self-attention mechanism to +model dense relations between pixels, focusing on object boundaries and structures. OCNet's innovative approach to +leveraging object context information makes it particularly effective for complex scene understanding tasks in computer +vision applications. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -46,7 +38,13 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_ocnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -54,5 +52,5 @@ bash train_ocnet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md b/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md index 066c7ad61..70f3ed840 100644 --- a/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md +++ b/cv/semantic_segmentation/ocrnet/paddlepaddle/README.md @@ -2,26 +2,22 @@ ## Model Description - We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. 
Last, % the representation similarity we compute the relation between each pixel and each object region and augment the representation of each pixel with the object-contextual representation which is a weighted aggregation of all the object region representations according to their relations with the pixel. +OCRNet (Object Contextual Representation Network) is a deep learning model for semantic segmentation that enhances +pixel-level understanding by incorporating object context information. It learns object regions from ground-truth +segmentation and aggregates pixel representations within these regions. By computing relationships between pixels and +object regions, OCRNet augments each pixel's representation with contextual information from relevant objects. This +approach improves segmentation accuracy, particularly in complex scenes, by better capturing object boundaries and +contextual relationships between different image elements. -## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.8 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.13 -python3 setup.py install +### Prepare Resources -yum install -y mesa-libGL -``` - -## Step 2: Preparing datasets - -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -44,12 +40,26 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone -b release/2.8 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.13 +python3 setup.py install + +yum install -y mesa-libGL +``` + +### Preprocess Data + ```bash # Datasets preprocessing pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," # CityScapes PATH as follow: @@ -93,9 +103,11 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py \ ``` ## Model Results -| GPU | IPS | ACC | -|:-----------:|:----------------:|:-----------:| -| BI-V100 x 4 | 2.12 samples/sec | mIoU=0.8120 | + +| GPU | IPS | ACC | +|------------|------------------|-------------| +| BI-V100 x4 | 2.12 samples/sec | mIoU=0.8120 | ## References + - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/ocrnet/pytorch/README.md b/cv/semantic_segmentation/ocrnet/pytorch/README.md index 92bc6e887..5aad45ce4 100644 --- a/cv/semantic_segmentation/ocrnet/pytorch/README.md +++ b/cv/semantic_segmentation/ocrnet/pytorch/README.md @@ -1,19 +1,23 @@ # OCRNet -## Model Description +## Model Description -Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation -It presents a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. 
-First, we learn object regions under the supervision of ground-truth segmentation. -Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. -Last, the representation similarity we compute the relation between each pixel and each object region and augment the representation of each pixel with the object-contextual representation which is a weighted aggregation of all the object region representations according to their relations with the pixel. +OCRNet (Object Contextual Representation Network) is a deep learning model for semantic segmentation that enhances +pixel-level understanding by incorporating object context information. It learns object regions from ground-truth +segmentation and aggregates pixel representations within these regions. By computing relationships between pixels and +object regions, OCRNet augments each pixel's representation with contextual information from relevant objects. This +approach improves segmentation accuracy, particularly in complex scenes, by better capturing object boundaries and +contextual relationships between different image elements. -## Step 1: Installing -### Datasets +## Model Preparation -Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. +### Prepare Resources -Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. + +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -41,28 +45,25 @@ mkdir data/ ln -s /path/to/cityscapes data/cityscapes ``` -### Environment +### Install Dependencies + ```bash bash init.sh ``` -## Step 2: Training -### Train using 1x GPU card +## Model Training + ```bash +# Train using 1x GPU card bash train.sh -``` -### Train using Nx GPU cards -when using 4 x GPU -```bash +# Train using 4x GPU cards bash train_distx4.sh -``` -when using 8 x GPU -```bash +# Training using 8 x GPU bash train_distx8.sh ``` ## References -Ref: https://github.com/HRNet/HRNet-Semantic-Segmentation +- [HRNet-Semantic-Segmentation](https://github.com/HRNet/HRNet-Semantic-Segmentation) diff --git a/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md b/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md index f33c35911..415bb4b00 100644 --- a/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_humansegv1/paddlepaddle/README.md @@ -2,35 +2,24 @@ ## Model Description -Human segmentation is a high-frequency application in the field of image segmentation. -Generally, human segmentation can be classified as portrait segmentation and general human segmentation. +PP-HumanSegV1 is an efficient deep learning model for human segmentation, specializing in both portrait and general +human segmentation tasks. Developed by PaddleSeg, it offers excellent accuracy, fast inference speed, and robust +performance across various scenarios. The model supports zero-cost deployment for immediate use in products and allows +fine-tuning for enhanced performance. 
PP-HumanSegV1 is particularly valuable for applications like video background +replacement, portrait snapshot, and barrage penetration, providing high-quality segmentation results with minimal +computational requirements. -For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which have **good performance in accuracy, inference speed, and robustness**. Besides, we can deploy PP-HumanSeg models to products without training -Besides, PP-HumanSeg models can be deployed to products at zero cost, and it also supports fine-tuning to achieve better performance. +## Model Preparation -The following are demonstration videos (due to the video being large, the loading will be slightly slow). We provide full-process application guides from training to deployment, as well as video streaming segmentation and background replacement tutorials. Based on Paddle.js, you can experience the effects of [Portrait Snapshot](https://paddlejs.baidu.com/humanseg), [Video Background Replacement and Barrage Penetration](https://www.paddlepaddle.org.cn/paddlejs). +### Prepare Resources -PP-HumanSegV1-Lite portrait segmentation model: It has good performance in accuracy and model size and the model architecture in [url](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/configs/pp_humanseg_lite). - -## Step 1: Installation - -```bash -git clone -b develop https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -python3 setup.py develop -``` - -## Step 2: Preparing datasets - -Go to visit [PP-HumanSeg14K official website](https://paperswithcode.com/dataset/pp-humanseg14k), then download the PP-HumanSeg14K dataset, or you can download via [Baidu Netdisk](https://pan.baidu.com/s/1Buy74e5ymu2vXYlYfGvBHg) password: vui7 , [Google Cloud Disk](https://drive.google.com/file/d/1eEIV9lM2Kl1Ejcj3Cuht8EHN5eNF8Zjn/view?usp=sharing) +Go to visit [PP-HumanSeg14K official website](https://paperswithcode.com/dataset/pp-humanseg14k), then download the +PP-HumanSeg14K dataset, or you can download via [Baidu Netdisk](https://pan.baidu.com/s/1Buy74e5ymu2vXYlYfGvBHg) +password: vui7 , [Google Cloud Disk](https://drive.google.com/file/d/1eEIV9lM2Kl1Ejcj3Cuht8EHN5eNF8Zjn/view?usp=sharing) The dataset path structure should look like: -``` +```bash PP-HumanSeg14K/ ├── annotations │   ├── train @@ -44,7 +33,18 @@ PP-HumanSeg14K/ └──test.txt └──LICENSE └──README.txt +``` + +### Install Dependencies +```bash +git clone -b develop https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +python3 setup.py develop ``` ## Model Training @@ -65,9 +65,10 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \ ## Model Results -| GPUS | mIoU | Acc | Kappa |Dice | FPS | -| ---------- | ------ | ------ | ------ | ------ | ----- | +| GPUS | mIoU | Acc | Kappa | Dice | FPS | +|------------|--------|--------|--------|--------|-------| | BI-V100 x8 | 0.9591 | 0.9836 | 0.9581 | 0.9790 | 24.54 | ## References -- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file + +- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md b/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md index c36eac62c..1d26fd47c 100644 --- 
a/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_humansegv2/paddlepaddle/README.md @@ -2,33 +2,23 @@ ## Model Description -Human segmentation is a high-frequency application in the field of image segmentation. -Generally, human segentation can be classified as portrait segmentation and general human segmentation. +PP-HumanSegV2 is an advanced deep learning model for human segmentation, specializing in both portrait and general human +segmentation tasks. Developed by PaddleSeg, it offers improved accuracy, faster inference speed, and enhanced robustness +compared to its predecessor. The model supports zero-cost deployment for immediate use in products and allows +fine-tuning for better performance. PP-HumanSegV2 is particularly effective for applications like video background +replacement and portrait segmentation, delivering high-quality results with optimized computational efficiency. -For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which has **good performance in accuracy, inference speed and robustness**. Besides, we can deploy PP-HumanSeg models to products without training -Besides, PP-HumanSeg models can be deployed to products at zero cost, and it also support fine-tuning to achieve better performance. +## Model Preparation -The following is demonstration videos (due to the video is large, the loading will be slightly slow) .We provide full-process application guides from training to deployment, as well as video streaming segmentation and background replacement tutorials. Based on Paddle.js, you can experience the effects of [Portrait Snapshot](https://paddlejs.baidu.com/humanseg), [Video Background Replacement and Barrage Penetration](https://www.paddlepaddle.org.cn/paddlejs). 
+### Prepare Resources -## Step 1: Installation - -```bash -git clone -b develop https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -python3 setup.py develop -``` - -## Step 2: Preparing datasets - -Go to visit [PP-HumanSeg14K official website](https://paperswithcode.com/dataset/pp-humanseg14k), then download the PP-HumanSeg14K dataset, or you can download via [Baidu Netdisk](https://pan.baidu.com/s/1Buy74e5ymu2vXYlYfGvBHg) password: vui7 , [Google Cloud Disk](https://drive.google.com/file/d/1eEIV9lM2Kl1Ejcj3Cuht8EHN5eNF8Zjn/view?usp=sharing) +Go to visit [PP-HumanSeg14K official website](https://paperswithcode.com/dataset/pp-humanseg14k), then download the +PP-HumanSeg14K dataset, or you can download via [Baidu Netdisk](https://pan.baidu.com/s/1Buy74e5ymu2vXYlYfGvBHg) +password: vui7 , [Google Cloud Disk](https://drive.google.com/file/d/1eEIV9lM2Kl1Ejcj3Cuht8EHN5eNF8Zjn/view?usp=sharing) The dataset path structure sholud look like: -``` +```bash PP-HumanSeg14K/ ├── annotations │   ├── train @@ -45,12 +35,23 @@ PP-HumanSeg14K/ ``` +### Install Dependencies + +```bash +git clone -b develop https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +python3 setup.py develop +``` ## Model Training -```bash -# Change ./contrib/PP-HumanSeg/configs/portrait_pp_humansegv2_lite.yml dataset path as your dateset path +Change ./contrib/PP-HumanSeg/configs/portrait_pp_humansegv2_lite.yml dataset path as your dateset path +```bash # One GPU CUDA_VISIBLE_DEVICES=0 python3 tools/train.py --config contrib/PP-HumanSeg/configs/portrait_pp_humansegv2_lite.yml --save_dir output/human_pp_humansegv2_lite --save_interval 500 --do_eval --use_vdl @@ -64,13 +65,10 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py \ ## Model Results -| MODEL | mIoU |Acc | Kappa |Dice | -| ---------- | ------ |------ |--------|----- | -| pp_humansegv2 | 0.798 |0.9860 |0.9642 |0.9821 | - -| GPUS | FPS | -| ---------- | ------ | -| BI-V100x 8 | 34.0294 | +| MODEL | GPU | mIoU | Acc | Kappa | Dice | FPS | +|---------------|------------|-------|--------|--------|--------|---------| +| PP-HumanSegV2 | BI-V100 x8 | 0.798 | 0.9860 | 0.9642 | 0.9821 | 34.0294 | ## References -- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file + +- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md b/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md index 2efe73901..463d8e4ab 100644 --- a/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md +++ b/cv/semantic_segmentation/pp_liteseg/paddlepaddle/README.md @@ -2,25 +2,17 @@ ## Model Description -PP-LiteSeg is a novel lightweight model for the real-time semantic segmentation task. Specifically, the model presents a Flexible and Lightweight Decoder (FLD) to reduce computation overhead of previous decoder. To strengthen feature representations, this model proposes a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and channel attention to produce a weight and then fuses the input features with the weight. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context with low computation cost. 
+PP-LiteSeg is a novel lightweight model for the real-time semantic segmentation task. Specifically, the model presents a +Flexible and Lightweight Decoder (FLD) to reduce computation overhead of previous decoder. To strengthen feature +representations, this model proposes a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and +channel attention to produce a weight and then fuses the input features with the weight. Moreover, a Simple Pyramid +Pooling Module (SPPM) is proposed to aggregate global context with low computation cost. -## Step 1: Installation - -``` -git clone -b release/2.8 https://github.com/PaddlePaddle/PaddleSeg.git - -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -yum install mesa-libGL -pip3 install paddleseg - -``` +## Model Preparation +### Prepare Resources -## Step 2: Preparing datasets - -``` +```bash mkdir -p data && cd data wget https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar wget https://paddleseg.bj.bcebos.com/dataset/camvid.tar @@ -34,7 +26,7 @@ rm -rf camvid.tar the unzipped dataset structure sholud look like: -``` +```bash PaddleSeg/data ├── cityscapes │   ├── gtFine @@ -53,9 +45,22 @@ PaddleSeg/data │   └── val.txt ``` -## Model Training +### Install Dependencies + +```bash +git clone -b release/2.8 https://github.com/PaddlePaddle/PaddleSeg.git + +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +yum install mesa-libGL +pip3 install paddleseg ``` + +## Model Training + +```bash cd .. # 8 GPUs @@ -83,9 +88,10 @@ python3 tools/train.py \ ## Model Results -| Method | Backbone | Training Iters | FPS (BI x 8) | mIOU | -| ------ | --------- | ------ | -------- |--------------:| -| PP-LiteSeg-T | STDC1 | 160000 | 28.8 | 73.19% | +| Method | Backbone | Training Iters | FPS (BI x 8) | mIOU | +|--------------|----------|----------------|--------------|--------| +| PP-LiteSeg-T | STDC1 | 160000 | 28.8 | 73.19% | ## References -- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) \ No newline at end of file + +- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/psanet/pytorch/README.md b/cv/semantic_segmentation/psanet/pytorch/README.md index 41c5c74b3..18c73228f 100644 --- a/cv/semantic_segmentation/psanet/pytorch/README.md +++ b/cv/semantic_segmentation/psanet/pytorch/README.md @@ -2,28 +2,21 @@ ## Model Description -The point-wise spatial attention network (PSANet) to relax the local neighborhood constraint. -Each position on the feature map is connected to all the other ones through a self-adaptively learned attention mask. -Moreover, information propagation in bi-direction for scene parsing is enabled. -Information at other positions can be collected to help the prediction of the current position and vice versa, information at the current position can be distributed to assist the prediction of other ones. +The point-wise spatial attention network (PSANet) to relax the local neighborhood constraint. Each position on the +feature map is connected to all the other ones through a self-adaptively learned attention mask. Moreover, information +propagation in bi-direction for scene parsing is enabled. Information at other positions can be collected to help the +prediction of the current position and vice versa, information at the current position can be distributed to assist the +prediction of other ones. 
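+
+Conceptually, every spatial position predicts an attention distribution over all other positions and aggregates their
+features through it. The snippet below is a simplified, self-contained sketch of that idea (plain PyTorch, with
+hypothetical class and parameter names), not the actual collect/distribute implementation from the referenced
+repositories.
+
+```python
+# Simplified illustration of point-wise spatial attention; not the PSANet code.
+import torch
+import torch.nn as nn
+
+class PointwiseSpatialAttention(nn.Module):
+    def __init__(self, channels, reduced=64):
+        super().__init__()
+        self.query = nn.Conv2d(channels, reduced, 1)
+        self.key = nn.Conv2d(channels, reduced, 1)
+        self.value = nn.Conv2d(channels, channels, 1)
+
+    def forward(self, x):
+        b, c, h, w = x.shape
+        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, reduced)
+        k = self.key(x).flatten(2)                    # (b, reduced, h*w)
+        v = self.value(x).flatten(2).transpose(1, 2)  # (b, h*w, c)
+        # Each position gets a self-adaptively learned attention mask over all positions.
+        attn = torch.softmax(q @ k, dim=-1)           # (b, h*w, h*w)
+        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
+        return x + out                                # residual aggregation of global context
+
+if __name__ == "__main__":
+    y = PointwiseSpatialAttention(128)(torch.randn(2, 128, 32, 32))
+    print(y.shape)  # expected: torch.Size([2, 128, 32, 32])
+```
+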
 ## Model Preparation
 
-### Install Dependencies
-
-```shell
-
-pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm'
-
-```
-
-## Step 2: Training
-
-### Preparing datasets
+### Prepare Resources
 
-Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download.
+Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to
+download.
 
-Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like:
+Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the
+unzipped dataset path structure should look like:
 
 ```bash
 coco2017
@@ -44,7 +37,13 @@ coco2017
 └── ...
 ```
 
-### Training on COCO dataset
+### Install Dependencies
+
+```shell
+pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm'
+```
+
+## Model Training
 
 ```shell
 bash train_psanet_dist.sh --data-path /path/to/coco2017/ --dataset coco
@@ -52,5 +51,5 @@ bash train_psanet_dist.sh --data-path /path/to/coco2017/ --dataset coco
 
 ## References
 
-Ref: https://github.com/ycszen/TorchSeg
-Ref: [torchvision](../../torchvision/pytorch/README.md)
+- [TorchSeg](https://github.com/ycszen/TorchSeg)
+- [torchvision](../../torchvision/pytorch/README.md)
diff --git a/cv/semantic_segmentation/pspnet/pytorch/README.md b/cv/semantic_segmentation/pspnet/pytorch/README.md
index a60120902..7606a31ad 100644
--- a/cv/semantic_segmentation/pspnet/pytorch/README.md
+++ b/cv/semantic_segmentation/pspnet/pytorch/README.md
@@ -2,32 +2,21 @@
 
 ## Model Description
 
-An effective pyramid scene parsing network for complex scene understanding. The global pyramid pooling feature provides additional contextual information.
-PSPNet provides a superior framework for pixellevel prediction. The proposed approach achieves state-ofthe-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
+PSPNet is an effective pyramid scene parsing network for complex scene understanding. Its global pyramid pooling feature
+provides additional contextual information, and the network offers a superior framework for pixel-level prediction. The
+proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing
+challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet yields the new record of
+mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.
 
 ## Model Preparation
 
-### Install Dependencies
-
-```shell
-# Install libGL
-## CentOS
-yum install -y mesa-libGL
-## Ubuntu
-apt install -y libgl1-mesa-glx
-
-# install mmsegmentation
-git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1
-cd mmsegmentation/
-pip install -v -e .
-
-pip install ftfy
-```
-
-### Prepare datasets
+### Prepare Resources
 
-Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset.
+Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the
+Cityscapes dataset.
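+
+After downloading and unpacking the archives, a quick sanity check such as the one below can confirm that the split
+folders match the layout shown next. This is only a sketch: the root path is a placeholder and the glob patterns assume
+the standard Cityscapes file naming.
+
+```python
+# Optional sanity check for the unpacked Cityscapes dataset (illustrative only).
+import glob
+import os
+
+root = "/path/to/cityscapes"  # placeholder: point this at your download
+for split in ("train", "val", "test"):
+    images = glob.glob(os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png"))
+    labels = glob.glob(os.path.join(root, "gtFine", split, "*", "*_gtFine_labelIds.png"))
+    print(f"{split}: {len(images)} images, {len(labels)} fine label maps")
+```
+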
-Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure sholud look like: +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: ```bash cityscapes/ @@ -55,17 +44,35 @@ mkdir data/ ln -s /path/to/cityscapes data/cityscapes ``` -## Step 2: Training -### Training on single card +### Install Dependencies + ```shell -python3 tools/train.py configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . + +pip install ftfy ``` -### Training on mutil-cards +## Model Training + + ```shell +# Training on single card +python3 tools/train.py configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py + +# Training on mutil-cards sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py 8 ``` ## References -[mmsegmentation](https://github.com/open-mmlab/mmsegmentation) + +- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) diff --git a/cv/semantic_segmentation/refinenet/pytorch/README.md b/cv/semantic_segmentation/refinenet/pytorch/README.md index 0b9eb4ae8..164645f25 100644 --- a/cv/semantic_segmentation/refinenet/pytorch/README.md +++ b/cv/semantic_segmentation/refinenet/pytorch/README.md @@ -2,28 +2,22 @@ ## Model Description -RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. -In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. -The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. -Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. +RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the +down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the +deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier +convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, +which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich +background context in an efficient manner. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
- -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,7 +38,13 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_refinenet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -52,5 +52,5 @@ bash train_refinenet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/segnet/pytorch/README.md b/cv/semantic_segmentation/segnet/pytorch/README.md index 12f58d364..b6bf7f1cd 100644 --- a/cv/semantic_segmentation/segnet/pytorch/README.md +++ b/cv/semantic_segmentation/segnet/pytorch/README.md @@ -2,30 +2,23 @@ ## Model Description -SegNet is a semantic segmentation model. -This core trainable segmentation architecture consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. -The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. -The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. -The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature maps. -Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. +SegNet is a semantic segmentation model. This core trainable segmentation architecture consists of an encoder network, a +corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is +topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map +the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty +of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature maps. Specifically, +the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear +upsampling. ## Model Preparation -### Install Dependencies - -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Training +### Prepare Resources -### Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
- -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -46,7 +39,13 @@ coco2017 └── ... ``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_segnet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -54,4 +53,4 @@ bash train_segnet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/stdc/paddlepaddle/README.md b/cv/semantic_segmentation/stdc/paddlepaddle/README.md index 5851beede..134b1de2b 100644 --- a/cv/semantic_segmentation/stdc/paddlepaddle/README.md +++ b/cv/semantic_segmentation/stdc/paddlepaddle/README.md @@ -2,23 +2,16 @@ ## Model Description -A novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension -of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC -network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in single-stream manner. Finally, -the low-level features and deep features are fused to predict the final segmentation results. +STDC (Short-Term Dense Concatenate network) is an efficient deep learning model for semantic segmentation that reduces +structural redundancy while maintaining high performance. It gradually decreases feature map dimensions and aggregates +them for image representation. The model incorporates a Detail Aggregation module in the decoder to enhance spatial +information learning in low-level layers. By fusing both low-level and deep features, STDC achieves accurate +segmentation results with optimized computational efficiency, making it particularly suitable for real-time +applications. -## Step 1: Installation +## Model Preparation -```bash -git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git -cd PaddleSeg -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -y -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. @@ -45,19 +38,32 @@ cityscapes/ └── munster ``` +### Install Dependencies + +```bash +git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git +cd PaddleSeg +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL -y +``` + +### Preprocess Data + ```bash # Datasets preprocessing pip3 install cityscapesscripts python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8 - python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator "," ``` ## Model Training +Change '/path/to/cityscapes' as your local Cityscapes dataset path. 
+ ```bash -# Change '/path/to/cityscapes' as your local Cityscapes dataset path data_dir=/path/to/cityscapes sed -i "s#: data/cityscapes#: ${data_dir}#g" configs/_base_/cityscapes.yml export FLAGS_cudnn_exhaustive_search=True @@ -77,10 +83,11 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 train.py \ ## Model Results -| GPUs | Crop Size | Lr schd | FPS | mIoU | -| ------ | --------- | ------: | -------- |--------------:| -| BI-V100 x8 | 512x1024 | 80000 | 14.36 | 0.7466 | +| GPUs | Crop Size | Lr schd | FPS | mIoU | +|------------|-----------|--------:|-------|--------| +| BI-V100 x8 | 512x1024 | 80000 | 14.36 | 0.7466 | ## References + - [cityscapes](https://mmsegmentation.readthedocs.io/en/latest/dataset_prepare.html#cityscapes) - [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg) diff --git a/cv/semantic_segmentation/stdc/pytorch/README.md b/cv/semantic_segmentation/stdc/pytorch/README.md index ba05e33a9..6343c3bba 100644 --- a/cv/semantic_segmentation/stdc/pytorch/README.md +++ b/cv/semantic_segmentation/stdc/pytorch/README.md @@ -2,31 +2,16 @@ ## Model Description -We propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension -of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC -network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in single-stream manner. Finally, -the low-level features and deep features are fused to predict the final segmentation results. +STDC (Short-Term Dense Concatenate network) is an efficient deep learning model for semantic segmentation that reduces +structural redundancy while maintaining high performance. It gradually decreases feature map dimensions and aggregates +them for image representation. The model incorporates a Detail Aggregation module in the decoder to enhance spatial +information learning in low-level layers. By fusing both low-level and deep features, STDC achieves accurate +segmentation results with optimized computational efficiency, making it particularly suitable for real-time +applications. -## Step 1: Installation +## Model Preparation -### Install packages - -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install mmsegmentation -git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 -cd mmsegmentation/ -pip install -v -e . - -pip install ftfy -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the Cityscapes dataset. @@ -58,24 +43,40 @@ mkdir -p data/ ln -s /path/to/cityscapes data/ ``` +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install mmsegmentation +git clone -b v1.2.2 https://github.com/open-mmlab/mmsegmentation.git --depth=1 +cd mmsegmentation/ +pip install -v -e . 
+
+pip install ftfy
+```
+
 ## Model Training
 
-### Training on single card
 ```shell
+# Training on single card
 python3 tools/train.py configs/stdc/stdc1_4xb12-80k_cityscapes-512x1024.py
-```
 
-### Training on mutil-cards
-```shell
+# Training on multiple cards
 sed -i 's/python /python3 /g' tools/dist_train.sh
 bash tools/dist_train.sh configs/stdc/stdc1_4xb12-80k_cityscapes-512x1024.py 8
 ```
 
 ## Model Results
 
-| GPUs | Crop Size | Lr schd | FPS | mIoU |
-| ------ | --------- | ------: | -------- |--------------:|
-| BI-V100 x8 | 512x1024 | 20000 | 39.38 | 70.74 |
+| GPUs | Crop Size | Lr schd | FPS | mIoU |
+|------------|-----------|--------:|-------|-------|
+| BI-V100 x8 | 512x1024 | 20000 | 39.38 | 70.74 |
 
 ## References
-[mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
+
+- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
diff --git a/cv/semantic_segmentation/unet++/pytorch/README.md b/cv/semantic_segmentation/unet++/pytorch/README.md
index 4cebd7b99..07989de11 100644
--- a/cv/semantic_segmentation/unet++/pytorch/README.md
+++ b/cv/semantic_segmentation/unet++/pytorch/README.md
@@ -1,18 +1,32 @@
-# UNet++: A Nested U-Net Architecture for Medical Image Segmentation
+# UNet++
 
 ## Model Description
 
-We present UNet++, a new, more powerful architecture for medical image segmentation. Our architecture is essentially
-a deeply supervised encoder-decoder network where the encoder and decoder sub-networks are connected through a series of nested, dense skip
-pathways. The re-designed skip pathways aim at reducing the semantic
-gap between the feature maps of the encoder and decoder sub-networks.
-We argue that the optimizer would deal with an easier learning task when
-the feature maps from the decoder and encoder networks are semantically
-similar.
+UNet++ is an advanced deep learning architecture for medical image segmentation, featuring a deeply supervised
+encoder-decoder network with nested, dense skip pathways. These redesigned connections reduce the semantic gap between
+encoder and decoder feature maps, making the optimization task easier. By enhancing feature map similarity across
+network levels, UNet++ improves segmentation accuracy, particularly in complex medical imaging tasks. Its architecture
+effectively handles the challenges of precise boundary detection and small object segmentation in medical images.
 
-## Step 1: Installation
+## Model Preparation
 
-### Install packages
+### Prepare Resources
+
+If the `DRIVE` dataset is available locally:
+
+```bash
+mkdir -p data/
+ln -s ${DRIVE_DATASET_PATH} data/
+```
+
+If the `DRIVE` dataset is not available locally, you can download it from a file server or the
+[DRIVE](https://drive.grand-challenge.org/) official website, then convert it:
+
+```bash
+python3 tools/convert_datasets/drive.py /path/to/training.zip /path/to/test.zip
+```
+
+### Install Dependencies
 
 ```bash
 # Install libGL
@@ -29,39 +43,23 @@ pip install -v -e . 
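+# Optional check (illustrative, not part of the original instructions): confirm that
+# PyTorch can see the GPUs before launching the multi-card run further below.
+python3 -c "import torch; print(torch.cuda.device_count())"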
 
 pip install ftfy
 ```
 
-## Step 2: Prepare datasets
-
-If there is `DRIVE` dataset locally
-
-```bash
-mkdir -p data/
-ln -s ${DRIVE_DATASET_PATH} data/
-```
-
-If there is no `DRIVE` dataset locally, you can download `DRIVE` from a file server or
-[DRIVE](https://drive.grand-challenge.org/) official website
-
-```bash
-python3 tools/convert_datasets/drive.py /path/to/training.zip /path/to/test.zip
-```
-
 ## Model Training
 
-### Training on single card
+
 ```shell
+# Training on single card
 python3 tools/train.py configs/unet/unet-s5-d16_pspnet_4xb4-40k_drive-64x64.py
-```
 
-### Training on mutil-cards
-```shell
+# Training on multiple cards
 sed -i 's/python /python3 /g' tools/dist_train.sh
 bash tools/dist_train.sh configs/unet/unet-s5-d16_pspnet_4xb4-40k_drive-64x64.py 8
 ```
 
 ## Model Results
 
-| GPUs| Crop Size | Lr schd | FPS | mDice |
-| ------ | --------- | ------: | -------- |--------------:|
-| BI-V100 x8 | 64x64 | 40000 | 238.9 | 87.52 |
+| GPU | Crop Size | Lr schd | FPS | mDice |
+|------------|-----------|---------|-------|-------|
+| BI-V100 x8 | 64x64 | 40000 | 238.9 | 87.52 |
 
 ## References
-[mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
+
+- [mmsegmentation](https://github.com/open-mmlab/mmsegmentation)
diff --git a/cv/semantic_segmentation/unet/paddlepaddle/README.md b/cv/semantic_segmentation/unet/paddlepaddle/README.md
index b2644e928..bb2d22014 100644
--- a/cv/semantic_segmentation/unet/paddlepaddle/README.md
+++ b/cv/semantic_segmentation/unet/paddlepaddle/README.md
@@ -1,17 +1,12 @@
 # UNet
+
 ## Model Description
-A network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently.
-The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
-## Step 1: Installing
-```
-git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git
-cd PaddleSeg
-pip3 install -r requirements.txt
-pip3 install protobuf==3.20.3
-pip3 install urllib3==1.26.6
-yum install mesa-libGL
-```
+
+UNet is a convolutional network architecture and training strategy that relies on the strong use of data augmentation
+to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture
+context and a symmetric expanding path that enables precise localization.
+
+## Model Preparation
 
 ### Prepare Resources
 
@@ -40,38 +35,32 @@ cityscapes/
 └── munster
 ```
 
-Datasets preprocessing:
+### Install Dependencies
 
 ```bash
-pip3 install cityscapesscripts
-
-python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8
-
-python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator ","
+git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleSeg.git
+cd PaddleSeg
+pip3 install -r requirements.txt
+pip3 install protobuf==3.20.3
+pip3 install urllib3==1.26.6
+yum install mesa-libGL
 ```
 
-then the cityscapes path as follows:
+### Preprocess Data
 
 ```bash
-root@5574247e63f8:~# ls -al /path/to/cityscapes
-total 11567948
-drwxr-xr-x 4 root root 227 Jul 18 03:32 .
-drwxr-xr-x 6 root root 179 Jul 18 06:48 ..
--rw-r--r-- 1 root root 298 Feb 20 2016 README
-drwxr-xr-x 5 root root 58 Jul 18 03:30 gtFine
--rw-r--r-- 1 root root 252567705 Jul 18 03:22 gtFine_trainvaltest.zip
-drwxr-xr-x 5 root root 58 Jul 18 03:30 leftImg8bit
--rw-r--r-- 1 root root 11592327197 Jul 18 03:27 leftImg8bit_trainvaltest.zip
--rw-r--r-- 1 root root 1646 Feb 17 2016 license.txt
--rw-r--r-- 1 root root 193690 Jul 18 03:32 test.txt
--rw-r--r-- 1 root root 398780 Jul 18 03:32 train.txt
--rw-r--r-- 1 root root 65900 Jul 18 03:32 val.txt
+pip3 install cityscapesscripts
+
+python3 tools/data/convert_cityscapes.py --cityscapes_path /path/to/cityscapes --num_workers 8
+python3 tools/data/create_dataset_list.py /path/to/cityscapes --type cityscapes --separator ","
 ```
 
 ## Model Training
+
 Notice: modify configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml file, modify the datasets path as yours. The training is use AMP model.
-```
+
+```bash
 cd PaddleSeg
 export FLAGS_cudnn_exhaustive_search=True
 export FLAGS_cudnn_batchnorm_spatial_persistent=True
@@ -86,4 +75,5 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3 train.py \
 ```
 
 ## References
-- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
\ No newline at end of file
+
+- [PaddleSeg](https://github.com/PaddlePaddle/PaddleSeg)
diff --git a/cv/semantic_segmentation/unet/pytorch/README.md b/cv/semantic_segmentation/unet/pytorch/README.md
index 386028af1..eb00fc6cb 100644
--- a/cv/semantic_segmentation/unet/pytorch/README.md
+++ b/cv/semantic_segmentation/unet/pytorch/README.md
@@ -2,27 +2,19 @@
 
 ## Model Description
 
-A network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently.
-The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
-
+UNet is a convolutional network architecture and training strategy that relies on the strong use of data augmentation
+to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture
+context and a symmetric expanding path that enables precise localization.
 
 ## Model Preparation
 
-### Install Dependencies
-
-```shell
-
-pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm'
-
-```
-
-## Step 2: Training
-
-### Preparing datasets
+### Prepare Resources
 
-Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download.
+Visit the [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to
+download.
 
-Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like:
+Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the
+unzipped dataset path structure should look like:
 
 ```bash
 coco2017
@@ -43,7 +35,13 @@ coco2017
 └── ...
``` -### Training on COCO dataset +### Install Dependencies + +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` + +## Model Training ```shell bash train_unet_dist.sh --data-path /path/to/coco2017/ --dataset coco @@ -51,5 +49,5 @@ bash train_unet_dist.sh --data-path /path/to/coco2017/ --dataset coco ## References -Ref: https://github.com/LikeLy-Journey/SegmenTron -Ref: [torchvision](../../torchvision/pytorch/README.md) +- [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) +- [torchvision](../../torchvision/pytorch/README.md) diff --git a/cv/semantic_segmentation/unet3d/pytorch/README.md b/cv/semantic_segmentation/unet3d/pytorch/README.md index a5e180cce..88f1f3bd8 100644 --- a/cv/semantic_segmentation/unet3d/pytorch/README.md +++ b/cv/semantic_segmentation/unet3d/pytorch/README.md @@ -2,22 +2,15 @@ ## Model Description -A network for volumetric segmentation that learns from sparsely annotated volumetric images. -Two attractive use cases of this method: -(1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. -(2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images. -The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. -The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. +3D-UNet is a deep learning model designed for volumetric image segmentation, extending the traditional 2D U-Net +architecture to three dimensions. It effectively processes 3D medical imaging data like CT and MRI scans, learning from +sparsely annotated volumes to produce dense 3D segmentations. The model supports both semi-automated and fully-automated +segmentation workflows, incorporating on-the-fly elastic deformations for efficient data augmentation. 3D-UNet is +particularly valuable in medical imaging for tasks requiring precise 3D anatomical structure delineation. 
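+
+The core change relative to 2D U-Net — every 2D operation is replaced by its 3D counterpart — can be illustrated with
+a small PyTorch sketch. This is illustrative only: the channel sizes and input shape below are assumptions, not the
+exact layers used by this implementation.
+
+```python
+import torch
+import torch.nn as nn
+
+class DoubleConv3D(nn.Module):
+    """Two 3x3x3 conv + BN + ReLU stages: the 3D analogue of U-Net's 2D double-conv block."""
+    def __init__(self, in_channels, out_channels):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
+            nn.BatchNorm3d(out_channels),
+            nn.ReLU(inplace=True),
+            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1),
+            nn.BatchNorm3d(out_channels),
+            nn.ReLU(inplace=True),
+        )
+
+    def forward(self, x):
+        return self.block(x)
+
+# A single-channel volumetric patch (batch, channel, depth, height, width).
+x = torch.randn(1, 1, 64, 64, 64)
+print(DoubleConv3D(1, 32)(x).shape)  # torch.Size([1, 32, 64, 64, 64])
+```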
## Model Preparation -### Install Dependencies - -```shell -pip3 install 'scipy' 'tqdm' -``` - -### Prepare datasets +### Prepare Resources if there is local 'kits19' dataset: @@ -31,31 +24,33 @@ else: bash prepare.sh ``` -## Step 2: Training - -### Single GPU +### Install Dependencies ```shell -bash train.sh +pip3 install 'scipy' 'tqdm' ``` -### Multi GPU +## Model Training ```shell +# Single GPU +bash train.sh + +# Multi GPU export NUM_GPUS=8 bash train_dist.sh ``` -## Results on BI-V100 +## Model Results -| GPUs | FP16 | FPS | Mean dice | -| ------ | ------- | ------- | ----------- | -| 1x8 | False | 11.52 | 0.908 | +| GPU | FP16 | FPS | Mean dice | +|------------|-------|-------|-----------| +| BI-V100 x8 | False | 11.52 | 0.908 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | -| ---------------------- | ------------------------------------------ | ------------- | ---------- | ------------ | ------------- | ------------------------- | ----------- | -| 0.908 | SDK V2.2, bs:4, 8x, fp32 | 12 | 0.908 | 152\*8 | 0.85 | 19.6\*8 | 1 | +|----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------| +| 0.908 | SDK V2.2, bs:4, 8x, fp32 | 12 | 0.908 | 152\*8 | 0.85 | 19.6\*8 | 1 | ## References -Reference: https://github.com/mlcommons/training/tree/master/image_segmentation/pytorch +- [mlcommons](https://github.com/mlcommons/training/tree/master/image_segmentation/pytorch) diff --git a/cv/semantic_segmentation/vnet/tensorflow/README.md b/cv/semantic_segmentation/vnet/tensorflow/README.md index e198ed0b8..e719fb849 100644 --- a/cv/semantic_segmentation/vnet/tensorflow/README.md +++ b/cv/semantic_segmentation/vnet/tensorflow/README.md @@ -1,33 +1,39 @@ # VNet ## Model Description -[V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation](https://arxiv.org/abs/1606.04797) Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi -## Prepare -``` -pip3 install -r requirements.txt -``` +VNet is a fully convolutional neural network specifically designed for volumetric medical image segmentation. It extends +traditional 2D segmentation approaches to 3D, effectively processing volumetric data like CT and MRI scans. The +architecture incorporates residual connections and volumetric convolutions to capture spatial context in three +dimensions. VNet's innovative design enables precise segmentation of complex anatomical structures, making it +particularly valuable in medical imaging tasks such as organ segmentation and tumor detection in volumetric datasets. 
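+
+The residual volumetric convolution stages mentioned above can be sketched in a few lines of Keras. Treat this as a
+rough illustration under assumed channel counts and kernel sizes, not the exact graph built by
+`examples/vnet_train_and_evaluate.py`.
+
+```python
+import tensorflow as tf
+from tensorflow.keras import layers
+
+def residual_block_3d(x, channels, num_convs=2):
+    """VNet-style stage: stacked 5x5x5 volumetric convolutions with an identity (residual) connection."""
+    shortcut = x
+    for _ in range(num_convs):
+        x = layers.Conv3D(channels, kernel_size=5, padding="same")(x)
+        x = layers.PReLU(shared_axes=[1, 2, 3])(x)
+    return layers.Add()([x, shortcut])
+
+inputs = tf.keras.Input(shape=(32, 32, 32, 16))  # (depth, height, width, channels)
+outputs = residual_block_3d(inputs, channels=16)
+print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 32, 32, 32, 16)
+```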
-## Download dataset -``` +## Model Preparation + +### Prepare Resources + +```bash python3 download_dataset.py --data_dir ./data ``` +### Install Dependencies -## Run training -### single card +```bash +pip3 install -r requirements.txt ``` + +## Model Training + +```bash +# single card python3 examples/vnet_train_and_evaluate.py --gpus 1 --batch_size 8 --base_lr 0.0001 --data_dir ./data/Task04_Hippocampus/ --model_dir ./model_train/ -``` -### 8 cards -``` +# 8 cards python3 examples/vnet_train_and_evaluate.py --gpus 8 --batch_size 8 --base_lr 0.0001 --data_dir ./data/Task04_Hippocampus/ --model_dir ./model_train/ - ``` -## Result +## Model Results -| | background_dice | anterior_dice | posterior_dice | -| --- | --- | --- | --- | -| multi_card | 0.9912699 | 0.83743376 | 0.81537557 | +| Model | Type | background_dice | anterior_dice | posterior_dice | +|-------|------------|-----------------|---------------|----------------| +| VNet | multi_card | 0.9912699 | 0.83743376 | 0.81537557 | diff --git a/docs/MODEL_TEMPLATE.md b/docs/MODEL_TEMPLATE.md index a97bd7d65..45c826646 100644 --- a/docs/MODEL_TEMPLATE.md +++ b/docs/MODEL_TEMPLATE.md @@ -23,7 +23,7 @@ python3 dataset/coco/download_coco.py Go to huggingface. -## Install Dependencies +### Install Dependencies ```bash pip install -r requirements.txt -- Gitee