From e60957cb30d9568938068948c61b29b1b2d07b87 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Mon, 10 Mar 2025 15:02:53 +0800 Subject: [PATCH 1/9] unify model readme format - cv/pose --- cv/pose/alphapose/pytorch/README.md | 67 ++++++++------- cv/pose/hrnet/paddlepaddle/README.md | 59 +++++++------ cv/pose/hrnet/pytorch/README.md | 47 +++++----- cv/pose/openpose/mindspore/README.md | 85 ++++++++++--------- .../mae/pytorch/README.md | 51 ++++++----- 5 files changed, 169 insertions(+), 140 deletions(-) diff --git a/cv/pose/alphapose/pytorch/README.md b/cv/pose/alphapose/pytorch/README.md index a85b2257f..20ebc61f8 100755 --- a/cv/pose/alphapose/pytorch/README.md +++ b/cv/pose/alphapose/pytorch/README.md @@ -1,34 +1,22 @@ # AlphaPose -## Model description +## Model Description -AlphaPose is an accurate multi-person pose estimator, which is the first open-source system that achieves 70+ mAP (75 mAP) on COCO dataset and 80+ mAP (82.1 mAP) on MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose tracker that achieves both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on PoseTrack Challenge dataset. +AlphaPose is an accurate multi-person pose estimator, which is the first open-source system that achieves 70+ mAP (75 +mAP) on COCO dataset and 80+ mAP (82.1 mAP) on MPII dataset. To match poses that correspond to the same person across +frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose +tracker that achieves both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on PoseTrack Challenge dataset. -## Step 1: Installation +## Model Preparation -```bash -# install libGL -yum install mesa-libGL +### Prepare Resources -# install zlib -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install -cd .. -rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -# install requirements -pip3 install seaborn pandas pycocotools matplotlib -pip3 install easydict tensorboardX opencv-python -``` - -## Step 2: Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -49,7 +37,26 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +```bash +# install libGL +yum install mesa-libGL + +# install zlib +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +cd .. 
+rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ + +# install requirements +pip3 install seaborn pandas pycocotools matplotlib +pip3 install easydict tensorboardX opencv-python +``` + +## Model Training ```bash # create soft link to coco @@ -60,10 +67,12 @@ ln -s /path/to/coco2017 /home/datasets/cv/coco bash ./scripts/trainval/train.sh ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml 1 ``` -## Results -| GPUs | FPS | ACC | -| ---- | ---- | ---- | -| BI-V100 x8 | 1.71s/it | acc: 0.8429 | +## Model Results + +| Model | GPU | FPS | ACC | +|-----------|------------|----------|-------------| +| AlphaPose | BI-V100 x8 | 1.71s/it | acc: 0.8429 | + +## References -## Reference - [AlphaPose](https://github.com/MVIG-SJTU/AlphaPose) diff --git a/cv/pose/hrnet/paddlepaddle/README.md b/cv/pose/hrnet/paddlepaddle/README.md index 55bffd1eb..5bb76c392 100644 --- a/cv/pose/hrnet/paddlepaddle/README.md +++ b/cv/pose/hrnet/paddlepaddle/README.md @@ -1,26 +1,24 @@ # HRNet-W32 -## Model description -HRNet (High-Resolution Net) is proposed for the task of 2D human pose estimation (Human Pose Estimation or Keypoint Detection), and the network is mainly aimed at the pose assessment of a single individual. Most existing human pose estimation methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, HRNet maintains high-resolution representations through the whole process. HRNet starts from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutli-resolution subnetworks in parallel. Then, HRNet conducts repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. +## Model Description +HRNet, or High-Resolution Net, is a general purpose convolutional neural network for tasks like semantic segmentation, +object detection and image classification. It is able to maintain high resolution representations through the whole +process. We start from a high-resolution convolution stream, gradually add high-to-low resolution convolution streams +one by one, and connect the multi-resolution streams in parallel. The resulting network consists of several stages and +the nth stage contains n streams corresponding to n resolutions. The authors conduct repeated multi-resolution fusions +by exchanging the information across the parallel streams over and over. -## Step 1: Installation -``` -git clone https://github.com/PaddlePaddle/PaddleDetection.git -``` +## Model Preparation -``` -cd PaddleDetection -pip3 install numba==0.56.4 -pip3 install -r requirements.txt -python3 setup.py install -``` +### Prepare Resources -## Step 2: Preparing Datasets -``` +```bash python3 dataset/coco/download_coco.py ``` + The coco2017 dataset path structure should look like: + ```bash coco2017 ├── annotations @@ -39,31 +37,40 @@ coco2017 ├── val2017.txt └── ... 
``` -## Step 3: Training -single GPU +### Install Dependencies + +```bash +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection +pip3 install numba==0.56.4 +pip3 install -r requirements.txt +python3 setup.py install ``` + +## Model Training + +```bash +# single GPU export CUDA_VISIBLE_DEVICES=0 export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True python3 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml --eval -o use_gpu=true -``` -8 GPU(Distributed Training) -``` +# 8 GPU(Distributed Training) export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True python3 -m paddle.distributed.launch --log_dir=./log_hrnet_w32_384x288/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml --eval ``` -## Results on BI-V100 -
-
-| GPU         | FP32                                           |
-| ----------- | ---------------------------------------------- |
-| 8 cards     | BatchSize=64,AP(coco val)=78.4, single GPU FPS= 45 |
+## Model Results
+
+| Model     | GPU        | FP32                                               |
+|-----------|------------|----------------------------------------------------|
+| HRNet-W32 | BI-V100 x8 | BatchSize=64, AP(coco val)=78.4, single GPU FPS=45 |
 
-
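+The sketch below is only meant to make the "parallel multi-resolution streams with repeated fusion" idea from the
+model description above easier to picture. It is a minimal PyTorch-style illustration, not the PaddleDetection
+implementation trained in this README, and every class name and tensor shape in it is a made-up example.
+
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+
+class TwoBranchFusion(nn.Module):
+    """Toy HRNet-style block: a high-resolution and a low-resolution branch that exchange information."""
+
+    def __init__(self, channels_high=32, channels_low=64):
+        super().__init__()
+        self.high = nn.Conv2d(channels_high, channels_high, 3, padding=1)   # keeps full resolution
+        self.low = nn.Conv2d(channels_low, channels_low, 3, padding=1)      # runs at half resolution
+        self.high_to_low = nn.Conv2d(channels_high, channels_low, 3, stride=2, padding=1)
+        self.low_to_high = nn.Conv2d(channels_low, channels_high, 1)
+
+    def forward(self, x_high, x_low):
+        h = F.relu(self.high(x_high))
+        l = F.relu(self.low(x_low))
+        # repeated multi-scale fusion: each branch receives the other branch's features
+        h_fused = h + F.interpolate(self.low_to_high(l), size=h.shape[-2:], mode="bilinear", align_corners=False)
+        l_fused = l + self.high_to_low(h)
+        return h_fused, l_fused
+
+
+h, l = TwoBranchFusion()(torch.randn(1, 32, 96, 72), torch.randn(1, 64, 48, 36))
+print(h.shape, l.shape)  # torch.Size([1, 32, 96, 72]) torch.Size([1, 64, 48, 36])
+```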
+## References -## Reference - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/pose/hrnet/pytorch/README.md b/cv/pose/hrnet/pytorch/README.md index d5b86de4e..02a1f0b93 100644 --- a/cv/pose/hrnet/pytorch/README.md +++ b/cv/pose/hrnet/pytorch/README.md @@ -1,16 +1,17 @@ # HRNet -## Model description +## Model Description -HRNet, or High-Resolution Net, is a general purpose convolutional neural network for tasks like semantic segmentation, object detection and image classification. It is able to maintain high resolution representations through the whole process. We start from a high-resolution convolution stream, gradually add high-to-low resolution convolution streams one by one, and connect the multi-resolution streams in parallel. The resulting network consists of several stages and the nth stage contains n streams corresponding to n resolutions. The authors conduct repeated multi-resolution fusions by exchanging the information across the parallel streams over and over. +HRNet, or High-Resolution Net, is a general purpose convolutional neural network for tasks like semantic segmentation, +object detection and image classification. It is able to maintain high resolution representations through the whole +process. We start from a high-resolution convolution stream, gradually add high-to-low resolution convolution streams +one by one, and connect the multi-resolution streams in parallel. The resulting network consists of several stages and +the nth stage contains n streams corresponding to n resolutions. The authors conduct repeated multi-resolution fusions +by exchanging the information across the parallel streams over and over. -## Step 1: Installing packages +## Model Preparation -```shell -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -35,35 +36,31 @@ coco2017 └── ... 
``` -## Step 3: Training - -### On single GPU +### Install Dependencies ```shell -export COCO_DATASET_PATH=/path/to/coco2017 +pip3 install -r requirements.txt ``` +## Model Training + ```shell -python3 ./tools/train.py --cfg ./configs/coco/w32_512_adam_lr1e-3.yaml --datadir=${COCO_DATASET_PATH} --max_epochs=2 -``` +export COCO_DATASET_PATH=/path/to/coco2017 -### On single GPU (AMP) +# On single GPU +python3 ./tools/train.py --cfg ./configs/coco/w32_512_adam_lr1e-3.yaml --datadir=${COCO_DATASET_PATH} --max_epochs=2 -```shell +# On single GPU (AMP) python3 ./tools/train.py --cfg ./configs/coco/w32_512_adam_lr1e-3.yaml --datadir=${COCO_DATASET_PATH} --max_epochs=2 --amp -``` -### Multiple GPUs on one machine +# Multiple GPUs on one machine -```shell python3 ./tools/train.py --cfg ./configs/coco/w32_512_adam_lr1e-3.yaml --datadir=${COCO_DATASET_PATH} --max_epochs=2 --dist -``` - -### Multiple GPUs on one machine (AMP) -```shell +# Multiple GPUs on one machine (AMP) python3 ./tools/train.py --cfg ./configs/coco/w32_512_adam_lr1e-3.yaml --datadir=${COCO_DATASET_PATH} --max_epochs=2 --amp --dist ``` -## Reference -https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation +## References + +- [HigherHRNet-Human-Pose-Estimation](https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation) diff --git a/cv/pose/openpose/mindspore/README.md b/cv/pose/openpose/mindspore/README.md index 81bb4d2eb..ed0616724 100644 --- a/cv/pose/openpose/mindspore/README.md +++ b/cv/pose/openpose/mindspore/README.md @@ -1,29 +1,22 @@ -# Openpose +# OpenPose -## Model description +## Model Description -Openpose network proposes a bottom-up human attitude estimation algorithm using Part Affinity Fields (PAFs). Instead of a top-down algorithm: Detect people first and then return key-points and skeleton. The advantage of openpose is that the computing time does not increase significantly as the number of people in the image increases.However,the top-down algorithm is based on the detection result, and the runtimes grow linearly with the number of people. +OpenPose is a real-time multi-person 2D pose estimation model that uses a bottom-up approach with Part Affinity Fields +(PAFs) to detect human body keypoints and their connections. Unlike top-down methods, OpenPose's computational +efficiency remains stable regardless of the number of people in an image. It simultaneously detects body parts and +associates them to individuals, making it particularly effective for scenarios with multiple people, such as crowd +analysis and human-computer interaction applications. -[Paper](https://arxiv.org/abs/1611.08050): Zhe Cao,Tomas Simon,Shih-En Wei,Yaser Sheikh,"Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields",The IEEE Conference on Computer Vision and Pattern Recongnition(CVPR),2017 +## Model Preparation -## Step 1:Installation +### Prepare Resources -``` -# Pip the requirements -pip3 install -r requirements.txt -wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz -tar xf openmpi-4.0.7.tar.gz -cd openmpi-4.0.7/ -./configure --prefix=/usr/local/bin --with-orte -make -j4 && make install -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ -``` - -## Step 2:Preparing datasets - -* Go to visit [COCO official website](https://gitee.com/link?target=https%3A%2F%2Fcocodataset.org%2F%23download), then select the COCO dataset you want to download. 
Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +- Go to visit [COCO official website](https://gitee.com/link?target=https%3A%2F%2Fcocodataset.org%2F%23download), then + select the COCO dataset you want to download. Take coco2017 dataset as an example, specify `/path/to/coco2017` to your + COCO path in later training process, the unzipped dataset path structure sholud look like: - ``` +```bash coco2017 ├── annotations │ ├── instances_train2017.json @@ -40,15 +33,17 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ ├── train2017.txt ├── val2017.txt └── ... - ``` -* Create the mask dataset. Run python gen_ignore_mask.py +``` + +- Create the mask dataset. Run python gen_ignore_mask.py + +```bash +python3 ./src/gen_ignore_mask.py --train_ann ./coco2017/annotations/person_keypoints_train2017.json --val_ann ./coco2017/annotations/person_keypoints_val2017.json --train_dir ./coco2017/train2017 --val_dir ./coco2017/val2017 +``` - ``` - python3 ./src/gen_ignore_mask.py --train_ann ./coco2017/annotations/person_keypoints_train2017.json --val_ann ./coco2017/annotations/person_keypoints_val2017.json --train_dir ./coco2017/train2017 --val_dir ./coco2017/val2017 - ``` -* The dataset folder is generated in the root directory and contains the following files: +- The dataset folder is generated in the root directory and contains the following files: - ``` +```bash ├── coco2017 ├── annotations ├─ person_keypoints_train2017.json @@ -58,22 +53,36 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ ├─ train2017 ├─ val2017 └─ ... - ``` -* Download the VGG19 model of the MindSpore version: +``` + +- Download the VGG19 model of the MindSpore version: - [vgg19-0-97_5004.ckpt](https://download.mindspore.cn/model_zoo/converted_pretrained/vgg/vgg19-0-97_5004.ckpt) +[vgg19-0-97_5004.ckpt](https://download.mindspore.cn/model_zoo/converted_pretrained/vgg/vgg19-0-97_5004.ckpt) -## Step 3:Training +### Install Dependencies + +```bash +# Pip the requirements +pip3 install -r requirements.txt +wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.7.tar.gz +tar xf openmpi-4.0.7.tar.gz +cd openmpi-4.0.7/ +./configure --prefix=/usr/local/bin --with-orte +make -j4 && make install +export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ +``` + +## Model Training Change the absolute path of the data in running shell `train_openpose_coco2017_1card.sh` `train_openpose_coco2017_8card.sh`. 
For example in `train_openpose_coco2017_1card.sh`: -``` +```bash bash scripts/run_standalone_train.sh /home/coco2017/train2017 /home/coco2017/annotations/person_keypoints_train2017.json /home/coco2017/ignore_mask_train /home/vgg19-0-97_5004.ckpt ``` -``` +```bash # Run on 1 GPU bash train_openpose_coco2017_1card.sh @@ -84,12 +93,12 @@ bash train_openpose_coco2017_8card.sh python3 eval.py --model_path /home/openpose_train_8gpu_ckpt/0-80_663.ckpt --imgpath_val coco2017/val2017 --ann coco2017/annotations/person_keypoints_val2017.json ``` -## Results +## Model Results | GPUS | AP | AP  .5 | AR | AR  .5 | -| ---------- | ------ | ------- | ------ | ------- | -| BI V100×8 | 0.3979 | 0.6654 | 0.4435 | 0.6889 | +|------------|--------|--------|--------|--------| +| BI-V100 ×8 | 0.3979 | 0.6654 | 0.4435 | 0.6889 | -## Reference +## References -[Openpose](https://gitee.com/mindspore/models/tree/master/official/cv/OpenPose) +- [Openpose](https://gitee.com/mindspore/models/tree/master/official/cv/OpenPose) diff --git a/cv/self_supervised_learning/mae/pytorch/README.md b/cv/self_supervised_learning/mae/pytorch/README.md index ff57584ad..685f2d1b2 100644 --- a/cv/self_supervised_learning/mae/pytorch/README.md +++ b/cv/self_supervised_learning/mae/pytorch/README.md @@ -1,43 +1,50 @@ -# MAE-pytorch +# MAE -## Model description -This repository is built upon BEiT, an unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners. We implement the pretrain and finetune process according to the paper, but still can't guarantee the performance reported in the paper can be reproduced! +## Model Description -## Environment +MAE (Masked Autoencoders) is a self-supervised learning model for computer vision that learns powerful representations +by reconstructing randomly masked portions of input images. It employs an asymmetric encoder-decoder architecture where +the encoder processes only the visible patches, and the lightweight decoder reconstructs the original image from the +latent representation and mask tokens. MAE demonstrates that high masking ratios (e.g., 75%) can lead to robust feature +learning, making it scalable and effective for various downstream vision tasks. -``` -pip3 install -r requirements.txt -mkdir -p /home/datasets/cv/ImageNet_ILSVRC2012 -mkdir -p pretrain -mkdir -p output -``` +## Model Preparation -## Download dataset +### Prepare Resources -``` +Download dataset. + +```bash cd /home/datasets/cv/ImageNet_ILSVRC2012 Download the [ImageNet Dataset](https://www.image-net.org/download.php) ``` -## Download pretrain weight +Download pretrain weight -``` +```bash cd pretrain Download the [pretrain_mae_vit_base_mask_0.75_400e.pth](https://drive.google.com/drive/folders/182F5SLwJnGVngkzguTelja4PztYLTXfa) ``` -## Finetune +### Install Dependencies +```bash +pip3 install -r requirements.txt +mkdir -p /home/datasets/cv/ImageNet_ILSVRC2012 +mkdir -p pretrain +mkdir -p output ``` + +## Model Training + +```bash +# Finetune cd .. 
bash run.sh ``` -## Results on BI-V100 - -``` -| GPUs | FPS | Train Epochs | Accuracy | -|------|-----|--------------|------| -| 1x8 | 1233 | 100 | 82.9% | -``` +## Model Results +| GPU | FPS | Train Epochs | Accuracy | +|------------|------|--------------|----------| +| BI-V100 x8 | 1233 | 100 | 82.9% | -- Gitee From 660d96efcb1c8b7e767d92eac7172723f7040f51 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Mon, 10 Mar 2025 15:48:51 +0800 Subject: [PATCH 2/9] unify model readme format - cv/ocr --- cv/ocr/README.md | 1 - cv/ocr/crnn/mindspore/README.md | 90 +++++++++++-------- cv/ocr/crnn/paddlepaddle/README.md | 34 ++++--- cv/ocr/dbnet/pytorch/README.md | 80 ++++++++++------- .../pytorch/configs/textdet/dbnet/README.md | 2 +- cv/ocr/dbnetpp/paddlepaddle/README.md | 61 +++++++------ cv/ocr/dbnetpp/pytorch/README.md | 39 ++++---- cv/ocr/pp-ocr-db/paddlepaddle/README.md | 29 ++++-- cv/ocr/pp-ocr-east/paddlepaddle/README.md | 44 +++++---- cv/ocr/pse/paddlepaddle/README.md | 30 ++++--- cv/ocr/sar/pytorch/README.md | 49 +++++----- cv/ocr/sast/paddlepaddle/README.md | 44 +++++---- cv/ocr/satrn/pytorch/base/README.md | 78 ++++++++-------- cv/point_cloud/point-bert/pytorch/README.md | 76 +++++++++------- 14 files changed, 376 insertions(+), 281 deletions(-) delete mode 100644 cv/ocr/README.md diff --git a/cv/ocr/README.md b/cv/ocr/README.md deleted file mode 100644 index c92aa3bdd..000000000 --- a/cv/ocr/README.md +++ /dev/null @@ -1 +0,0 @@ -# Optical Character Recognition diff --git a/cv/ocr/crnn/mindspore/README.md b/cv/ocr/crnn/mindspore/README.md index 7ddd2025b..649384425 100644 --- a/cv/ocr/crnn/mindspore/README.md +++ b/cv/ocr/crnn/mindspore/README.md @@ -1,12 +1,51 @@ # CRNN -## Model description +## Model Description -CRNN was a neural network for image based sequence recognition and its Application to scene text recognition.In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences in arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. +CRNN (Convolutional Recurrent Neural Network) is an end-to-end trainable model for image-based sequence recognition, +particularly effective for scene text recognition. It combines convolutional layers for feature extraction with +recurrent layers for sequence modeling, followed by a transcription layer. CRNN handles sequences of arbitrary lengths +without character segmentation or horizontal scaling, making it versatile for both lexicon-free and lexicon-based text +recognition tasks. Its compact architecture and unified framework make it practical for real-world applications like +document analysis and OCR. 
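+
+The short sketch below only illustrates the pipeline described above (convolutional feature extraction, a recurrent
+sequence model, then per-timestep class scores for CTC-style transcription). It is written in PyTorch purely for
+illustration and is not the MindSpore implementation trained in this README; all sizes and names are assumptions.
+
+```python
+import torch
+import torch.nn as nn
+
+
+class TinyCRNN(nn.Module):
+    """Illustrative CRNN: CNN features -> BiLSTM sequence modeling -> per-step logits for a CTC loss."""
+
+    def __init__(self, num_classes=37, img_h=32):
+        super().__init__()
+        self.cnn = nn.Sequential(
+            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
+            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
+        )
+        self.rnn = nn.LSTM(128 * (img_h // 4), 256, num_layers=2, bidirectional=True, batch_first=True)
+        self.fc = nn.Linear(2 * 256, num_classes)   # num_classes includes the CTC "blank" symbol
+
+    def forward(self, images):                       # images: (N, 1, 32, W), arbitrary width W
+        feats = self.cnn(images)                     # (N, 128, 8, W // 4)
+        n, c, h, w = feats.shape
+        seq = feats.permute(0, 3, 1, 2).reshape(n, w, c * h)  # one feature vector per horizontal position
+        seq, _ = self.rnn(seq)
+        return self.fc(seq).log_softmax(-1)          # (N, W // 4, num_classes), ready for nn.CTCLoss
+
+
+logits = TinyCRNN()(torch.randn(2, 1, 32, 100))
+print(logits.shape)  # torch.Size([2, 25, 37])
+```
+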
-[Paper](https://arxiv.org/abs/1507.05717): Baoguang Shi, Xiang Bai, Cong Yao, "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition", ArXiv, vol. abs/1507.05717, 2015. +## Model Preparation -## Step 1:Installation +### Prepare Resources + +- Go to visit [Syn90K official website](https://www.robots.ox.ac.uk/~vgg/data/text/), then download the dataset for + training. The dataset path structure sholud look like: + +```bash + ├── Syn90k + │ ├── shuffle_labels.txt + │ ├── label.txt + │ ├── label.lmdb + │ ├── mnt +``` + +- Go to visit [IIIT5K official + website](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset), then download the dataset + for test. The dataset path structure sholud look like: + +```bash + ├── IIIT5K + │ ├── traindata.mat + │ ├── testdata.mat + │ ├── trainCharBound.mat + │ ├── testCharBound.mat + │ ├── lexicon.txt + │ ├── train + │ ├── test +``` + +- The annotation need to be extracted from the matlib data file. + +```bash +python3 convert_iiit5k.py -m ./IIIT5K/testdata.mat -o ./IIIT5K -a ./IIIT5K/annotation.txt +``` + +### Install Dependencies ```shell # Install requirements @@ -21,34 +60,7 @@ make -j4 && make install export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/ ``` -## Step 2:Preparing datasets - -* Go to visit [Syn90K official website](https://www.robots.ox.ac.uk/~vgg/data/text/), then download the dataset for training. The dataset path structure sholud look like: - - ``` - ├── Syn90k - │ ├── shuffle_labels.txt - │ ├── label.txt - │ ├── label.lmdb - │ ├── mnt - ``` -* Go to visit [IIIT5K official website](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset), then download the dataset for test. The dataset path structure sholud look like: - -``` -├── IIIT5K -│ ├── traindata.mat -│ ├── testdata.mat -│ ├── trainCharBound.mat -│ ├── testCharBound.mat -│ ├── lexicon.txt -│ ├── train -│ ├── test -``` -* The annotation need to be extracted from the matlib data file. -``` -python3 convert_iiit5k.py -m ./IIIT5K/testdata.mat -o ./IIIT5K -a ./IIIT5K/annotation.txt -``` -## Step 3:Training +## Model Training ```bash # Run on 1 GPU @@ -60,17 +72,17 @@ python3 train.py --train_dataset=synth --train_dataset_path=./Syn90k/mnt/ramdisk # Run eval python3 eval.py --eval_dataset=iiit5k \ ---eval_dataset_path=./IIIT5K/ \ +--eval_dataset_path=./IIIT5K/ \ --checkpoint_path=./ckpt_0/crnn-10_14110.ckpt \ --device_target=GPU 2>&1 | tee eval.log ``` -## Results +## Model Results -| GPUS | DATASETS | ACC | FPS | -| ---------- | ---------- | ------ | ------ | -| BI-V100 x8 | IIIT5K | 0.798 | 7976.44 | +| GPUS | DATASETS | ACC | FPS | +|------------|----------|-------|---------| +| BI-V100 x8 | IIIT5K | 0.798 | 7976.44 | -## Reference +## References -[CRNN](https://gitee.com/mindspore/models/tree/master/official/cv/CRNN) +- [CRNN](https://gitee.com/mindspore/models/tree/master/official/cv/CRNN) diff --git a/cv/ocr/crnn/paddlepaddle/README.md b/cv/ocr/crnn/paddlepaddle/README.md index 43f568a35..6c9b8b167 100644 --- a/cv/ocr/crnn/paddlepaddle/README.md +++ b/cv/ocr/crnn/paddlepaddle/README.md @@ -1,28 +1,40 @@ # CRNN +## Model Description -## Step 1: Installing -``` +CRNN (Convolutional Recurrent Neural Network) is an end-to-end trainable model for image-based sequence recognition, +particularly effective for scene text recognition. It combines convolutional layers for feature extraction with +recurrent layers for sequence modeling, followed by a transcription layer. 
CRNN handles sequences of arbitrary lengths +without character segmentation or horizontal scaling, making it versatile for both lexicon-free and lexicon-based text +recognition tasks. Its compact architecture and unified framework make it practical for real-world applications like +document analysis and OCR. + +## Model Preparation + +### Prepare Resources + +Download [data_lmdb_release](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here). + +### Install Dependencies + +```bash git clone https://github.com/PaddlePaddle/PaddleOCR.git -``` -``` -cd PaddleOCR +cd PaddleOCR/ pip3 install -r requirements.txt ``` -## Step 2: Prepare Datasets -Download [data_lmdb_release](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here). +## Model Training -## Step 3: Training Notice: modify configs/rec/rec_mv3_none_bilstm_ctc.yml file, modify the datasets path as yours. -``` -cd PaddleOCR + +```bash export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/rec/rec_mv3_none_bilstm_ctc.yml Global.use_visualdl=True ``` -## Reference +## References + - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) diff --git a/cv/ocr/dbnet/pytorch/README.md b/cv/ocr/dbnet/pytorch/README.md index 6f0bf3b73..8202dc43e 100755 --- a/cv/ocr/dbnet/pytorch/README.md +++ b/cv/ocr/dbnet/pytorch/README.md @@ -1,16 +1,33 @@ # DBNet -## Model description -Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. -## Step 2: Preparing datasets +## Model Description + +Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more +accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is +essential for segmentation-based detection, which converts probability maps produced by a segmentation method into +bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can +perform the binarization process in a segmentation network. 
Optimized along with a DB module, a segmentation network can +adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the +performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on +five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and +speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can +look for an ideal tradeoff between detection accuracy and efficiency. + +## Model Preparation + +### Prepare Resources ```shell -$ mkdir data -$ cd data +mkdir data +cd data ``` -ICDAR 2015 -Please [ICDAR 2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) download ICDAR 2015 here -ch4_training_images.zip、ch4_test_images.zip、ch4_training_localization_transcription_gt.zip、Challenge4_Test_Task1_GT.zip + +Download [ICDAR 2015](https://rrc.cvc.uab.es/?ch=4&com=downloads). + +- ch4_training_images.zip +- ch4_test_images.zip +- ch4_training_localization_transcription_gt.zip +- Challenge4_Test_Task1_GT.zip ```shell mkdir icdar2015 && cd icdar2015 @@ -22,49 +39,46 @@ mv ch4_test_images imgs/test mv ch4_training_localization_transcription_gt annotations/training mv Challenge4_Test_Task1_GT annotations/test ``` -Please [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json) download instances_training.json here -Please [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json) download instances_test.json here -```shell +Download [instances_training.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_training.json). +Download [instances_test.json](https://download.openmmlab.com/mmocr/data/icdar2015/instances_test.json). 
+
+```shell
 icdar2015/
 ├── imgs
 │   ├── test
 │   └── training
 ├── instances_test.json
 └── instances_training.json
-
 ```
-### Build Extension
-```shell
-$ DBNET_CV_WITH_OPS=1 python3 setup.py build && cp build/lib.linux*/dbnet_cv/_ext.cpython* dbnet_cv
-```
-### Install packages
+### Install Dependencies
 
 ```shell
-$ pip3 install -r requirements.txt
-```
+# Build Extension
+DBNET_CV_WITH_OPS=1 python3 setup.py build && cp build/lib.linux*/dbnet_cv/_ext.cpython* dbnet_cv
 
-### Training on single card
-```shell
-$ python3 train.py configs/textdet/dbnet/dbnet_mobilenetv3_fpnc_1200e_icdar2015.py
+# Install packages
+pip3 install -r requirements.txt
 ```
 
-### Training on mutil-cards
+## Model Training
+
 ```shell
-$ bash dist_train.sh configs/textdet/dbnet/dbnet_mobilenetv3_fpnc_1200e_icdar2015.py 8
+# Training on single card
+python3 train.py configs/textdet/dbnet/dbnet_mobilenetv3_fpnc_1200e_icdar2015.py
+
+# Training on multi-cards
+bash dist_train.sh configs/textdet/dbnet/dbnet_mobilenetv3_fpnc_1200e_icdar2015.py 8
 ```
 
-## Results on BI-V100
+## Model Results
 
-| approach| GPUs | train mem | train FPS |
-| :-----: |:-------:| :-------: |:--------: |
-| dbnet | BI100x8 | 5426 | 54.375 |
+| Model | GPUs       | train mem | train FPS | 0_hmean-iou:recall | 0_hmean-iou:precision | 0_hmean-iou:hmean |
+|-------|------------|-----------|-----------|--------------------|-----------------------|-------------------|
+| DBNet | BI-V100 x8 | 5426      | 54.375    | 0.7111             | 0.8062                | 0.7557            |
 
-|0_hmean-iou:recall: | 0_hmean-iou:precision: | 0_hmean-iou:hmean:|
-| :-----: | :-------: | :-------: |
-| 0.7111 | 0.8062 | 0.7557 |
+## References
 
-## Reference
-https://github.com/open-mmlab/mmocr
+- [mmocr](https://github.com/open-mmlab/mmocr)
diff --git a/cv/ocr/dbnet/pytorch/configs/textdet/dbnet/README.md b/cv/ocr/dbnet/pytorch/configs/textdet/dbnet/README.md
index d2007c72e..ac7a1be78 100755
--- a/cv/ocr/dbnet/pytorch/configs/textdet/dbnet/README.md
+++ b/cv/ocr/dbnet/pytorch/configs/textdet/dbnet/README.md
@@ -12,7 +12,7 @@ Recently, segmentation-based methods are quite popular in scene text detection,
 
 
 
-## Results and models
+## Model Results
 
 ### ICDAR2015
 
diff --git a/cv/ocr/dbnetpp/paddlepaddle/README.md b/cv/ocr/dbnetpp/paddlepaddle/README.md
index fec54ed01..ad038cda1 100644
--- a/cv/ocr/dbnetpp/paddlepaddle/README.md
+++ b/cv/ocr/dbnetpp/paddlepaddle/README.md
@@ -1,33 +1,22 @@
 # DBNet++
 
-## Model description
+## Model Description
 
-Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the pixel-level descriptions. However, the vast majority of the existing segmentation-based approaches are limited to their complex post-processing algorithms and the scale robustness of their segmentation models, where the post-processing algorithms are not only isolated to the model optimization but also time-consuming and the scale robustness is usually strengthened by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network can produce more accurate results, which enhances the accuracy of text detection with a simple pipeline.
Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks. +DBNet++ is an advanced scene text detection model that combines a Differentiable Binarization (DB) module with an +Adaptive Scale Fusion (ASF) mechanism. The DB module integrates binarization directly into the segmentation network, +simplifying post-processing and improving accuracy. The ASF module enhances scale robustness by adaptively fusing +multi-scale features. This architecture enables DBNet++ to detect text of arbitrary shapes and extreme aspect ratios +efficiently, achieving state-of-the-art performance in both accuracy and speed across various text detection benchmarks. -## Step 1: Installation +## Model Preparation -```bash -# Clone PaddleOCR, branch: release/2.5 -git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleOCR.git - -# Copy PaddleOCR 2.5 patch from toolbox -yes | cp -rf ../../../../toolbox/PaddleOCR/* PaddleOCR/ -cd PaddleOCR - -# install requirements. -bash ../init.sh - -# build PaddleOCR -python3 setup.py develop -``` +### Prepare Resources -## Step 2: Preparing datasets - -Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) +Download [ICDAR 2015](https://deepai.org/dataset/icdar-2015) Dataset. ```bash # ICDAR2015 PATH as follow: -ls -al /home/datasets/ICDAR2015/text_localization +$ ls -al /home/datasets/ICDAR2015/text_localization total 133420 drwxr-xr-x 4 root root 179 Jul 21 15:54 . drwxr-xr-x 3 root root 39 Jul 21 15:50 .. @@ -46,7 +35,24 @@ ln -s /path/to/icdar2015/ train_data/icdar2015 wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams ``` -## Step 3: Training +### Install Dependencies + +```bash +# Clone PaddleOCR, branch: release/2.5 +git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleOCR.git + +# Copy PaddleOCR 2.5 patch from toolbox +yes | cp -rf ../../../../toolbox/PaddleOCR/* PaddleOCR/ +cd PaddleOCR + +# install requirements. +bash ../init.sh + +# build PaddleOCR +python3 setup.py develop +``` + +## Model Training ```bash # run training @@ -59,13 +65,12 @@ python3 -m paddle.distributed.launch --gpus $CUDA_VISIBLE_DEVICES \ -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained ``` -## Results - +## Model Results -| GPUs | IPS | ACC | -| ------------ | ---------------- | ------------------- | -| BI-V100 x8 | 5.46 samples/s | precision: 0.9062 | +| Model | GPUs | IPS | ACC | +|---------|------------|----------------|-------------------| +| DBNet++ | BI-V100 x8 | 5.46 samples/s | precision: 0.9062 | -## Reference +## References - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git) diff --git a/cv/ocr/dbnetpp/pytorch/README.md b/cv/ocr/dbnetpp/pytorch/README.md index 244dc8284..ece02b7e8 100644 --- a/cv/ocr/dbnetpp/pytorch/README.md +++ b/cv/ocr/dbnetpp/pytorch/README.md @@ -1,10 +1,23 @@ # DBNet++ -## Model description +## Model Description -Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the pixel-level descriptions. 
However, the vast majority of the existing segmentation-based approaches are limited to their complex post-processing algorithms and the scale robustness of their segmentation models, where the post-processing algorithms are not only isolated to the model optimization but also time-consuming and the scale robustness is usually strengthened by fusing multi-scale feature maps directly. In this paper, we propose a Differentiable Binarization (DB) module that integrates the binarization process, one of the most important steps in the post-processing procedure, into a segmentation network. Optimized along with the proposed DB module, the segmentation network can produce more accurate results, which enhances the accuracy of text detection with a simple pipeline. Furthermore, an efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively. By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks. +DBNet++ is an advanced scene text detection model that combines a Differentiable Binarization (DB) module with an +Adaptive Scale Fusion (ASF) mechanism. The DB module integrates binarization directly into the segmentation network, +simplifying post-processing and improving accuracy. The ASF module enhances scale robustness by adaptively fusing +multi-scale features. This architecture enables DBNet++ to detect text of arbitrary shapes and extreme aspect ratios +efficiently, achieving state-of-the-art performance in both accuracy and speed across various text detection benchmarks. -## Step 1: Installation +## Model Preparation + +### Prepare Resources + +```bash +mkdir data +python3 tools/dataset_converters/prepare_dataset.py icdar2015 --task textdet +``` + +### Install Dependencies ```bash # Install libGL @@ -26,14 +39,7 @@ mkdir -p /root/.cache/torch/hub/checkpoints/ wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth ``` -## Step 2: Preparing datasets - -```bash -mkdir data -python3 tools/dataset_converters/prepare_dataset.py icdar2015 --task textdet -``` - -## Step 3: Training +## Model Training ```bash sed -i 's/val_interval=20/val_interval=1200/g' configs/textdet/_base_/schedules/schedule_sgd_1200e.py @@ -47,12 +53,13 @@ python3 tools/train.py configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar # Multiple GPUs on one machine bash tools/dist_train.sh configs/textdet/dbnetpp/dbnetpp_resnet50_fpnc_1200e_icdar2015.py 8 ``` -## Results -| GPUs | Precision | Recall | Hmean | -| ---------- | --------- | ------ | ----- | -| BI-V100 x8 | 0.8823 | 0.8156 | 0.8476 | +## Model Results + +| Model | GPU | Precision | Recall | Hmean | +|---------|------------|-----------|--------|--------| +| DBNet++ | BI-V100 x8 | 0.8823 | 0.8156 | 0.8476 | -## Reference +## References - [mmocr](https://github.com/open-mmlab/mmocr/tree/v1.0.1/configs/textdet/dbnetpp) diff --git a/cv/ocr/pp-ocr-db/paddlepaddle/README.md b/cv/ocr/pp-ocr-db/paddlepaddle/README.md index 15ecf8377..0d1b59ca9 100644 --- a/cv/ocr/pp-ocr-db/paddlepaddle/README.md +++ b/cv/ocr/pp-ocr-db/paddlepaddle/README.md @@ -1,15 +1,18 @@ # PP-OCR-DB -## Step 1: Installing -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git -cd PaddleOCR -pip3 install -r requirements.txt -``` +## Model Description + +PP-OCR-DB is an 
efficient deep learning model for text detection, part of the PaddleOCR framework. It combines a +MobileNetV3 backbone with a Differentiable Binarization (DB) module to accurately detect text in various scenarios. The +model is optimized for real-time performance and can handle diverse text layouts and orientations. PP-OCR-DB is +particularly effective in document analysis and scene text recognition tasks, offering a balance between accuracy and +computational efficiency for practical OCR applications. + +## Model Preparation -## Step 2: Download data +### Prepare Resources -Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) +Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) ```bash # ICDAR2015 PATH as follow: @@ -26,7 +29,15 @@ drwxr-xr-x 2 root root 24576 Jul 21 15:53 icdar_c4_train_imgs ``` -## Step 3: Run PP-OCR-DB +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git +cd PaddleOCR +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Notice: modify "configs/det/det_mv3_db.yml" file, set the datasets path as yours. diff --git a/cv/ocr/pp-ocr-east/paddlepaddle/README.md b/cv/ocr/pp-ocr-east/paddlepaddle/README.md index a3acde4f4..d842f8308 100644 --- a/cv/ocr/pp-ocr-east/paddlepaddle/README.md +++ b/cv/ocr/pp-ocr-east/paddlepaddle/README.md @@ -1,23 +1,22 @@ # PP-OCR-EAST -## Model description -EAST (Efficient and Accurate Scene Text Detector) is a deep learning model designed for detecting and recognizing text in natural scene images. -It was developed by researchers at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, and was presented in a research paper in 2017. +## Model Description -## Step 1: Installation -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git -cd PaddleOCR -pip3 install -r requirements.txt -``` +PP-OCR-EAST is an efficient scene text detection model based on the EAST architecture, optimized within the PaddleOCR +framework. It combines a MobileNetV3 backbone with the EAST detection mechanism to accurately locate text in natural +scene images. The model is designed for real-time performance and can handle text of various orientations and sizes. +PP-OCR-EAST is particularly effective in complex scenarios, offering a balance between detection accuracy and +computational efficiency for practical OCR applications. + +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) +Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) ```bash # ICDAR2015 PATH as follow: -ls -al /home/datasets/ICDAR2015/text_localization +$ ls -al /home/datasets/ICDAR2015/text_localization total 133420 drwxr-xr-x 4 root root 179 Jul 21 15:54 . drwxr-xr-x 3 root root 39 Jul 21 15:50 .. @@ -30,7 +29,15 @@ drwxr-xr-x 2 root root 24576 Jul 21 15:53 icdar_c4_train_imgs ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git +cd PaddleOCR +pip3 install -r requirements.txt +``` + +## Model Training ```bash # Notice: modify "configs/det/det_mv3_east.yml" file, set the datasets path as yours. 
@@ -41,11 +48,12 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/det_mv3_east.yml -o Global.use_visualdl=True ``` -## Results +## Model Results + +| Model | GPU | FPS | ACC | +|-------------|------------|-------|---------------------------------| +| PP-OCR-EAST | BI-V100 x8 | 50.08 | hmean:0.7711, precision: 0.7752 | -GPUs|FPS|ACC -----|---|--- -BI-V100 x8|50.08|hmean:0.7711, precision: 0.7752 +## References -## Reference - [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git) diff --git a/cv/ocr/pse/paddlepaddle/README.md b/cv/ocr/pse/paddlepaddle/README.md index 18e6078e3..413c7f087 100644 --- a/cv/ocr/pse/paddlepaddle/README.md +++ b/cv/ocr/pse/paddlepaddle/README.md @@ -1,21 +1,20 @@ # PSE -## Model description -[Shape robust text detection with progressive scale expansion network](https://arxiv.org/abs/1903.12473) Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai CVPR, 2019 +## Model Description -## Step 1: Installing -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git -cd PaddleOCR -pip3 install -r requirements.txt -``` +PSE (Progressive Scale Expansion Network) is a deep learning model for robust text detection in natural scenes. It +addresses the challenge of detecting text with arbitrary shapes by progressively expanding text regions through a scale +expansion algorithm. PSE effectively handles complex scenarios like curved text and overlapping instances. The model's +architecture combines feature pyramid networks with a novel post-processing method, making it particularly suitable for +detecting text in diverse orientations and layouts with high accuracy. -## Step 2: Download data +## Model Preparation -Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) +### Prepare Resources -```bash +Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015) +```bash # ICDAR2015 PATH as follow: ls -al /home/datasets/ICDAR2015/text_localization total 133420 @@ -27,10 +26,17 @@ drwxr-xr-x 2 root root 12288 Jul 21 15:53 ch4_test_images drwxr-xr-x 2 root root 24576 Jul 21 15:53 icdar_c4_train_imgs -rw-r--r-- 1 root root 468453 Jul 21 15:54 test_icdar2015_label.txt -rw-r--r-- 1 root root 1063118 Jul 21 15:54 train_icdar2015_label.txt +``` + +### Install Dependencies +```bash +git clone --recursive https://github.com/PaddlePaddle/PaddleOCR.git +cd PaddleOCR +pip3 install -r requirements.txt ``` -## Step 3: Run PSE +## Model Training ```bash # Notice: modify "configs/det/det_r50_vd_pse.yml" file, set the datasets path as yours. diff --git a/cv/ocr/sar/pytorch/README.md b/cv/ocr/sar/pytorch/README.md index 06a7dee28..cee369f63 100755 --- a/cv/ocr/sar/pytorch/README.md +++ b/cv/ocr/sar/pytorch/README.md @@ -1,28 +1,26 @@ # SAR -## Model description +## Model Description -Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module. 
Despite its simplicity, the proposed method is robust and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks. +Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as +curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra +fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data +collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using +off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an +LSTM-based encoder-decoder framework and a 2-dimensional attention module. Despite its simplicity, the proposed method +is robust and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks. -## Step 1: Installation +## Model Preparation -```shell -cd csrc/ -bash clean.sh -bash build.sh -bash install.sh -cd .. -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources ```bash mkdir data cd data ``` -Reffering to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides/data_prepare/datasetzoo.html) to prepare datasets. Datasets path would look like below: +Reffering to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides/data_prepare/datasetzoo.html) to +prepare datasets. Datasets path would look like below: ```bash ├── mixture @@ -95,18 +93,27 @@ Reffering to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides │ │ ├── val_label.txt ``` -## Step 3: Training +### Install Dependencies -### Training on single card -```bash -python3 train.py configs/sar_r31_parallel_decoder_academic.py +```shell +cd csrc/ +bash clean.sh +bash build.sh +bash install.sh +cd .. +pip3 install -r requirements.txt ``` -### Training on mutil-cards +## Model Training + ```bash +# Training on single card +python3 train.py configs/sar_r31_parallel_decoder_academic.py + +# Training on mutil-cards bash dist_train.sh configs/sar_r31_parallel_decoder_academic.py 8 ``` -## Reference -https://github.com/open-mmlab/mmocr +## References +- [mmocr](https://github.com/open-mmlab/mmocr) diff --git a/cv/ocr/sast/paddlepaddle/README.md b/cv/ocr/sast/paddlepaddle/README.md index db4db6670..f98f954b7 100644 --- a/cv/ocr/sast/paddlepaddle/README.md +++ b/cv/ocr/sast/paddlepaddle/README.md @@ -1,20 +1,21 @@ # SAST -## Description +## Model Description -SAST is a cutting-edge segmentation-based text detector designed for recognizing scene text of arbitrary shapes. Leveraging a context attended multi-task learning framework anchored on a Fully Convolutional Network (FCN), it adeptly learns geometric properties to reconstruct text regions into polygonal shapes. Incorporating a Context Attention Block, SAST captures long-range pixel dependencies for improved segmentation accuracy, while its Point-to-Quad assignment method efficiently clusters pixels into text instances by merging high-level and low-level information. Demonstrated to be highly effective across several benchmarks like ICDAR2015 and SCUT-CTW1500, SAST not only shows superior accuracy but also operates efficiently, achieving significant performance metrics such as running at 27.63 FPS on a NVIDIA Titan Xp with a high detection accuracy, making it a notable solution for arbitrary-shaped text detection challenges. 
+SAST is a cutting-edge segmentation-based text detector designed for recognizing scene text of arbitrary shapes.
+Leveraging a context attended multi-task learning framework anchored on a Fully Convolutional Network (FCN), it adeptly
+learns geometric properties to reconstruct text regions into polygonal shapes. Incorporating a Context Attention Block,
+SAST captures long-range pixel dependencies for improved segmentation accuracy, while its Point-to-Quad assignment
+method efficiently clusters pixels into text instances by merging high-level and low-level information. Demonstrated to
+be highly effective across several benchmarks like ICDAR2015 and SCUT-CTW1500, SAST not only shows superior accuracy but
+also operates efficiently, achieving significant performance metrics such as running at 27.63 FPS on an NVIDIA Titan Xp
+with high detection accuracy, making it a notable solution for arbitrary-shaped text detection challenges.

-## Step 1: Installation

+## Model Preparation

-```bash
-git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleOCR.git
-cd PaddleOCR/
-pip3 install -r requirements.txt
-```
-
-## Step 2: Preparing datasets
+### Prepare Resources

-Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015)
+Download the [ICDAR2015 Dataset](https://deepai.org/dataset/icdar-2015)

 ```bash
 # ICDAR2015 PATH as follow:
@@ -28,10 +29,17 @@ drwxr-xr-x 2 root root 12288 Jul 21 15:53 ch4_test_images
 drwxr-xr-x 2 root root 24576 Jul 21 15:53 icdar_c4_train_imgs
 -rw-r--r-- 1 root root 468453 Jul 21 15:54 test_icdar2015_label.txt
 -rw-r--r-- 1 root root 1063118 Jul 21 15:54 train_icdar2015_label.txt
+```

+### Install Dependencies
+
+```bash
+git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleOCR.git
+cd PaddleOCR/
+pip3 install -r requirements.txt
 ```

-## Step 3: Training
+## Model Training

 ```bash
 # Notice: modify "configs/det/det_r50_vd_sast_icdar15.yml" file, set the datasets path as yours.
@@ -43,12 +51,12 @@ python3 -u -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py
 >train.log 2>&1 &
 ```

-## Results
+## Model Results

-GPUs|FPS|ACC
-----|---|---
-BI-V100 x8| ips: 11.24631 samples/s | hmean: 0.817155756207675
+| Model | GPU        | FPS                     | ACC                      |
+|-------|------------|-------------------------|--------------------------|
+| SAST  | BI-V100 x8 | ips: 11.24631 samples/s | hmean: 0.817155756207675 |

-## Reference
-- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git)
+## References

+- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR.git)
diff --git a/cv/ocr/satrn/pytorch/base/README.md b/cv/ocr/satrn/pytorch/base/README.md
index 29ba40015..f3ce80cdd 100755
--- a/cv/ocr/satrn/pytorch/base/README.md
+++ b/cv/ocr/satrn/pytorch/base/README.md
@@ -1,35 +1,25 @@
-# SATRN(Self-Attention Text Recognition Network)
+# SATRN

-## Model description
+## Model Description

-Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or rotated texts, which are abundant in daily life (e.g. restaurant signs, product labels, company logos, etc). This paper introduces a novel architecture to recognizing texts of arbitrary shapes, named Self-Attention Text Recognition Network (SATRN), which is inspired by the Transformer. SATRN utilizes the self-attention mechanism to describe two-dimensional (2D) spatial dependencies of characters in a scene text image. Exploiting the full-graph propagation of self-attention, SATRN can recognize texts with arbitrary arrangements and large inter-character spacing. As a result, SATRN outperforms existing STR models by a large margin of 5.7 pp on average in "irregular text" benchmarks. We provide empirical analyses that illustrate the inner mechanisms and the extent to which the model is applicable (e.g. rotated and multi-line text). We will open-source the code.
+SATRN (Self-Attention Text Recognition Network) is an advanced deep learning model for scene text recognition,
+particularly effective for texts with arbitrary shapes like curved or rotated characters. Inspired by Transformer
+architecture, SATRN utilizes self-attention mechanisms to capture 2D spatial dependencies between characters. This
+enables it to handle complex text arrangements and large inter-character spacing with high accuracy. SATRN significantly
+outperforms traditional methods in recognizing irregular texts, making it valuable for real-world applications like sign
+and logo recognition.

+## Model Preparation

-## Step 1: Installation
-
-```bash
-# Install libGL
-## CentOS
-yum install -y mesa-libGL
-## Ubuntu
-apt install -y libgl1-mesa-dev
-
-cd /satrn/pytorch/base/csrc
-bash clean.sh
-bash build.sh
-bash install.sh
-cd ..
-pip3 install -r requirements.txt
-```
-
-## Step 2: Preparing datasets
+### Prepare Resources

 ```bash
 mkdir data
 cd data
 ```

-Reffering to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides/data_prepare/datasetzoo.html) to prepare datasets. Datasets path would look like below:
+Referring to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides/data_prepare/datasetzoo.html) to
+prepare datasets. The dataset path would look like below:

 ```bash
 ├── mixture
@@ -102,37 +92,43 @@ Reffering to [MMOCR Docs](https://mmocr.readthedocs.io/zh_CN/dev-1.x/user_guides
 │   │   ├── val_label.txt
 ```

-## Step 3: Training
+### Install Dependencies

-### Training on single card
 ```bash
-python3 train.py configs/models/satrn_academic.py
+# Install libGL
+## CentOS
+yum install -y mesa-libGL
+## Ubuntu
+apt install -y libgl1-mesa-dev
+
+cd /satrn/pytorch/base/csrc
+bash clean.sh
+bash build.sh
+bash install.sh
+cd ..
+pip3 install -r requirements.txt
 ```

-### Training on mutil-cards
+## Model Training
+
 ```bash
+# Training on single card
+python3 train.py configs/models/satrn_academic.py
+
+# Training on multiple cards
 bash dist_train.sh configs/models/satrn_academic.py 8
 ```

-## Results on BI-V100
+## Model Results

-| approach| GPUs   | train mem | train FPS |
-| :-----: |:-------:| :-------: |:--------: |
-| satrn   | BI100x8 | 14.159G   | 549.94    |
-
-| dataset | acc     |
-| :-----: |:-------:|
-| IIIT5K  | 94.5    |
-| IC15    | 83.3    |
-| SVTP    | 88.4    |
+| Model | GPU        | train mem | train FPS | ACC                                  |
+|-------|------------|-----------|-----------|--------------------------------------|
+| SATRN | BI-V100 x8 | 14.159G   | 549.94    | IIIT5K: 94.5, IC15: 83.3, SVTP: 88.4 |

 | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability |
 |----------------------|------------------------------------------|-------------|----------|------------|-------------|-------------------------|-----------|
 | 0.841                | SDK V2.2,bs:128,8x,fp32                  | 630         | 88.4     | 166\*8     | 0.98        | 28.5\*8                 | 1         |

+## References

-## Reference
-https://github.com/open-mmlab/mmocr
-
-
-
+- [mmocr](https://github.com/open-mmlab/mmocr)
diff --git a/cv/point_cloud/point-bert/pytorch/README.md b/cv/point_cloud/point-bert/pytorch/README.md
index dc7d729b9..71231e2c1 100644
--- a/cv/point_cloud/point-bert/pytorch/README.md
+++ b/cv/point_cloud/point-bert/pytorch/README.md
@@ -1,33 +1,63 @@
 # Point-BERT

-Point-BERT is a new paradigm for learning Transformers to generalize the concept of BERT onto 3D point cloud. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local patches, and a point cloud Tokenizer is devised via a discrete Variational AutoEncoder (dVAE) to generate discrete point tokens containing meaningful local information. Then, we randomly mask some patches of input point clouds and feed them into the backbone Transformer. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer.
+## Model Description

-## Step 1: Installing packages
+Point-BERT is a new paradigm for learning Transformers to generalize the concept of BERT onto 3D point cloud. Inspired
+by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first
+divide a point cloud into several local patches, and a point cloud Tokenizer is devised via a discrete Variational
+AutoEncoder (dVAE) to generate discrete point tokens containing meaningful local information. Then, we randomly mask
+some patches of input point clouds and feed them into the backbone Transformer. The pre-training objective is to recover
+the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer.
+
+## Model Preparation
+
+### Prepare Resources
+
+Please refer to [DATASET.md](./DATASET.md) for preparing `ShapeNet55` and `processed ModelNet`.
+The dataset directory tree would be like:
+
+```bash
+data/
+├── ModelNet
+│   └── modelnet40_normal_resampled
+│      ├── modelnet40_test_8192pts_fps.dat
+│      └── modelnet40_train_8192pts_fps.dat
+├── ScanObjectNN_shape_names.txt
+├── ShapeNet55-34
+│   ├── ShapeNet-55
+│   │   ├── test.txt
+│   │   └── train.txt
+│   └── shapenet_pc
+└── shapenet_synset_dict.json
+```
+
+### Install Dependencies

 > Warning: Now only support Ubuntu OS. If your OS is centOS, you may need to compile open3d from source.

 * system

-```sh
+```bash
 apt update
 apt install libgl1-mesa-glx
 ```

 * python

-```sh
-pip3 install argparse easydict h5py matplotlib numpy open3d==0.10 opencv-python pyyaml scipy tensorboardX timm==0.4.5 tqdm transforms3d termcolor scikit-learn==0.24.1 Ninja --default-timeout=1000
+```bash
+pip3 install argparse easydict h5py matplotlib numpy open3d==0.10 opencv-python pyyaml scipy tensorboardX timm==0.4.5 \
+    tqdm transforms3d termcolor scikit-learn==0.24.1 Ninja --default-timeout=1000
 ```

 * Chamfer Distance

-```sh
+```bash
 bash install.sh
 ```

 * PointNet++

-```sh
+```bash
 cd ./Pointnet2_PyTorch
 pip3 install pointnet2_ops_lib/.
 cd -
@@ -35,35 +65,15 @@ cd -

 * GPU kNN

-```sh
+```bash
 pip3 install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
 ```

-## Step 2: Preparing datasets
-
-Please refer to [DATASET.md](./DATASET.md) for preparing `ShapeNet55` and `processed ModelNet`.
-The dataset dircectory tree would be like:
-
-```sh
-data/
-├── ModelNet
-│   └── modelnet40_normal_resampled
-│      ├── modelnet40_test_8192pts_fps.dat
-│      └── modelnet40_train_8192pts_fps.dat
-├── ScanObjectNN_shape_names.txt
-├── ShapeNet55-34
-│   ├── ShapeNet-55
-│   │   ├── test.txt
-│   │   └── train.txt
-│   └── shapenet_pc
-└── shapenet_synset_dict.json
-```
-
-## Step 3: Training
+## Model Training

 * dVAE train

-```sh
+```bash
 bash scripts/train.sh 0 --config cfgs/ShapeNet55_models/dvae.yaml --exp_name dVAE
 ```

@@ -71,10 +81,10 @@ bash scripts/train.sh 0 --config cfgs/ShapeNet55_models/dvae.yaml --exp_name dVA

 When dVAE has finished training, you should be edit `cfgs/Mixup_models/Point-BERT.yaml`, and add the path of dvae_config-ckpt.
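+
+For reference, the snippet below shows one way to script that edit. It is only a sketch: the `dvae_config`/`ckpt` key
+names and the checkpoint path are assumptions taken from a typical dVAE run, so check the actual
+`cfgs/Mixup_models/Point-BERT.yaml` in your checkout and adjust the paths before using it.
+
+```python
+# Hypothetical helper: point the Point-BERT pre-training config at the trained dVAE weights.
+# Note that yaml.safe_dump rewrites the file and drops any comments it contained.
+import yaml
+
+cfg_path = "cfgs/Mixup_models/Point-BERT.yaml"
+with open(cfg_path) as f:
+    cfg = yaml.safe_load(f)
+
+# Assumed location of the checkpoint written by the dVAE run above (exp_name "dVAE").
+cfg["dvae_config"]["ckpt"] = "experiments/dvae/ShapeNet55_models/dVAE/ckpt-best.pth"
+
+with open(cfg_path, "w") as f:
+    yaml.safe_dump(cfg, f)
+```
+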
-```sh +```bash bash ./scripts/dist_train_BERT.sh 12345 --config cfgs/Mixup_models/Point-BERT.yaml --exp_name pointBERT_pretrain --val_freq 2 ``` -## Reference +## References -[Point-BERT](https://github.com/lulutang0608/Point-BERT) +* [Point-BERT](https://github.com/lulutang0608/Point-BERT) -- Gitee From f6627f0e6e0489df2873a4459d4852ff454b9082 Mon Sep 17 00:00:00 2001 From: "mingjiang.li" Date: Mon, 10 Mar 2025 16:06:05 +0800 Subject: [PATCH 3/9] unify model readme format - cv/mot Signed-off-by: mingjiang.li --- cv/multi_object_tracking/README.md | 1 - .../bytetrack/paddlepaddle/README.md | 59 ++++----- .../deep_sort/pytorch/README.md | 39 +++--- .../fairmot/pytorch/README.md | 119 ++++++------------ 4 files changed, 91 insertions(+), 127 deletions(-) delete mode 100644 cv/multi_object_tracking/README.md diff --git a/cv/multi_object_tracking/README.md b/cv/multi_object_tracking/README.md deleted file mode 100644 index 366545845..000000000 --- a/cv/multi_object_tracking/README.md +++ /dev/null @@ -1 +0,0 @@ -# Object Tracking diff --git a/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md b/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md index 66132c1aa..4e7ef4662 100644 --- a/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md +++ b/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md @@ -1,30 +1,25 @@ # ByteTrack -## Model description +## Model Description -ByteTrack is a simple, fast and strong multi-object tracker. +ByteTrack is an efficient multi-object tracking (MOT) model that improves tracking accuracy by associating every +detection box, including low-score ones, rather than discarding them. It addresses challenges like occluded objects and +fragmented trajectories by leveraging similarities between detections and tracklets. ByteTrack achieves state-of-the-art +performance on benchmarks like MOT17, with high MOTA, IDF1, and HOTA scores while maintaining real-time processing +speeds. Its simple yet effective design makes it a robust solution for various object tracking applications in video +analysis. -Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. The objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible true object missing and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating every detection box instead of only the high score ones. For the low score detection boxes, we utilize their similarities with tracklets to recover true objects and filter out the background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvement on IDF1 scores ranging from 1 to 10 points. To put forwards the state-of-the-art performance of MOT, we design a simple and strong tracker, named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the test set of MOT17 with 30 FPS running speed on a single V100 GPU. 
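+
+To make the idea above concrete, here is a small, self-contained sketch of the two-stage association in plain Python.
+It is illustrative only and is not the tracker implemented in PaddleDetection; the greedy IoU matcher and the 0.6/0.3
+thresholds are assumptions chosen for readability.
+
+```python
+def iou(a, b):
+    """IoU of two [x1, y1, x2, y2] boxes."""
+    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
+    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
+    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
+    area_a = (a[2] - a[0]) * (a[3] - a[1])
+    area_b = (b[2] - b[0]) * (b[3] - b[1])
+    return inter / (area_a + area_b - inter + 1e-9)
+
+def greedy_match(tracks, dets, iou_thr):
+    """Greedily pair tracks with detections by descending IoU; return pairs and unmatched track indices."""
+    candidates = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)), reverse=True)
+    pairs, used_t, used_d = [], set(), set()
+    for score, ti, di in candidates:
+        if score < iou_thr:
+            break
+        if ti in used_t or di in used_d:
+            continue
+        pairs.append((ti, di))
+        used_t.add(ti)
+        used_d.add(di)
+    return pairs, [ti for ti in range(len(tracks)) if ti not in used_t]
+
+def byte_associate(track_boxes, det_boxes, det_scores, high_thr=0.6, iou_thr=0.3):
+    """Stage 1: match tracks to high-score boxes; stage 2: recover leftover tracks with low-score boxes."""
+    high = [i for i, s in enumerate(det_scores) if s >= high_thr]
+    low = [i for i, s in enumerate(det_scores) if s < high_thr]
+    stage1, leftover = greedy_match(track_boxes, [det_boxes[i] for i in high], iou_thr)
+    matches = [(t, high[d]) for t, d in stage1]
+    stage2, _ = greedy_match([track_boxes[t] for t in leftover], [det_boxes[i] for i in low], iou_thr)
+    matches += [(leftover[t], low[d]) for t, d in stage2]
+    # Unmatched low-score boxes are dropped as background; unmatched high-score boxes would start new tracks.
+    return matches
+```
+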
+## Model Preparation

-## Step 1: Installation
+### Prepare Resources

-```bash
-git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleDetection.git
-cd PaddleDetection
-pip3 install -r requirements.txt
-pip3 install protobuf==3.20.3
-pip3 install urllib3==1.26.6
-yum install mesa-libGL
-python3 setup.py develop
-```
-
-## Step 2: Preparing datasets
-
-Go to visit [MOT17 official website](https://motchallenge.net/), then download the MOT17 dataset, or you can download via [paddledet data](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip),then extract and place it in the dataset/mot/folder.
+Visit the [MOT17 official website](https://motchallenge.net/) and download the MOT17 dataset, or download it directly
+via [paddledet data](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip), then extract and place it in the
+`dataset/mot/` folder.

 The dataset path structure sholud look like:

-```
+```bash
 datasets/mot/MOT17/
 ├── annotations
 │   ├── train_half.json
@@ -39,10 +34,21 @@ datasets/mot/MOT17/

 ```

-## Step 3: Training
+### Install Dependencies

 ```bash
+git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleDetection.git
+cd PaddleDetection
+pip3 install -r requirements.txt
+pip3 install protobuf==3.20.3
+pip3 install urllib3==1.26.6
+yum install mesa-libGL
+python3 setup.py develop
+```
+## Model Training
+
+```bash
 # One GPU
 CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp

@@ -50,15 +56,12 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/mot/bytetrack/detector/
 python3 -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp
 ```

-## Results
+## Model Results

-| MODEL | mAP(0.5:0.95)|
-| ---------- | ------------ |
-| ByteTrack | 0.538 |
+| Model     | GPU        | FPS    | mAP(0.5:0.95) |
+|-----------|------------|--------|---------------|
+| ByteTrack | BI-V100 x8 | 4.6504 | 0.538         |

-| GPUS | FPS |
-| ---------- | ------ |
-| BI-V100x 8 | 4.6504 |
+## References

-## Reference
-- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) \ No newline at end of file
+- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)
diff --git a/cv/multi_object_tracking/deep_sort/pytorch/README.md b/cv/multi_object_tracking/deep_sort/pytorch/README.md
index 0a8517920..bf4debfe1 100644
--- a/cv/multi_object_tracking/deep_sort/pytorch/README.md
+++ b/cv/multi_object_tracking/deep_sort/pytorch/README.md
@@ -1,18 +1,16 @@
 # DeepSORT

-## Model description
+## Model Description

-This is an implement of MOT tracking algorithm deep sort. Deep sort is basicly the same with sort
-but added a CNN model to extract features in image of human part bounded by a detector. This CNN
-model is indeed a RE-ID model and the detector used in [PAPER](https://arxiv.org/abs/1703.07402) is
-FasterRCNN , and the original source code is [HERE](https://github.com/nwojke/deep_sort). However in
-original code, the CNN model is implemented with tensorflow, which I'm not familier with. SO I
-re-implemented the CNN feature extraction model with PyTorch, and changed the CNN model a little
-bit. Also, I use **YOLOv3** to generate bboxes instead of FasterRCNN.
+DeepSORT is an advanced multi-object tracking algorithm that extends SORT by incorporating deep learning-based
+appearance features. It combines motion information with a CNN-based RE-ID model to track objects more accurately,
+especially in complex scenarios with occlusions. DeepSORT uses a Kalman filter for motion prediction and associates
+detections using both motion and appearance cues. This approach improves tracking consistency and reduces identity
+switches, making it particularly effective for person tracking in crowded scenes and video surveillance applications.

-We just need to train the RE-ID model!
+## Model Preparation

-## Preparing datasets
+### Prepare Resources

 Download the [Market-1501](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html)

@@ -50,25 +48,26 @@ data
 ├── gallery
 ```

-## Training
+## Model Training

-The original model used in paper is in original_model.py, and its parameter here [original_ckpt.t7](https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6).
+The original model used in the paper is in original_model.py, and its parameters are available here:
+[original_ckpt.t7](https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6).

 ```sh
+# Train
 python3 train.py --data-dir your path
-```
-
-## Evaluate your model
-```sh
+
+# Evaluate your model.
 python3 test.py --data-dir your path
 python3 evaluate.py
 ```

-## Results
+## Model Results

-Acc top1:0.980
+| Model    | GPU        | Top1 ACC |
+|----------|------------|----------|
+| DeepSORT | BI-V100 x8 | 0.980    |

-## Reference
+## References

-Please refer to
+- [deep_sort_pytorch](https://github.com/ZQPei/deep_sort_pytorch)
diff --git a/cv/multi_object_tracking/fairmot/pytorch/README.md b/cv/multi_object_tracking/fairmot/pytorch/README.md
index f087b9587..e4167dd92 100644
--- a/cv/multi_object_tracking/fairmot/pytorch/README.md
+++ b/cv/multi_object_tracking/fairmot/pytorch/README.md
@@ -1,19 +1,18 @@
 # FairMOT

-## Model description
+## Model Description

-FairMOT is a model for multi-object tracking which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the tasks is used to achieve high levels of detection and tracking accuracy. The detection branch is implemented in an anchor-free style which estimates object centers and sizes represented as position-aware measurement maps. Similarly, the re-ID branch estimates a re-ID feature for each pixel to characterize the object centered at the pixel. Note that the two branches are completely homogeneous which essentially differs from the previous methods which perform detection and re-ID in a cascaded style. It is also worth noting that FairMOT operates on high-resolution feature maps of strides four while the previous anchor-based methods operate on feature maps of stride 32. The elimination of anchors as well as the use of high-resolution feature maps better aligns re-ID features to object centers which significantly improves the tracking accuracy.
+FairMOT is an innovative multi-object tracking model that unifies detection and re-identification in a single framework.
+It features two homogeneous branches: one for anchor-free object detection and another for re-ID feature extraction.
+Operating on high-resolution feature maps, FairMOT achieves fairness between detection and re-ID tasks, resulting in
+improved tracking accuracy. Its joint learning approach eliminates the need for cascaded processing, making it more
+efficient and effective for complex tracking scenarios in crowded environments.
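+
+As a rough illustration of the two-branch design described above (not the training code in this repository), the toy
+module below puts an anchor-free detection head and a re-ID embedding head on one shared stride-4 feature map. The
+channel sizes and the 128-dimensional embedding are assumptions chosen for the example.
+
+```python
+import torch
+import torch.nn as nn
+
+class TwoBranchHead(nn.Module):
+    """Toy FairMOT-style head: detection and re-ID branches share one high-resolution feature map."""
+
+    def __init__(self, in_channels=64, num_classes=1, emb_dim=128):
+        super().__init__()
+        def branch(out_channels):
+            return nn.Sequential(
+                nn.Conv2d(in_channels, 256, 3, padding=1),
+                nn.ReLU(inplace=True),
+                nn.Conv2d(256, out_channels, 1),
+            )
+        self.heatmap = branch(num_classes)  # anchor-free object-center heatmap
+        self.size = branch(2)               # box width/height at each center
+        self.offset = branch(2)             # sub-pixel center offset
+        self.reid = branch(emb_dim)         # per-pixel re-ID embedding
+
+    def forward(self, feat):  # feat: stride-4 feature map from the backbone, e.g. [N, 64, H/4, W/4]
+        return {
+            "hm": torch.sigmoid(self.heatmap(feat)),
+            "wh": self.size(feat),
+            "reg": self.offset(feat),
+            "id": self.reid(feat),
+        }
+
+# Example: a 608x1088 input image corresponds to a 152x272 stride-4 map.
+outputs = TwoBranchHead()(torch.randn(1, 64, 152, 272))
+print({k: tuple(v.shape) for k, v in outputs.items()})
+```
+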
-## Step 1: Installing packages +## Model Preparation -```shell -$ pip3 install -r requirements.txt -$ pip3 install pandas progress -``` +### Prepare Resources -## Step 2: Preparing data - -### Download MOT17 dataset +Download MOT17 dataset - [Baidu NetDisk](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ) - [Google Drive](https://drive.google.com/file/d/1ET-6w12yHNo8DKevOVgK1dBlYs739e_3/view?usp=sharing) @@ -21,18 +20,18 @@ $ pip3 install pandas progress ```shell # Download MOT17 -$ mkdir -p data/MOT -$ cd data/MOT +mkdir -p data/MOT +cd data/MOT ``` ```shell -$ unzip -q MOT17.zip -$ mkdir MOT17/images && mkdir MOT17/labels_with_ids -$ mv ./MOT17/train ./MOT17/images/ && mv ./MOT17/test ./MOT17/images/ +unzip -q MOT17.zip +mkdir MOT17/images && mkdir MOT17/labels_with_ids +mv ./MOT17/train ./MOT17/images/ && mv ./MOT17/test ./MOT17/images/ -$ cd ../../ -$ python3 src/gen_labels_17.py +cd ../../ +python3 src/gen_labels_17.py ## The dataset path looks like below data/ @@ -45,7 +44,7 @@ data/    └── train ``` -### Download Pretrained models +Download Pretrained models - DLA-34 COCO pretrained model: [DLA-34 official](https://drive.google.com/file/d/18Q3fzzAsha_3Qid6mn4jcIFPeOGUaj1d) - HRNetV2-W18 ImageNet pretrained model: [BaiduYun(Access Code: r5xn)](https://pan.baidu.com/s/1Px_g1E2BLVRkKC5t-b-R5Q) @@ -53,91 +52,55 @@ data/ ```shell # Download ctdet_coco_dla_2x -$ mkdir -p models -$ cd models +mkdir -p models +cd models ``` -## Step 3: Training - -**The available train scripts are as follows:** +### Install Dependencies ```shell -train_dla34_mot17.sh -train_hrnet18_mot17.sh -train_hrnet32_mot17.sh - +pip3 install -r requirements.txt +pip3 install pandas progress ``` +## Model Training -### On single GPU +The available train scripts are as follows: ```shell -$ GPU_NUMS=1 bash