diff --git a/cv/detection/README.md b/cv/detection/README.md deleted file mode 100644 index 11c2f4b905d80bb477e648ea467753f7f7e0cc19..0000000000000000000000000000000000000000 --- a/cv/detection/README.md +++ /dev/null @@ -1 +0,0 @@ -# Object Detection diff --git a/cv/detection/atss_mmdet/pytorch/README.md b/cv/detection/atss_mmdet/pytorch/README.md index 4c756bc6e10ed6427c675e3a8178d2b2ef36aeed..f5c0fc7b547aafb9bd6cdc650020ae685c23b20e 100644 --- a/cv/detection/atss_mmdet/pytorch/README.md +++ b/cv/detection/atss_mmdet/pytorch/README.md @@ -1,31 +1,22 @@ # ATSS -## Model description +## Model Description -Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to statistical characteristics of object. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead. +ATSS (Adaptive Training Sample Selection) is an innovative object detection framework that bridges the gap between +anchor-based and anchor-free methods. It introduces an adaptive mechanism to select positive and negative training +samples based on statistical characteristics of objects, improving detection accuracy. ATSS automatically determines +optimal sample selection thresholds, eliminating the need for manual tuning. This approach enhances both anchor-based +and anchor-free detectors, achieving state-of-the-art performance on benchmarks like COCO without additional +computational overhead. -## Step 1: Installation +## Model Preparation -ATSS model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. - -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -46,7 +37,24 @@ coco2017 └── ... 
``` -## Step 3: Training +### Install Dependencies + +ATSS model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . +``` + +## Model Training ```bash # Make soft link to dataset @@ -67,11 +75,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/atss/atss_r50_fpn_1x_coco.py 8 ``` -## Results +## Model Results + +| GPU | FP32 | +|------------|----------| +| BI-V100 x8 | MAP=39.5 | -| GPUs | FP32 | -| ----------- | ------------------------------------ | -| BI-V100 x8 | MAP=39.5 | +## References -## Reference -[mmdetection](https://github.com/open-mmlab/mmdetection) +- [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/detection/autoassign/pytorch/README.md b/cv/detection/autoassign/pytorch/README.md index 69e79423674bdc0de0ed8eedd2fa1eeda9da5f39..f874fe8bd8e8f9423d9f4a95e29a84e4e62547ad 100755 --- a/cv/detection/autoassign/pytorch/README.md +++ b/cv/detection/autoassign/pytorch/README.md @@ -1,26 +1,17 @@ # AutoAssign -## Model description +## Model Description -Determining positive/negative samples for object detection is known as label assignment. Here we present an anchor-free detector named AutoAssign. It requires little human knowledge and achieves appearance-aware through a fully differentiable weighting mechanism. During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions. To adapt to object appearances, Confidence Weighting is proposed to adjust the specific assign strategy of each instance. The two weighting modules are then combined to generate positive and negative weights to adjust each location's confidence. +AutoAssign is an anchor-free object detection model that introduces a fully differentiable label assignment mechanism. +It combines Center Weighting and Confidence Weighting to adaptively determine positive and negative samples during +training. Center Weighting adjusts category-specific prior distributions, while Confidence Weighting customizes +assignment strategies for each instance. This approach eliminates the need for manual anchor design and achieves +appearance-aware detection through automatic sample selection, resulting in improved performance and reduced human +intervention in the detection process. +## Model Preparation -## Step 1: Installing packages - -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` - -## Step 2: Preparing datasets +### Prepare Resources ```bash mkdir -p data @@ -31,9 +22,11 @@ mkdir -p /root/.cache/torch/hub/checkpoints/ wget https://download.openmmlab.com/pretrain/third_party/resnet50_msra-5891d200.pth -O /root/.cache/torch/hub/checkpoints/resnet50_msra-5891d200.pth ``` -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. 
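As a rough illustration of the Center Weighting idea described above, the short sketch below computes a Gaussian-style center prior that down-weights locations far from a ground-truth box centre. It is a simplified stand-in only — AutoAssign actually learns the per-category mean and variance of this prior — and it is not the MMDetection implementation used in this README.

```python
import numpy as np

def center_prior(dx, dy, sigma_x=1.0, sigma_y=1.0):
    """Gaussian-style center prior: weight a location by its normalized offset
    (dx, dy) from the ground-truth box centre. Simplified stand-in for
    AutoAssign's Center Weighting, whose parameters are learned per category."""
    return np.exp(-(dx ** 2 / (2 * sigma_x ** 2) + dy ** 2 / (2 * sigma_y ** 2)))

print(center_prior(0.0, 0.0))  # 1.0 at the box centre
print(center_prior(1.0, 1.0))  # ~0.37, down-weighted away from the centre
```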
-Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -54,19 +47,32 @@ coco2017 └── ... ``` -## Step 3: Training - -### One single GPU +### Install Dependencies ```bash -python3 tools/train.py configs/autoassign/autoassign_r50-caffe_fpn_1x_coco.py +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . ``` -### Multiple GPUs on one machine +## Model Training + ```bash +# One single GPU +python3 tools/train.py configs/autoassign/autoassign_r50-caffe_fpn_1x_coco.py + +# Multiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/autoassign/autoassign_r50-caffe_fpn_1x_coco.py 8 ``` -## Reference +## References + [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/detection/cascade_rcnn_mmdet/pytorch/README.md b/cv/detection/cascade_rcnn_mmdet/pytorch/README.md index 756bc2f08248a8612d064b809da2255f9a1f543f..e673a73d2d6a3e1166f5acef34d46caa7ef1175e 100644 --- a/cv/detection/cascade_rcnn_mmdet/pytorch/README.md +++ b/cv/detection/cascade_rcnn_mmdet/pytorch/README.md @@ -1,30 +1,22 @@ # Cascade R-CNN -## Model description +## Model Description -In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN. +Cascade R-CNN is a multi-stage object detection framework that progressively improves detection quality through a +sequence of detectors trained with increasing IoU thresholds. Each stage refines the bounding boxes from the previous +stage, addressing the paradox of high-quality detection by minimizing overfitting and ensuring quality consistency +between training and inference. 
This architecture achieves state-of-the-art performance on various datasets, including +COCO, and can be extended to instance segmentation tasks, outperforming models like Mask R-CNN. -## Step 1: Installation -Cascade R-CNN model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +### Prepare Resources -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -## Step 2: Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -45,7 +37,24 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +Cascade R-CNN model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . +``` + +## Model Training ```bash # Make soft link to dataset @@ -66,12 +75,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py 8 ``` -## Results +## Model Results -| GPUs | FP32 | -| ----------- | ------------------------------------ | -| BI-V100 x8 | MAP=40.4 | +| GPU | FP32 | +|------------|----------| +| BI-V100 x8 | MAP=40.4 | -## Reference +## References -- [Cascade R-CNN: High Quality Object Detection and Instance Segmentation](https://arxiv.org/abs/1906.09756) +- [Paper](https://arxiv.org/abs/1906.09756) diff --git a/cv/detection/centermask2/pytorch/README.md b/cv/detection/centermask2/pytorch/README.md index 8938a825477aabbc16601caf16d15919cfe7d29c..6992d06982e4e2353e9d42aa7dc7b1388bee3c8a 100644 --- a/cv/detection/centermask2/pytorch/README.md +++ b/cv/detection/centermask2/pytorch/README.md @@ -1,20 +1,17 @@ # CenterMask2 -CenterMask2 is an upgraded implementation on top of detectron2 beyond original CenterMask based on maskrcnn-benchmark. +## Model Description -## Step 1: Installation +CenterMask2 is an advanced instance segmentation model built on Detectron2, extending the original CenterMask +architecture. It improves mask prediction accuracy by incorporating spatial attention mechanisms and VoVNet backbones. +CenterMask2 enhances object localization and segmentation through its dual-branch design, combining mask and box +predictions effectively. The model achieves state-of-the-art performance on COCO dataset benchmarks, offering efficient +training and inference capabilities. It's particularly effective for complex scenes with overlapping objects and varying +scales. -All you need to use centermask2 is [detectron2](https://github.com/facebookresearch/detectron2). It's easy! 
- -you just install [detectron2](https://github.com/facebookresearch/detectron2) following [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md). - -```bash -# Install detectron2 -git clone https://github.com/facebookresearch/detectron2.git -python3 -m pip install -e detectron2 -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -44,7 +41,19 @@ mkdir -p /datasets/ ln -s /path/to/coco2017 /datasets/ ``` -## Step 3: Training +### Install Dependencies + +All you need to use centermask2 is [detectron2](https://github.com/facebookresearch/detectron2). It's easy! + +you just install [detectron2](https://github.com/facebookresearch/detectron2) following [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md). + +```bash +# Install detectron2 +git clone https://github.com/facebookresearch/detectron2.git +python3 -m pip install -e detectron2 +``` + +## Model Training For example, to launch CenterMask training with VoVNetV2-39 backbone on 8 GPUs, one should execute: @@ -55,6 +64,6 @@ cd centermask2 python3 train_net.py --config-file "configs/centermask/centermask_V_39_eSE_FPN_ms_3x.yaml" --num-gpus 8 ``` -## Reference +## References - [CenterMask2](https://github.com/youngwanLEE/centermask2) \ No newline at end of file diff --git a/cv/detection/centernet/paddlepaddle/README.md b/cv/detection/centernet/paddlepaddle/README.md index 485c006e901e60e94920ffcaa221082c39188a7b..f1ad9a71b7b8d5855f5fd167c26b4bb3a777d403 100644 --- a/cv/detection/centernet/paddlepaddle/README.md +++ b/cv/detection/centernet/paddlepaddle/README.md @@ -1,52 +1,54 @@ # CenterNet -## Model description -Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point --- the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time. +## Model Description -## 克隆代码 +CenterNet is an efficient object detection model that represents objects as single points (their bounding box centers) +rather than traditional bounding boxes. It uses keypoint estimation to locate centers and regresses other object +properties like size and orientation. This approach eliminates the need for anchor boxes and non-maximum suppression, +making it simpler and faster. CenterNet achieves state-of-the-art speed-accuracy trade-offs on benchmarks like COCO and +can be extended to 3D detection and pose estimation tasks. 
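As a rough illustration of the center-point decoding described above, the sketch below finds local peaks in a class heatmap and turns them into boxes. This is a minimal NumPy example only — the tensor layout, the 3x3 peak rule, and the score threshold are assumptions for the demo, not the PaddleDetection implementation trained in this README.

```python
import numpy as np

def decode_centers(heatmap, wh, offset, k=100, threshold=0.3):
    """Decode top-k detections from CenterNet-style outputs.

    heatmap: (C, H, W) per-class center scores after sigmoid
    wh:      (2, H, W) predicted box width/height at each location
    offset:  (2, H, W) sub-pixel center offset at each location
    Returns a list of (class_id, score, cx, cy, w, h) in feature-map units.
    """
    C, H, W = heatmap.shape
    # A 3x3 max filter emulates the "peak = local maximum" rule used instead of NMS.
    padded = np.pad(heatmap, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    windows = np.stack([padded[:, dy:dy + H, dx:dx + W]
                        for dy in range(3) for dx in range(3)], axis=0)
    peaks = heatmap * (heatmap == windows.max(axis=0))

    flat = peaks.reshape(-1)
    detections = []
    for idx in np.argsort(flat)[::-1][:k]:
        score = flat[idx]
        if score < threshold:
            break
        c, rem = divmod(idx, H * W)
        y, x = divmod(rem, W)
        cx, cy = x + offset[0, y, x], y + offset[1, y, x]
        detections.append((int(c), float(score), float(cx), float(cy),
                           float(wh[0, y, x]), float(wh[1, y, x])))
    return detections
```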
-``` +## Model Preparation + +### Prepare Resources + +```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## 安装PaddleDetection +### Install Dependencies -``` -cd PaddleDetection +```bash pip install -r requirements.txt python3 setup.py install ``` -## 下载COCO数据集 +## Model Training -``` -python3 dataset/coco/download_coco.py -``` - -## 运行代码 - -``` -# GPU多卡训练 +```bash +# Multi-GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/centernet/centernet_r50_140e_coco.yml --eval -# GPU单卡训练 +# Single-GPU export CUDA_VISIBLE_DEVICES=0 - python3 tools/train.py -c configs/centernet/centernet_r50_140e_coco.yml --eval -# finetune +# Finetune export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/centernet/centernet_r50_140e_coco.yml -o pretrain_weights=https://bj.bcebos.com/v1/paddledet/models/centernet_r50_140e_coco.pdparams --eval -# 注:默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整config中的学习率(例如,除以8) +# Note: The default learning rate is optimized for multi-GPU training (8x GPU). If using single GPU training, +# you need to adjust the learning rate in the config accordingly (e.g., divide by 8). ``` -## finetune Results on BI-V100 +## Model Results -| GPUs | learning rate | FPS | Train Epochs | mAP | -|------|------------|-----|--------------|------| -| 1x8 | 0.00005 | 10.85 | 3 | 38.5 | \ No newline at end of file +| GPU | learning rate | FPS | Train Epochs | mAP | +|------------|---------------|-------|--------------|------| +| BI-V100 x8 | 0.00005 | 10.85 | 3 | 38.5 | diff --git a/cv/detection/centernet/pytorch/README.md b/cv/detection/centernet/pytorch/README.md index d458aa219c20b27d098a2741f90d0ee79f2dcf2e..2b97bd94229e483030cb50f72241f0c0bf1a882a 100644 --- a/cv/detection/centernet/pytorch/README.md +++ b/cv/detection/centernet/pytorch/README.md @@ -1,43 +1,22 @@ # CenterNet -## Model description +## Model Description -Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point --- the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time. +CenterNet is an efficient object detection model that represents objects as single points (their bounding box centers) +rather than traditional bounding boxes. It uses keypoint estimation to locate centers and regresses other object +properties like size and orientation. 
This approach eliminates the need for anchor boxes and non-maximum suppression, +making it simpler and faster. CenterNet achieves state-of-the-art speed-accuracy trade-offs on benchmarks like COCO and +can be extended to 3D detection and pose estimation tasks. -## Step 1: Installing packages +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +### Prepare Resources -pip3 install -r requirements.txt -git clone https://github.com/xingyizhou/CenterNet.git -git checkout 4c50fd3a46bdf63dbf2082c5cbb3458d39579e6c +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -# Compile deformable convolutional(DCNv2) -cd ./src/lib/models/networks/ -rm -rf DCNv2 -git clone -b pytorch_1.11 https://github.com/lbin/DCNv2.git -cd ./DCNv2/ -python3 setup.py build develop -``` - -## Step 2: Preparing datasets - -### Go back to the "pytorch/" directory - -```bash -cd ../../../../../ -``` - -### Download coco2017 - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -58,41 +37,52 @@ coco2017 └── ... ``` -### Set up soft link to coco2017 - ```bash ln -s /path/to/coco2017 ./data/coco ``` -### Prepare offline file "resnet18-5c106cde.pth" if download fails - ```bash +# Prepare offline file "resnet18-5c106cde.pth" if download fails wget https://download.pytorch.org/models/resnet18-5c106cde.pth mkdir -p /root/.cache/torch/hub/checkpoints/ mv resnet18-5c106cde.pth /root/.cache/torch/hub/checkpoints/ ``` -## Step 3: Training - -### Setup CUDA_VISIBLE_DEVICES variable +### Install Dependencies ```bash -export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +pip3 install -r requirements.txt +git clone https://github.com/xingyizhou/CenterNet.git +git checkout 4c50fd3a46bdf63dbf2082c5cbb3458d39579e6c + +# Compile deformable convolutional(DCNv2) +cd ./src/lib/models/networks/ +rm -rf DCNv2 +git clone -b pytorch_1.11 https://github.com/lbin/DCNv2.git +cd ./DCNv2/ +python3 setup.py build develop ``` -### On single GPU +## Model Training ```bash cd ./cv/detection/centernet/pytorch/src touch lib/datasets/__init__.py -python3 main.py ctdet --arch res_18 --batch_size 32 --master_batch 15 --lr 1.25e-4 --gpus 0 -``` +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -### Multiple GPUs on one machine +# On single GPU +python3 main.py ctdet --arch res_18 --batch_size 32 --master_batch 15 --lr 1.25e-4 --gpus 0 -```bash +# Multiple GPUs on one machine python3 main.py ctdet --arch res_18 --batch_size 128 --master_batch 60 --lr 1.25e-4 --gpus 0,1,2,3,4,5,6,7 ``` -## Reference -https://github.com/xingyizhou/CenterNet +## References + +- [CenterNet](https://github.com/xingyizhou/CenterNet) diff --git a/cv/detection/co-detr/pytorch/README.md b/cv/detection/co-detr/pytorch/README.md index e864b540f6451f8e7c5efbbd0c82b6068c13636b..c79785662da66eaa06b570a2fb0436e208a371f5 100644 --- a/cv/detection/co-detr/pytorch/README.md +++ b/cv/detection/co-detr/pytorch/README.md @@ -1,44 +1,22 @@ # Co-DETR (DETRs 
with Collaborative Hybrid Assignments Training) -This repo is the official implementation of ["DETRs with Collaborative Hybrid Assignments Training"](https://arxiv.org/pdf/2211.12860.pdf) by Zhuofan Zong, Guanglu Song, and Yu Liu. -## Model description +## Model Description -In this paper, we present a novel collaborative hybrid assignments training scheme, namely Co-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. -1. **Encoder optimization**: The proposed training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments. -2. **Decoder optimization**: We conduct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve attention learning of the decoder. -3. **State-of-the-art performance**: Co-DETR with [ViT-L](https://github.com/baaivision/EVA/tree/master/EVA-02) (304M parameters) is **the first model to achieve 66.0 AP on COCO test-dev.** +Co-DETR is an advanced object detection model that enhances DETR (DEtection TRansformer) through collaborative hybrid +assignments training. It improves encoder learning by training multiple auxiliary heads with one-to-many label +assignments and optimizes decoder attention using customized positive queries. Co-DETR achieves state-of-the-art +performance, being the first model to reach 66.0 AP on COCO test-dev with ViT-L. This approach significantly boosts +detection accuracy and efficiency while maintaining end-to-end training simplicity. -## Step 1: Installation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +## Model Preparation -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` -### (2) install other -```bash -pip3 install -r requirements.txt -pip3 install urllib3==1.26.15 -``` +### Prepare Resources -### (3) download repo -```bash -git clone https://github.com/Sense-X/Co-DETR.git -cd /path/to/Co-DETR -git checkout bf3d49d7c02929788dfe2f251b6b01cbe196b736 -``` - -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco` to your COCO path in later training process, the unzipped +dataset path structure sholud look like: ```bash coco @@ -59,15 +37,37 @@ coco └── ... ``` -## Step 3: Training +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . 
+ +# Install requirements +pip3 install -r requirements.txt +pip3 install urllib3==1.26.15 + +# Download repo +git clone https://github.com/Sense-X/Co-DETR.git +cd /path/to/Co-DETR +git checkout bf3d49d7c02929788dfe2f251b6b01cbe196b736 +``` + +## Model Training ```bash # Make coco dataset path soft link to ./data/coco mkdir data/ ln -s /path/to/coco ./data -``` -```bash # One GPU export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py --work-dir path_to_exp --no-validate --auto-resume @@ -80,11 +80,13 @@ export CUDA_VISIBLE_DEVICES=0 PYTHONPATH=".:$PYTHONPATH" python3 tools/test.py projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py path_to_exp/latest.pth --eval bbox ``` -## Results +## Model Results + +| Model | GPU | FPS | Train Epochs | Box AP | +|---------|------------|------|--------------|--------| +| Co-DETR | BI-V100 x8 | 9.02 | 12 | 0.428 | -| GPUs | FPS | Train Epochs | Box AP | -|------|---------|----------|--------| -| BI-V100 x8 | 9.02 | 12 | 0.428 | +## References -## Reference -- [Co-DETR](https://github.com/Sense-X/Co-DETR) \ No newline at end of file +- [Paper](https://arxiv.org/pdf/2211.12860.pdf) +- [Co-DETR](https://github.com/Sense-X/Co-DETR) diff --git a/cv/detection/cornernet_mmdet/pytorch/README.md b/cv/detection/cornernet_mmdet/pytorch/README.md index 876cb1f98d8602d39e0d73ef9d53f217fd47818b..169c9c8ce7abbbceeb2a99cb1f8f508b7a3d6338 100644 --- a/cv/detection/cornernet_mmdet/pytorch/README.md +++ b/cv/detection/cornernet_mmdet/pytorch/README.md @@ -1,29 +1,23 @@ # CornerNet -## Model description +## Model Description -CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors. +CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the +top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired +keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In +addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network +better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing +one-stage detectors. -## Step 1: Installation -CornerNet model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +## Model Preparation -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` +### Prepare Resources -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. 
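The corner pooling operation introduced in the description above can be pictured with a few lines of NumPy. This is an illustration only — a per-channel (H, W) feature map is assumed — and it is not the CUDA corner-pooling operator that ships with MMDetection.

```python
import numpy as np

def top_left_corner_pool(x):
    """Top-left corner pooling on one (H, W) feature channel: for every location,
    take the max over all rows below it (top pooling) plus the max over all
    columns to its right (left pooling), so plausible top-left corners light up."""
    top = np.flip(np.maximum.accumulate(np.flip(x, axis=0), axis=0), axis=0)
    left = np.flip(np.maximum.accumulate(np.flip(x, axis=1), axis=1), axis=1)
    return top + left

feat = np.random.rand(32, 32)
pooled = top_left_corner_pool(feat)  # same shape; responses propagate up and to the left
```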
-Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,7 +38,24 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +CornerNet model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . +``` + +## Model Training ```bash # Make soft link to dataset @@ -61,11 +72,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/cornernet/cornernet_hourglass104_8xb6-210e-mstest_coco.py 8 ``` -## Results +## Model Results + +| Model | GPU | FP32 | +|-----------|------------|----------| +| CornerNet | BI-V100 x8 | MAP=41.2 | -| GPUs | FP32 | -| ---------- | -------- | -| BI-V100 x8 | MAP=41.2 | +## References -## Reference -- [Cornernet: Detecting objects as paired keypoints](https://arxiv.org/abs/1808.01244) +- [Paper](https://arxiv.org/abs/1808.01244) diff --git a/cv/detection/dcnv2_mmdet/pytorch/README.md b/cv/detection/dcnv2_mmdet/pytorch/README.md index 77246fee78198a2ba4bfc385ca93a09bcbcfae83..4f7f8595ea25439a5e396245c6b16a819afa6f02 100644 --- a/cv/detection/dcnv2_mmdet/pytorch/README.md +++ b/cv/detection/dcnv2_mmdet/pytorch/README.md @@ -1,31 +1,23 @@ # DCNv2 -## Model description +## Model Description -The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of RCNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation. +DCNv2 (Deformable Convolutional Networks v2) is an advanced convolutional neural network architecture that enhances +spatial transformation capabilities. It improves upon DCN by introducing a modulation mechanism for more precise +deformation modeling and better focus on relevant image regions. 
DCNv2 integrates deformable convolution more +comprehensively throughout the network, enabling superior adaptation to object geometry. This results in +state-of-the-art performance on object detection and instance segmentation tasks, particularly on benchmarks like COCO, +while maintaining computational efficiency. -## Step 1: Installation +## Model Preparation -DCNv2 model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. - -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +### Prepare Resources -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` - -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -46,7 +38,24 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +DCNv2 model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . +``` + +## Model Training ```bash # Make soft link to dataset @@ -63,12 +72,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/dcnv2/faster-rcnn_r50-mdconv-c3-c5_fpn_1x_coco.py 8 ``` -## Results +## Model Results -| GPUs | FP32 | -| ---------- | -------- | -| BI-V100 x8 | MAP=41.2 | + | GPU | FP32 | mAP | + |-------|------------|------| + | DCNv2 | BI-V100 x8 | 41.2 | -## Reference +## References -- [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168) +- [Paper](https://arxiv.org/abs/1811.11168) diff --git a/cv/detection/detr/paddlepaddle/README.md b/cv/detection/detr/paddlepaddle/README.md index d09ff584d4c9c116167bdd44e9410c6ea6485769..326b1b5acd4ada485200c4590527231063e25282 100644 --- a/cv/detection/detr/paddlepaddle/README.md +++ b/cv/detection/detr/paddlepaddle/README.md @@ -1,52 +1,54 @@ # DETR -## Model description -DETR is an object detection model based on transformer. We reproduced the model of the paper. +## Model Description -## 克隆代码 +DETR (DEtection TRansformer) is a novel object detection model that replaces traditional convolutional methods with a +transformer-based architecture. It treats object detection as a direct set prediction problem, eliminating the need for +anchor boxes and non-maximum suppression. DETR uses a transformer encoder-decoder structure to process image features +and predict object bounding boxes and classes simultaneously. This end-to-end approach simplifies the detection pipeline +while achieving competitive performance on benchmarks like COCO, offering a new paradigm for object detection tasks. 
-``` +## Model Preparation + +### Prepare Resources + +```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## 安装PaddleDetection +### Install Dependencies -``` -cd PaddleDetection +```bash pip install -r requirements.txt python3 setup.py install ``` -## 下载COCO数据集 +## Model Training -``` -python3 dataset/coco/download_coco.py -``` - -## 运行代码 - -``` -# GPU多卡训练 +```bash +# Multi-GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/detr/detr_r50_1x_coco.yml --eval -# GPU单卡训练 +# Single-GPU export CUDA_VISIBLE_DEVICES=0 - python3 tools/train.py -c configs/detr/detr_r50_1x_coco.yml --eval -# finetune +# Finetune export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/detr/detr_r50_1x_coco.yml -o pretrain_weights=https://paddledet.bj.bcebos.com/models/detr_r50_1x_coco.pdparams --eval -# 注:默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整config中的学习率(例如,除以8) +# Note: The default learning rate is optimized for multi-GPU training (8x GPU). If using single GPU training, +# you need to adjust the learning rate in the config accordingly (e.g., divide by 8). ``` -## finetune Results on BI-V100 +## Model Results -| GPUs | learning rate | FPS | Train Epochs | Box AP | -|------|------------|-----|--------------|------| -| 1x8 | 0.00001 | 14.64 | 1 | 42.0 | \ No newline at end of file +| Model | GPU | learning rate | FPS | Train Epochs | Box AP | +|-------|------------|---------------|-------|--------------|--------| +| DETR | BI-V100 x8 | 0.00001 | 14.64 | 1 | 42.0 | diff --git a/cv/detection/fasterrcnn/pytorch/README.md b/cv/detection/fasterrcnn/pytorch/README.md index 7d58e01e0cb3d07dab5fa1bf40d5e83f46603d01..9b071be039bdca5e0ebb433148bf0d056fdf6e2b 100644 --- a/cv/detection/fasterrcnn/pytorch/README.md +++ b/cv/detection/fasterrcnn/pytorch/README.md @@ -1,26 +1,22 @@ # Faster R-CNN -## Model description +## Model Description -State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available. 
+Faster R-CNN is a state-of-the-art object detection model that introduces a Region Proposal Network (RPN) to generate +region proposals efficiently. It shares convolutional features between the RPN and detection network, enabling nearly +cost-free region proposals. This architecture significantly improves detection speed and accuracy compared to its +predecessors. Faster R-CNN achieves excellent performance on benchmarks like PASCAL VOC and COCO, and serves as the +foundation for many winning entries in computer vision competitions. -## Step 1: Installing packages -``` -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-dev +## Model Preparation -cd /start_scripts -bash init_torch.sh -``` - -## Step 2: Preparing datasets +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -41,19 +37,31 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies -### On single GPU (AMP) +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-dev + +cd /start_scripts +bash init_torch.sh ``` + +## Model Training + +```bash +# On single GPU (AMP) cd /start_scripts bash train_fasterrcnn_resnet50_amp_torch.sh --dataset coco --data-path /path/to/coco2017 -``` -### Multiple GPUs on one machine -``` +## Multiple GPUs on one machine cd /start_scripts bash train_fasterrcnn_resnet50_amp_dist_torch.sh --dataset coco --data-path /path/to/coco2017 ``` -## Reference -https://github.com/pytorch/vision \ No newline at end of file +## References + +- [vision](https://github.com/pytorch/vision) diff --git a/cv/detection/fcos/paddlepaddle/README.md b/cv/detection/fcos/paddlepaddle/README.md index 1924f0ecfd95a6f2064578f810d1d36713baa34b..877b88a6ecdf4d5c29bac39453a3b903bec17c73 100644 --- a/cv/detection/fcos/paddlepaddle/README.md +++ b/cv/detection/fcos/paddlepaddle/README.md @@ -1,47 +1,49 @@ # FCOS -## Model description -FCOS (Fully Convolutional One-Stage Object Detection) is a fast anchor-free object detection framework with strong performance. +## Model Description -## Get PaddleDetection source code +FCOS (Fully Convolutional One-Stage Object Detection) is an anchor-free object detection model that predicts bounding +boxes directly without anchor boxes. It uses a fully convolutional network to detect objects by predicting per-pixel +bounding boxes and class labels. FCOS simplifies the detection pipeline, reduces hyperparameters, and achieves +competitive performance on benchmarks like COCO. Its center-ness branch helps suppress low-quality predictions, making +it efficient and effective for various detection tasks. 
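The center-ness branch mentioned above boils down to a single formula; the sketch below computes the target value for one location from its left/top/right/bottom distances to the assigned ground-truth box. This illustrates the definition given in the FCOS paper and is not code from PaddleDetection.

```python
import numpy as np

def centerness(l, t, r, b):
    """Center-ness target: sqrt(min(l, r)/max(l, r) * min(t, b)/max(t, b)).
    Values near 1 mean the location is close to the box centre; predictions from
    off-centre locations are down-weighted at inference time."""
    lr = np.minimum(l, r) / np.maximum(l, r)
    tb = np.minimum(t, b) / np.maximum(t, b)
    return np.sqrt(lr * tb)

# A location 10px from the left and top edges of a 50x40 box:
print(centerness(10.0, 10.0, 40.0, 30.0))  # ~0.289
```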
-``` +## Model Preparation + +### Prepare Resources + +```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## Install PaddleDetection +### Install Dependencies -``` -cd PaddleDetection +```bash pip install -r requirements.txt python3 setup.py install ``` -## Prepare datasets +## Model Training -``` -python3 dataset/coco/download_coco.py -``` - -## Train - -``` -# GPU多卡训练 +```bash +# Multi-GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --eval -# GPU单卡训练 +# Single-GPU export CUDA_VISIBLE_DEVICES=0 - python3 tools/train.py -c configs/fcos/fcos_r50_fpn_1x_coco.yml --eval -# 注:默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整config中的学习率(例如,除以8) - +# Note: The default learning rate is optimized for multi-GPU training (8x GPU). If using single GPU training, +# you need to adjust the learning rate in the config accordingly (e.g., divide by 8). ``` -## Results on BI-V100 +## Model Results -| GPUs | FPS | Train Epochs | Box AP | -|------|-----|--------------|------| -| 1x8 | 8.24 | 12 | 39.7 | + | Model | GPU | FPS | Train Epochs | Box AP | + |-------|------------|------|--------------|--------| + | FCOS | BI-V100 x8 | 8.24 | 12 | 39.7 | diff --git a/cv/detection/fcos/pytorch/README.md b/cv/detection/fcos/pytorch/README.md index 4ab40a13a275e70d0a37876da8ac96177974d976..2236e5764295baa0199242c8b9cbf3aa3b40113c 100755 --- a/cv/detection/fcos/pytorch/README.md +++ b/cv/detection/fcos/pytorch/README.md @@ -1,21 +1,22 @@ # FCOS -## Model description -FCOS (Fully Convolutional One-Stage Object Detection) is a fast anchor-free object detection framework with strong performance. -The full paper is available at: [https://arxiv.org/abs/1904.01355](https://arxiv.org/abs/1904.01355). +## Model Description -## Step 1: Installation +FCOS (Fully Convolutional One-Stage Object Detection) is an anchor-free object detection model that predicts bounding +boxes directly without anchor boxes. It uses a fully convolutional network to detect objects by predicting per-pixel +bounding boxes and class labels. FCOS simplifies the detection pipeline, reduces hyperparameters, and achieves +competitive performance on benchmarks like COCO. Its center-ness branch helps suppress low-quality predictions, making +it efficient and effective for various detection tasks. -``` -pip3 install -r requirements.txt -python3 setup.py develop -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. 
-Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,9 +45,17 @@ wget https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl mv R-50.pkl /root/.torch/models/ ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +python3 setup.py develop +``` + +## Model Training -The following command line will train FCOS_imprv_R_50_FPN_1x on 8 GPUs with Synchronous Stochastic Gradient Descent (SGD): +The following command line will train FCOS_imprv_R_50_FPN_1x on 8 GPUs with Synchronous Stochastic Gradient Descent +(SGD): ```bash python3 -m torch.distributed.launch \ @@ -56,20 +65,25 @@ python3 -m torch.distributed.launch \ DATALOADER.NUM_WORKERS 2 \ OUTPUT_DIR training_dir/fcos_imprv_R_50_FPN_1x ``` - + Note that: -1) If you want to use fewer GPUs, please change `--nproc_per_node` to the number of GPUs. No other settings need to be changed. The total batch size does not depends on `nproc_per_node`. If you want to change the total batch size, please change `SOLVER.IMS_PER_BATCH` in [configs/fcos/fcos_R_50_FPN_1x.yaml](configs/fcos/fcos_R_50_FPN_1x.yaml). + +1) If you want to use fewer GPUs, please change `--nproc_per_node` to the number of GPUs. No other settings need to be + changed. The total batch size does not depends on `nproc_per_node`. If you want to change the total batch size, + please change `SOLVER.IMS_PER_BATCH` in [configs/fcos/fcos_R_50_FPN_1x.yaml](configs/fcos/fcos_R_50_FPN_1x.yaml). 2) The models will be saved into `OUTPUT_DIR`. 3) If you want to train FCOS with other backbones, please change `--config-file`. -4) If you want to train FCOS on your own dataset, please follow this instruction [#54](https://github.com/tianzhi0549/FCOS/issues/54#issuecomment-497558687). -5) Now, training with 8 GPUs and 4 GPUs can have the same performance. Previous performance gap was because we did not synchronize `num_pos` between GPUs when computing loss. +4) If you want to train FCOS on your own dataset, please follow this instruction + [#54](https://github.com/tianzhi0549/FCOS/issues/54#issuecomment-497558687). +5) Now, training with 8 GPUs and 4 GPUs can have the same performance. Previous performance gap was because we did not + synchronize `num_pos` between GPUs when computing loss. -## Results +## Model Results -| GPUs | FPS | Train Epochs | Box AP| -|------|-----|--------------|-------| -| BI-V100 x8 | 8.24 | 12 | 38.7 | + | Model | GPU | FPS | Train Epochs | Box AP | + |-------|------------|------|--------------|--------| + | FCOS | BI-V100 x8 | 8.24 | 12 | 38.7 | -## Reference +## References - [FCOS](https://github.com/tianzhi0549/FCOS) diff --git a/cv/detection/mamba_yolo/pytorch/README.md b/cv/detection/mamba_yolo/pytorch/README.md index a650d435e686da92a22fba20f4d0e98db1fa3908..29d77994172c687b62448c6501d16805bb05fb1e 100644 --- a/cv/detection/mamba_yolo/pytorch/README.md +++ b/cv/detection/mamba_yolo/pytorch/README.md @@ -1,26 +1,20 @@ # Mamba-YOLO -## Model description +## Model Description -Mamba-YOLO is an innovative object detection model that integrates State Space Models (SSMs) into the YOLO (You Only Look Once) architecture to enhance performance in complex visual tasks. 
This integration aims to improve the model's ability to capture global dependencies and process long-range information efficiently. +Mamba-YOLO is an innovative object detection model that integrates State Space Models (SSMs) into the YOLO (You Only +Look Once) architecture to enhance performance in complex visual tasks. This integration aims to improve the model's +ability to capture global dependencies and process long-range information efficiently. -## Step 1: Installation +## Model Preparation -```sh -pip3 install seaborn thop timm einops - -git clone --depth 1 https://gitee.com/deep-spark/deepsparkhub-GPL.git -cd cv/detection/mamba-yolo/pytorch - -cd selective_scan && pip install . && cd .. -pip install -v -e . -``` +### Prepare Resources -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```sh coco2017 @@ -44,13 +38,26 @@ coco2017 Modify the configuration file(data/coco.yaml) ```sh -vim ultralytics/cfg/datasets/coco.yaml # path: the root of coco data # train: the relative path of train images # val: the relative path of valid images +vim ultralytics/cfg/datasets/coco.yaml + +``` + +### Install Dependencies + +```sh +pip3 install seaborn thop timm einops + +git clone --depth 1 https://gitee.com/deep-spark/deepsparkhub-GPL.git +cd cv/detection/mamba-yolo/pytorch + +cd selective_scan && pip install . && cd .. +pip install -v -e . ``` -## Step 3: Training +## Model Training ```sh python3 mbyolo_train.py --task train --data ultralytics/cfg/datasets/coco.yaml \ @@ -58,6 +65,6 @@ python3 mbyolo_train.py --task train --data ultralytics/cfg/datasets/coco.yaml \ --amp --project ./output_dir/mscoco --name mambayolo_n ``` -## Reference +## References - [Mamba-YOLO](https://github.com/HZAI-ZJNU/Mamba-YOLO/tree/main) diff --git a/cv/detection/maskrcnn/paddlepaddle/README.md b/cv/detection/maskrcnn/paddlepaddle/README.md index 5ec4d0732c64aa5f66d0f505cdf528685931d9b1..7efd041385ca1ce3afbdac8deb22b7c115d2d8dd 100644 --- a/cv/detection/maskrcnn/paddlepaddle/README.md +++ b/cv/detection/maskrcnn/paddlepaddle/README.md @@ -1,40 +1,44 @@ # Mask R-CNN -## Model description +## Model Description -Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN in the nuclei segmentation task and find that they have different strengths and failures. To get the best of both worlds, we develop an ensemble model to combine their predictions that can outperform both models by a significant margin and should be considered when aiming for best nuclei segmentation performance. +Mask R-CNN is an advanced instance segmentation model that extends Faster R-CNN by adding a parallel branch for +predicting object masks. 
It efficiently detects objects in an image while simultaneously generating high-quality +segmentation masks for each instance. Mask R-CNN maintains the two-stage architecture of Faster R-CNN but introduces a +fully convolutional network for mask prediction. This model achieves state-of-the-art performance on tasks like object +detection, instance segmentation, and human pose estimation. -## Step 1: Installing +## Model Preparation -```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt -python3 setup.py install --user -``` +### Prepare Resources -## Step 2: Download data +```bash +git clone https://github.com/PaddlePaddle/PaddleDetection.git -``` +cd PaddleDetection/ +# Get COCO Dataset python3 dataset/coco/download_coco.py ``` -## Step 3: Run Mask R-CNN +### Install Dependencies ```bash +pip install -r requirements.txt +python3 setup.py install +``` +## Model Training + +```bash export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 + python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml --use_vdl=true --eval ``` -## Results on BI-V100 - -
- -| GPU | FP32 | -| ----------- | ------------------------------------ | -| 8 cards | bbox=38.8,FPS=7.5,BatchSize=1 | +## Model Results -
+| Model | GPU | FP32 | +|------------|------------|-------------------------------| +| Mask R-CNN | BI-V100 x8 | bbox=38.8,FPS=7.5,BatchSize=1 | diff --git a/cv/detection/maskrcnn/pytorch/README.md b/cv/detection/maskrcnn/pytorch/README.md index 4c26992fb93a576b2b660e923de94f36e300e7c0..9361b58f9951bae5d738fedd5c840c106f0c119f 100644 --- a/cv/detection/maskrcnn/pytorch/README.md +++ b/cv/detection/maskrcnn/pytorch/README.md @@ -1,27 +1,22 @@ # Mask R-CNN -## Model description +## Model Description -Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN in the nuclei segmentation task and find that they have different strengths and failures. To get the best of both worlds, we develop an ensemble model to combine their predictions that can outperform both models by a significant margin and should be considered when aiming for best nuclei segmentation performance. +Mask R-CNN is an advanced instance segmentation model that extends Faster R-CNN by adding a parallel branch for +predicting object masks. It efficiently detects objects in an image while simultaneously generating high-quality +segmentation masks for each instance. Mask R-CNN maintains the two-stage architecture of Faster R-CNN but introduces a +fully convolutional network for mask prediction. This model achieves state-of-the-art performance on tasks like object +detection, instance segmentation, and human pose estimation. -## Step 1: Installing packages -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-dev - -pip3 install -r requirements.txt -``` +## Model Preparation -## Step 2: Preparing datasets and model +### Prepare Resources -Download from https://download.pytorch.org/models/resnet50-0676ba61.pth and mv to /root/.cache/torch/hub/checkpoints/ +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -47,16 +42,36 @@ mkdir -p ./datasets/ ln -s /path/to/coco2017 ./datasets/coco ``` -## Step 3: Training +Download from and mv to /root/.cache/torch/hub/checkpoints/. 
+ +```bash +wget https://download.pytorch.org/models/resnet50-0676ba61.pth +mkdir -p /root/.cache/torch/hub/checkpoints/ +mv resnet50-0676ba61.pth /root/.cache/torch/hub/checkpoints/ +``` + +### Install Dependencies -### Single Card +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-dev + +pip3 install -r requirements.txt +``` + +## Model Training + +```bash +# Single Card python3 train.py --data-path ./datasets/coco --dataset coco --model maskrcnn_resnet50_fpn --lr 0.001 --batch-size 4 -### AMP +# AMP python3 train.py --data-path ./datasets/coco --dataset coco --model maskrcnn_resnet50_fpn --lr 0.001 --batch-size 1 --amp -### DDP -``` +# DDP python3 -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\ --data-path ./datasets/coco --dataset coco --model maskrcnn_resnet50_fpn --wd 0.000001 --lr 0.001 --batch-size 4 -``` \ No newline at end of file +``` diff --git a/cv/detection/oc_sort/paddlepaddle/README.md b/cv/detection/oc_sort/paddlepaddle/README.md index abd14b6a74291a21730e6ab40ee83b53af4d3569..3dcc31367202da22953a6aa8818c30cc09c1c0cb 100644 --- a/cv/detection/oc_sort/paddlepaddle/README.md +++ b/cv/detection/oc_sort/paddlepaddle/README.md @@ -1,24 +1,17 @@ -# OC_SORT +# OC-SORT -## Model description -Observation-Centric SORT (OC-SORT) is a pure motion-model-based multi-object tracker. It aims to improve tracking robustness in crowded scenes and when objects are in non-linear motion. It is designed by recognizing and fixing limitations in Kalman filter and SORT. It is flexible to integrate with different detectors and matching modules, such as appearance similarity. It remains, Simple, Online and Real-time. +## Model Description -## Step 1: Installation -```bash -git clone https://github.com/PaddlePaddle/PaddleDetection.git -``` - -```bash -cd PaddleDetection -yum install mesa-libGL -y +OC-SORT (Observation-Centric SORT) is an advanced multi-object tracking algorithm that enhances traditional SORT by +addressing limitations in Kalman filters and non-linear motion scenarios. It improves tracking robustness in crowded +scenes and complex motion patterns while maintaining simplicity and real-time performance. OC_SORT focuses on +observation-centric updates, making it more reliable for object tracking in challenging environments. It remains +flexible for integration with various detectors and matching modules, offering improved accuracy without compromising +speed. 
-pip3 install -r requirements.txt -pip3 install protobuf==3.20.1 -pip3 install urllib3==1.26.6 -pip3 install scikit-learn -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources - **MOT17_ch datasets** @@ -27,7 +20,11 @@ cd dataset/mot git clone https://github.com/ifzhang/ByteTrack.git ``` -Download [MOT17](https://motchallenge.net/), [MOT20](https://motchallenge.net/), [CrowdHuman](https://www.crowdhuman.org/), [Cityperson](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), [ETHZ](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) and put them under /datasets in the following structure: +Download [MOT17](https://motchallenge.net/), [MOT20](https://motchallenge.net/), +[CrowdHuman](https://www.crowdhuman.org/), +[Cityperson](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), +[ETHZ](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) and put them under +/datasets in the following structure: ```bash datasets/ @@ -115,7 +112,6 @@ ln -s ByteTrack/datasets/mix_mot_ch mix_mot_ch Download [MOT17](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip) and put them under /datasets/mot/ in the following structure: - ```bash MOT17/ └──images @@ -153,11 +149,23 @@ cd /datasets/mot/MOT17 ln -s ../ByteTrack/datasets/mot/annotations ./ ``` -## Step 3: Training +### Install Dependencies ```bash -cd PaddleDetection +yum install mesa-libGL -y + +git clone https://github.com/PaddlePaddle/PaddleDetection.git +cd PaddleDetection/ + +pip3 install -r requirements.txt +pip3 install protobuf==3.20.1 +pip3 install urllib3==1.26.6 +pip3 install scikit-learn +``` +## Model Training + +```bash # mix_mot_ch datasets python3 -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/yolox_x_24e_800x1440_mix_mot_ch.yml --eval --amp @@ -168,11 +176,12 @@ python3 -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 to CUDA_VISIBLE_DEVICES=0 python3 tools/eval_mot.py -c configs/mot/ocsort/ocsort_ppyoloe.yml --scaled=True ``` -## Results +## Model Results + +| Model | GPU | DATASET | IPS | MOTA | +|---------|------------|-------------------|--------|------| +| OC-SORT | BI-V100 x8 | MOT-17 half train | 6.5907 | 57.5 | -| GPUs | DATASET | IPS | MOTA | -|-------------|-----------|-----------|----------| -| BI-V100 x8 | MOT-17 half train| 6.5907 | 57.5 | +## References -## Reference - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/detection/oriented_reppoints/pytorch/README.md b/cv/detection/oriented_reppoints/pytorch/README.md index 24db5781ad7001b1babfa02a54ab3bafe82c672f..da3c445d9c8b4915bdc0f7ed23dd05beb3d39946 100644 --- a/cv/detection/oriented_reppoints/pytorch/README.md +++ b/cv/detection/oriented_reppoints/pytorch/README.md @@ -1,45 +1,20 @@ # Oriented RepPoints -## Model description +## Model Description -In contrast to the generic object, aerial targets are often non-axis aligned with arbitrary orientations having -the cluttered surroundings. Unlike the mainstreamed approaches regressing the bounding box orientations, this paper -proposes an effective adaptive points learning approach to aerial object detection by taking advantage of the adaptive -points representation, which is able to capture the geometric information of the arbitrary-oriented instances. 
-To this end, three oriented conversion functions are presented to facilitate the classification and localization -with accurate orientation. Moreover, we propose an effective quality assessment and sample assignment scheme for -adaptive points learning toward choosing the representative oriented reppoints samples during training, which is -able to capture the non-axis aligned features from adjacent objects or background noises. A spatial constraint is -introduced to penalize the outlier points for roust adaptive learning. Experimental results on four challenging -aerial datasets including DOTA, HRSC2016, UCAS-AOD and DIOR-R, demonstrate the efficacy of our proposed approach. +Oriented RepPoints is an innovative object detection model designed for aerial imagery, where objects often appear in +arbitrary orientations. It uses adaptive points representation to capture geometric information of non-axis aligned +instances, offering more precise detection than traditional bounding box approaches. The model incorporates three +oriented conversion functions for accurate classification and localization, along with a quality assessment scheme to +handle cluttered backgrounds. It achieves state-of-the-art performance on aerial datasets like DOTA and HRSC2016. -## Step 1: Installation +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +### Prepare Resources -# Install mmdetection -pip install mmdet==3.3.0 +#### Get the DOTA dataset -# Install mmrotate -git clone -b v1.0.0rc1 https://gitee.com/open-mmlab/mmrotate.git --depth=1 -cd mmrotate/ -pip install -v -e . -sed -i 's/python /python3 /g' tools/dist_train.sh -sed -i 's/3.1.0/3.4.0/g' mmrotate/__init__.py -sed -i 's@points_range\s*=\s*torch\.arange\s*(\s*points\.shape\[0\]\s*)@&.to(points.device)@' mmrotate/models/task_modules/assigners/convex_assigner.py -sed -i 's/from collections import Sequence/from collections.abc import Sequence/g' mmrotate/models/detectors/refine_single_stage.py -``` - -## Step 2: Preparing datasets - -### Get the DOTA dataset - -The dota dataset can be downloaded from [here](https://captain-whu.github.io/DOTA/dataset.html). +The DOTA dataset can be downloaded from [here](https://captain-whu.github.io/DOTA/dataset.html). The data structure is as follows: ```bash @@ -58,7 +33,7 @@ mmrotate/data/DOTA/ └── labelTxt-v1.5 ``` -### Split dota dataset +#### Split the DOTA dataset Please crop the original images into 1024×1024 patches with an overlap of 200. @@ -70,7 +45,7 @@ python3 tools/data/dota/split/img_split.py --base-json \ tools/data/dota/split/split_configs/ss_test.json ``` -### Change root path in base config +#### Change root path in base config Please change `data_root` in `configs/_base_/datasets/dotav1.py` to split DOTA dataset. @@ -78,7 +53,29 @@ Please change `data_root` in `configs/_base_/datasets/dotav1.py` to split DOTA d sed -i 's#data/split_ss_dota1_5/#data/split_ss_dota/#g' configs/_base_/datasets/dotav15.py ``` -## Step 3: Training +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# Install mmdetection +pip install mmdet==3.3.0 + +# Install mmrotate +git clone -b v1.0.0rc1 https://gitee.com/open-mmlab/mmrotate.git --depth=1 +cd mmrotate/ +pip install -v -e . 
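# The sed commands below are small compatibility patches for mmrotate v1.0.0rc1:
#   - use python3 instead of python in tools/dist_train.sh
#   - relax a hard-coded version ceiling in mmrotate/__init__.py (3.1.0 -> 3.4.0)
#   - keep the convex assigner's points_range tensor on the same device as points
#   - import Sequence from collections.abc, which newer Python versions require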
+sed -i 's/python /python3 /g' tools/dist_train.sh +sed -i 's/3.1.0/3.4.0/g' mmrotate/__init__.py +sed -i 's@points_range\s*=\s*torch\.arange\s*(\s*points\.shape\[0\]\s*)@&.to(points.device)@' mmrotate/models/task_modules/assigners/convex_assigner.py +sed -i 's/from collections import Sequence/from collections.abc import Sequence/g' mmrotate/models/detectors/refine_single_stage.py +``` + +## Model Training ```bash # On single GPU @@ -88,11 +85,12 @@ python3 tools/train.py configs/oriented_reppoints/oriented-reppoints-qbox_r50_fp bash tools/dist_train.sh configs/oriented_reppoints/oriented-reppoints-qbox_r50_fpn_1x_dota.py 8 ``` -## Results +## Model Results + +| Model | GPU | ACC | +|--------------------|------------|------------| +| Oriented RepPoints | BI-V100 x8 | MAP=0.8265 | -| GPUs | ACC | -|----------| ----------- | -| BI-V100 x8 | MAP=0.8265 | +## References -## Reference -[mmrotate](https://github.com/open-mmlab/mmrotate) \ No newline at end of file +- [mmrotate](https://github.com/open-mmlab/mmrotate) diff --git a/cv/detection/picodet/paddlepaddle/README.md b/cv/detection/picodet/paddlepaddle/README.md index b1b117c307a4890ffbfe4a5b3226770fa3accecb..5bcdedcb2932620c37a2631c01b11625c0d5f705 100644 --- a/cv/detection/picodet/paddlepaddle/README.md +++ b/cv/detection/picodet/paddlepaddle/README.md @@ -1,65 +1,58 @@ # PP-PicoDet -## Model description - PicoDet is an ultra lightweight real-time object detection model that includes four different sizes of XS/S/M/L. - By using structures such as TAL, ETA Head, and PAN, the accuracy of the model is improved. When deploying model inference, - the model supports including post-processing in the network, thereby supporting direct output of prediction results. +## Model Description + +PP-PicoDet is an ultra-lightweight real-time object detection model designed for efficient deployment. It comes in four +sizes (XS/S/M/L) and incorporates advanced structures like TAL, ETA Head, and PAN to boost accuracy. The model supports +end-to-end inference by including post-processing in the network, enabling direct prediction outputs. PP-PicoDet +achieves an excellent balance between speed and accuracy, making it ideal for applications requiring real-time detection +on resource-constrained devices. + +## Model Preparation + +### Prepare Resources -## Step 1:Installation ```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py +``` + +### Install Dependencies + +```bash +pip install -r requirements.txt python3 setup.py install + pip3 install protobuf==3.20.3 pip3 install numba==0.56.4 -yum install mesa-libGl -y -``` -## Step 2:Preparing datasets - Download the Coco datasets, Specify coco2017 to your Coco path in later training process. - The Coco datesets path structure should look like(assuming coco2017 is used): -```bash -coco2017 -├── annotations -│ ├── instances_train2017.json -│ ├── instances_val2017.json -│ └── ... -├── train2017 -│ ├── 000000000009.jpg -│ ├── 000000000025.jpg -│ └── ... -├── val2017 -│ ├── 000000000139.jpg -│ ├── 000000000285.jpg -│ └── ... -├── train2017.txt -├── val2017.txt -└── ... 
+yum install mesa-libGl -y ``` -## Step 3:Training +## Model Training -assuming we are going to train picodet-l, the model config file is 'configs/picodet/picodet_l_640_coco_lcnet.yml' -vim configs/datasets/coco_detection.yml, set 'dataset_dir' in the configuration file to coco2017, then start trainging. +Assuming we are going to train picodet-l, the model config file is 'configs/picodet/picodet_l_640_coco_lcnet.yml' vim +configs/datasets/coco_detection.yml, set 'dataset_dir' in the configuration file to coco2017, then start trainging. -single gpu: ```bash +# Single GPU export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/picodet/picodet_l_640_coco_lcnet.yml --eval -``` -multi-gpu: -```bash + +# Multi GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/picodet/picodet_l_640_coco_lcnet.yml --eval ``` -## Results +## Model Results -| GPUs | IPS | mAP0.5:0.95 | mAP0.5 | -|-------------|-----------|--------------|--------------| -| BI-V100 x 8 | 19.84 | 41.2 | 58.2 | +| Model | GPU | IPS | mAP0.5:0.95 | mAP0.5 | +|------------|------------|-------|-------------|--------| +| PP-PicoDet | BI-V100 x8 | 19.84 | 41.2 | 58.2 | -## Reference: +## References -https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/picodet/README_en.md \ No newline at end of file +- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.6/configs/picodet/README_en.md) diff --git a/cv/detection/pp-yoloe/paddlepaddle/README.md b/cv/detection/pp-yoloe/paddlepaddle/README.md index fc00e046cb5f9a0debe6af83ac2831a6da1b4acb..817b1667c405f93fb4b07b01b6717abf8359f56d 100644 --- a/cv/detection/pp-yoloe/paddlepaddle/README.md +++ b/cv/detection/pp-yoloe/paddlepaddle/README.md @@ -1,27 +1,36 @@ # PP-YOLOE -## Model description +## Model Description -PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as Deformable Convolution or Matrix NMS, to be deployed friendly on various hardware. +PP-YOLOE is a high-performance single-stage anchor-free object detection model built upon PP-YOLOv2. It outperforms +various popular YOLO variants while maintaining deployment-friendly characteristics. The model comes in multiple sizes +(s/m/l/x) configurable through width and depth multipliers. PP-YOLOE avoids special operators like Deformable +Convolution, ensuring compatibility with diverse hardware. It achieves excellent speed-accuracy trade-offs, making it +suitable for real-time applications. The model's efficient architecture and optimization techniques make it a top choice +for object detection tasks. 
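Once a checkpoint has been produced by the training command below, PaddleDetection's generic inference entry point can be used to visualize predictions. The snippet is only a sketch: the config and weights paths are placeholders, not files guaranteed to exist in your checkout.

```bash
# Inference sketch -- both paths are placeholders for the config you trained
# and the checkpoint it produced; adjust them to your actual run.
config=configs/ppyoloe/your_ppyoloe_config.yml
weights=output/your_run/model_final.pdparams
CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c ${config} -o weights=${weights} --infer_dir=demo/ --draw_threshold=0.5
```

The `--draw_threshold` value controls the confidence cutoff for boxes drawn on the output images.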
-## Step 1: Installing +## Model Preparation + +### Prepare Resources ```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## Step 2: Prepare Datasets +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py +pip install -r requirements.txt +python3 setup.py install ``` -## Step 3: Training +## Model Training ```bash -# Train export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 @@ -33,6 +42,6 @@ python3 -u -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 \ -o log_iter=5 ``` -## Reference +## References - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/detection/pp_yoloe+/paddlepaddle/README.md b/cv/detection/pp_yoloe+/paddlepaddle/README.md index 5b4fbf6fc5b806baa9a96987ac9e99eb10b585cb..851a0524ef8730e559377927ee1775f06bf5132f 100644 --- a/cv/detection/pp_yoloe+/paddlepaddle/README.md +++ b/cv/detection/pp_yoloe+/paddlepaddle/README.md @@ -1,29 +1,34 @@ # PP-YOLOE+ -## Model description +## Model Description -PP-YOLOE is an excellent single-stage anchor-free model based on PP-YOLOv2, surpassing a variety of popular YOLO models. PP-YOLOE has a series of models, named s/m/l/x, which are configured through width multiplier and depth multiplier. PP-YOLOE avoids using special operators, such as Deformable Convolution or Matrix NMS, to be deployed friendly on various hardware. +PP-YOLOE+ is an enhanced version of PP-YOLOE, a high-performance anchor-free object detection model. It builds upon +PP-YOLOv2's architecture, offering improved accuracy and efficiency. The model comes in multiple sizes (s/m/l/x) +configurable through width and depth multipliers. PP-YOLOE+ maintains hardware compatibility by avoiding special +operators while achieving state-of-the-art speed-accuracy trade-offs. Its optimized architecture makes it ideal for +real-time applications, offering superior detection performance across various scenarios and hardware platforms. 
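After the training command below finishes, the checkpoint can be scored on the COCO validation split with the toolkit's evaluation script. This is a sketch only: `${config}` and `${weights}` are placeholders for the PP-YOLOE+ config you train and the checkpoint path it writes.

```bash
# Evaluation sketch -- replace both placeholders with your actual config and checkpoint.
config=configs/ppyoloe/your_ppyoloe_plus_config.yml
weights=output/your_run/model_final.pdparams
CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c ${config} -o weights=${weights}
```

The same `${config}`/`${weights}` convention is used by the inference command shown further down in this section.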
-## Step 1: Installation +## Model Preparation + +### Prepare Resources ```bash git clone -b release/2.7 https://github.com/PaddlePaddle/PaddleYOLO.git cd PaddleYOLO/ -pip3 install -r requirements.txt + +python3 dataset/coco/download_coco.py ``` -## Step 2: Preparing datasets +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py +pip3 install -r requirements.txt ``` -## Step 3: Training +## Model Training -> **HINT:** -> +> HINT: > --eval : training with evaluation -> > --amp : Mixed-precision training ```bash @@ -49,14 +54,13 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c ${config} -o weights=${weights} # CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c ${config} -o weights=${weights} --infer_dir=demo/ --draw_threshold=0.5 ``` -## Results - +## Model Results -| GPUs | FPS | ACC | -| ------------ | ------------ | -------------------------- | -| BI-V100 x8 | ips:6.3293 | Best test bbox ap: 0.528 | +| Model | GPU | FPS | ACC | +|-----------|------------|------------|--------------------------| +| PP-YOLOE+ | BI-V100 x8 | ips:6.3293 | Best test bbox ap: 0.528 | -## Reference +## References +- [Paper](https://arxiv.org/pdf/2203.16250v3.pdf) - [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO) -- [PP-YOLOE](https://arxiv.org/pdf/2203.16250v3.pdf) diff --git a/cv/detection/pvanet/pytorch/README.md b/cv/detection/pvanet/pytorch/README.md index 84a0bea557eccc149e6d4da89c406f0535743698..ef0709db5eae4c9af0a7e21332d3597d173f402b 100755 --- a/cv/detection/pvanet/pytorch/README.md +++ b/cv/detection/pvanet/pytorch/README.md @@ -1,21 +1,22 @@ # PVANet -## Model description +## Model Description -In object detection, reducing computational cost is as important as improving accuracy for most practical usages. This paper proposes a novel network structure, which is an order of magnitude lighter than other state-of-the-art networks while maintaining the accuracy. Based on the basic principle of more layers with less channels, this new deep neural network minimizes its redundancy by adopting recent innovations including C.ReLU and Inception structure. We also show that this network can be trained efficiently to achieve solid results on well-known object detection benchmarks: 84.9% and 84.2% mAP on VOC2007 and VOC2012 while the required compute is less than 10% of the recent ResNet-101. +PVANet is an efficient deep learning model for object detection, designed to minimize computational cost while +maintaining high accuracy. It employs a lightweight architecture based on the principle of "more layers with fewer +channels," incorporating innovations like C.ReLU and Inception structures. PVANet achieves competitive results on VOC +benchmarks with significantly reduced computational requirements compared to heavier networks. Its optimized design +makes it suitable for real-time applications where both speed and accuracy are crucial. +## Model Preparation -## Step 1: Installing packages +### Prepare Resources -```shell -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' -``` - -## Step 2: Preparing COCO dataset +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
- -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -36,74 +37,23 @@ coco2017 └── ... ``` -## Step 3: Training on COCO dataset - -### Multiple GPUs on one machine +### Install Dependencies ```shell -bash train_pvanet_dist.sh --data-path /path/to/coco2017/ --dataset coco +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' ``` -### On single GPU +## Model Training ```shell -python3 train.py --data-path /path/to/coco2017/ --dataset coco -``` -### Arguments - -Ref: [torchvision](../../torchvision/pytorch/README.md) - -## Parameters +# Multiple GPUs on one machine +bash train_pvanet_dist.sh --data-path /path/to/coco2017/ --dataset coco -``` - --data-path DATA_PATH - dataset - --dataset DATASET dataset - --device DEVICE device - -b BATCH_SIZE, --batch-size BATCH_SIZE - images per gpu, the total batch size is $NGPU x batch_size - --epochs N number of total epochs to run - -j N, --workers N number of data loading workers (default: 4) - --lr LR initial learning rate, 0.02 is the default value for training on 8 gpus and 2 images_per_gpu - --momentum M momentum - --wd W, --weight-decay W - weight decay (default: 1e-4) - --lr-scheduler LR_SCHEDULER - the lr scheduler (default: multisteplr) - --lr-step-size LR_STEP_SIZE - decrease lr every step-size epochs (multisteplr scheduler only) - --lr-steps LR_STEPS [LR_STEPS ...] - decrease lr every step-size epochs (multisteplr scheduler only) - --lr-gamma LR_GAMMA decrease lr by a factor of lr-gamma (multisteplr scheduler only) - --print-freq PRINT_FREQ - print frequency - --output-dir OUTPUT_DIR - path where to save - --resume RESUME resume from checkpoint - --start_epoch START_EPOCH - start epoch - --aspect-ratio-group-factor ASPECT_RATIO_GROUP_FACTOR - --rpn-score-thresh RPN_SCORE_THRESH - rpn score threshold for faster-rcnn - --trainable-backbone-layers TRAINABLE_BACKBONE_LAYERS - number of trainable layers of backbone - --data-augmentation DATA_AUGMENTATION - data augmentation policy (default: hflip) - --sync-bn Use sync batch norm - --test-only Only test the model - --pretrained Use pre-trained models from the modelzoo - --local_rank LOCAL_RANK - Local rank - --world-size WORLD_SIZE - number of distributed processes - --dist-url DIST_URL url used to set up distributed training - --nhwc Use NHWC - --padding-channel Padding the channels of image to 4 - --amp Automatic Mixed Precision training - --seed SEED Random seed +# On single GPU +python3 train.py --data-path /path/to/coco2017/ --dataset coco ``` -## Reference +## References -https://github.com/sanghoon/pytorch_imagenet -https://github.com/pytorch/vision \ No newline at end of file +- [pytorch_imagenet](https://github.com/sanghoon/pytorch_imagenet) +- [vision](https://github.com/pytorch/vision) \ No newline at end of file diff --git a/cv/detection/reppoints_mmdet/pytorch/README.md b/cv/detection/reppoints_mmdet/pytorch/README.md index 038872b58064874f4a01782154dd0e562b61f137..b8849f69a677a4f6f83d71841d44c9daa1ef4931 100644 --- a/cv/detection/reppoints_mmdet/pytorch/README.md +++ b/cv/detection/reppoints_mmdet/pytorch/README.md @@ -1,29 +1,22 @@ # RepPoints -## Model description +## Model Description -Modern object detectors rely heavily on rectangular bounding boxes, such as 
anchors, proposals and the final predictions, to represent objects at various recognition stages. The bounding box is convenient to use but provides only a coarse localization of objects and leads to a correspondingly coarse extraction of object features. In this paper, we present RepPoints(representative points), a new finer representation of objects as a set of sample points useful for both localization and recognition. Given ground truth localization and recognition targets for training, RepPoints learn to automatically arrange themselves in a manner that bounds the spatial extent of an object and indicates semantically significant local areas. They furthermore do not require the use of anchors to sample a space of bounding boxes. We show that an anchor-free object detector based on RepPoints can be as effective as the state-of-the-art anchor-based detection methods, with 46.5 AP and 67.4 AP50 on the COCO test-dev detection benchmark, using ResNet-101 model. +RepPoints is an innovative object detection model that replaces traditional bounding boxes with a set of representative +points for more precise object localization and feature extraction. This anchor-free approach learns to arrange points +that bound objects and indicate semantically significant areas. RepPoints achieves state-of-the-art performance on COCO +benchmarks while eliminating the need for anchor boxes. Its finer representation enables better object understanding and +more accurate detection, particularly for complex shapes and overlapping objects. -## Step 1: Installation -RepPoints model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +## Model Preparation -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` +### Prepare Resources -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,7 +37,24 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +RepPoints model is using MMDetection toolbox. Before you run this model, you need to setup MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . 
+``` + +## Model Training ```bash # Make soft link to dataset @@ -61,12 +71,12 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/reppoints/reppoints-moment_r101-dconv-c3-c5_fpn-gn_head-gn_2x_coco.py 8 ``` -## Results +## Model Results -| GPUs | FP32 | -| ----------- | -------- | -| BI-V100 x8 | MAP=43.2 | +| Model | GPU | FP32 | +|-----------|------------|----------| +| RepPoints | BI-V100 x8 | MAP=43.2 | -## Reference +## References -- [RepPoints: Point Set Representation for Object Detection](https://arxiv.org/abs/1904.11490) +- [Paper](https://arxiv.org/abs/1904.11490) diff --git a/cv/detection/retinanet/paddlepaddle/README.md b/cv/detection/retinanet/paddlepaddle/README.md index aa227a34960b217cd88fa3712bde253c8c3ce5a4..11e4287ad78fa5b25b836dc4a7e956830d31b741 100644 --- a/cv/detection/retinanet/paddlepaddle/README.md +++ b/cv/detection/retinanet/paddlepaddle/README.md @@ -1,24 +1,33 @@ # RetinaNet -## Model description -The paper proposes a method to convert a deep learning object detector into an equivalent spiking neural network. The aim is to provide a conversion framework that is not constrained to shallow network structures and classification problems as in state-of-the-art conversion libraries. The results show that models of higher complexity, such as the RetinaNet object detector, can be converted with limited loss in performance. +## Model Description -## Step 1: Installation +RetinaNet is a state-of-the-art object detection model that addresses the class imbalance problem in dense detection +through its novel Focal Loss. It uses a Feature Pyramid Network (FPN) backbone to detect objects at multiple scales +efficiently. RetinaNet achieves high accuracy while maintaining competitive speed, making it suitable for various +detection tasks. Its single-stage architecture combines the accuracy of two-stage detectors with the speed of +single-stage approaches, offering an excellent balance between performance and efficiency. + +## Model Preparation + +### Prepare Resources ```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip install -r requirements.txt -python3 setup.py install + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## Step 2: Preparing datasets +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py +pip install -r requirements.txt +python3 setup.py install ``` -## Step 3: Training +## Model Training ```bash # 8 GPUs @@ -28,16 +37,16 @@ python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c co # 1 GPU export CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/retinanet/retinanet_r50_fpn_1x_coco.yml --eval -## Hint:Default LR is for "8x GPU", modify it if you're using single card for training (e.g. divide by 8). +## Hint:Default LR is for "8x GPU", modify it if you're using single card for training (e.g. divide by 8). 
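## For example, if the base LR in your config is 0.01 for 8 GPUs, a single-GPU run would use
## 0.01 / 8 = 0.00125. The 0.01 figure is only an illustration -- check the optimizer settings
## of the config you actually train with for the real base value.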
``` -## Results +## Model Results -| GPUs | FPS | Train Epochs | Box AP | -|-------------|------|--------------|--------| -| BI-V100 x 8 | 6.58 | 12 | 37.3 | +| Model | GPU | FPS | Train Epochs | Box AP | +|-----------|------------|------|--------------|--------| +| RetinaNet | BI-V100 x8 | 6.58 | 12 | 37.3 | -## Reference +## References - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) \ No newline at end of file diff --git a/cv/detection/retinanet/pytorch/README.md b/cv/detection/retinanet/pytorch/README.md index 47cfbd6f3becfce9abca191dd61a472bfc8cf624..0b0dd12f7d57c58d4030ebaa8643fa74a1cc5c64 100644 --- a/cv/detection/retinanet/pytorch/README.md +++ b/cv/detection/retinanet/pytorch/README.md @@ -1,22 +1,22 @@ # RetinaNet -## Model description +## Model Description -The paper proposes a method to convert a deep learning object detector into an equivalent spiking neural network. The aim is to provide a conversion framework that is not constrained to shallow network structures and classification problems as in state-of-the-art conversion libraries. The results show that models of higher complexity, such as the RetinaNet object detector, can be converted with limited loss in performance. +RetinaNet is a state-of-the-art object detection model that addresses the class imbalance problem in dense detection +through its novel Focal Loss. It uses a Feature Pyramid Network (FPN) backbone to detect objects at multiple scales +efficiently. RetinaNet achieves high accuracy while maintaining competitive speed, making it suitable for various +detection tasks. Its single-stage architecture combines the accuracy of two-stage detectors with the speed of +single-stage approaches, offering an excellent balance between performance and efficiency. -## Step 1: Installing packages +## Model Preparation -```shell - -pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' - -``` - -## Step 2: Preparing COCO dataset +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -37,20 +37,19 @@ coco2017 └── ... 
``` -## Step 3: Training on COCO dataset +### Install Dependencies -Download the [COCO Dataset](https://cocodataset.org/#home) +```shell +pip3 install 'scipy' 'matplotlib' 'pycocotools' 'opencv-python' 'easydict' 'tqdm' +``` -### Multiple GPUs on one machine +## Model Training ```shell +# Multiple GPUs on one machine bash train_retinanet_r50_dist.sh --data-path /path/to/coco2017/ --dataset coco ``` -### Parameters - -Ref: [torchvision](../../torchvision/pytorch/README.md) - -## Reference +## References -https://github.com/pytorch/vision \ No newline at end of file +- [vision](https://github.com/pytorch/vision) diff --git a/cv/detection/rt-detr/pytorch/README.md b/cv/detection/rt-detr/pytorch/README.md index 86d59b70fce2c4f1dbb8cbd9419205c61d46c093..449d226d60efdf3ad56d1f301316c1b80d06aed7 100644 --- a/cv/detection/rt-detr/pytorch/README.md +++ b/cv/detection/rt-detr/pytorch/README.md @@ -1,16 +1,16 @@ # RT-DETR -## Model description +## Model Description -RT-DETR is specifically optimized for real-time applications, making it suitable for scenarios where low latency is crucial. It achieves this by incorporating design modifications that improve efficiency without sacrificing accuracy. +RT-DETR is a real-time variant of the DETR (DEtection TRansformer) model, optimized for efficient object detection. It +maintains the transformer-based architecture of DETR while introducing modifications to reduce latency and improve +speed. RT-DETR achieves competitive accuracy with significantly faster inference times, making it suitable for +applications requiring real-time performance. The model preserves the end-to-end detection capabilities of DETR while +addressing its computational challenges, offering a practical solution for time-sensitive detection tasks. -## Step 1: Installation +## Model Preparation -```bash -pip3 install -r requirements.txt -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
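An optional sanity check, not part of the original steps: before pointing the RT-DETR config at the dataset (next step), it is worth confirming that the downloaded annotation files parse. The path below is a placeholder for wherever coco2017 was unpacked, and pycocotools is assumed to be installed (it is pulled in by most detection requirements files).

```bash
# Optional check -- /path/to/coco2017 is a placeholder; assumes pycocotools is installed.
python3 -c "from pycocotools.coco import COCO; COCO('/path/to/coco2017/annotations/instances_val2017.json')"
```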
@@ -27,34 +27,28 @@ Modify config "img_folder" and "ann_file" locaton in the configuration file(./co vim ./configs/dataset/coco_detection.yml ``` -## Step 3: Training - -### Training on a Single GPU +### Install Dependencies ```bash -# training on single-gpu -export CUDA_VISIBLE_DEVICES=0 -python3 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml +pip3 install -r requirements.txt ``` -### Training on Multiple GPUs +## Model Training ```bash -# train on multi-gpu +# Training on single-gpu +export CUDA_VISIBLE_DEVICES=0 +python3 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml + +# Train on multi-gpu export CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -``` -### Evaluation on Multiple GPUs - -```bash -# val on multi-gpu +# Evaluation on multi-gpu export CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml -r path/to/checkpoint --test-only ``` -## Results - -## Reference +## References [RT-DERT](https://github.com/lyuwenyu/RT-DETR/tree/main/rtdetr_pytorch) diff --git a/cv/detection/rtmdet/pytorch/README.md b/cv/detection/rtmdet/pytorch/README.md index 1bfe803671f5695387ba864d00376391f3bec92e..f5feef5d09256f3a90beb4f8e30e922eede8c0e8 100644 --- a/cv/detection/rtmdet/pytorch/README.md +++ b/cv/detection/rtmdet/pytorch/README.md @@ -1,31 +1,16 @@ # RTMDet -> [RTMDet: An Empirical Study of Designing Real-Time Object Detectors](https://arxiv.org/pdf/2212.07784v2.pdf) +## Model Description - +RTMDet is a highly efficient real-time object detection model designed to surpass YOLO series performance. It features a +balanced architecture with large-kernel depth-wise convolutions and dynamic label assignment using soft labels. RTMDet +achieves state-of-the-art accuracy with exceptional speed, reaching 300+ FPS on modern GPUs. The model offers various +sizes for different applications and excels in tasks like instance segmentation and rotated object detection. Its design +provides insights for versatile real-time detection systems. -## Model description +## Model Preparation -In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks. - -## Step 1: Installation - -RTMDet model uses the MMDetection toolbox. Before you run this model, you need to set up MMDetection first. 
- -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` - -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -49,7 +34,25 @@ coco2017 ├── val2017.txt └── ... ``` -## Step 3: Training + +### Install Dependencies + +RTMDet model uses the MMDetection toolbox. Before you run this model, you need to set up MMDetection first. + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . +``` + +## Model Training ```bash # Make soft link to dataset @@ -70,11 +73,13 @@ sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/rtmdet/rtmdet_tiny_8xb32-300e_coco.py 8 ``` -## Results +## Model Results + +| Model | GPU | FPS | box AP | +|--------|---------|-------|--------| +| RTMDet | BI-V100 | 172.5 | 0.4090 | -|GPUs|FPS|box AP| -|:---:|:---:|:---:| -|BI-V100|172.5|0.4090| +## References -## Reference -- [mmdetection](https://github.com/open-mmlab/mmdetection) \ No newline at end of file +- [Paper](https://arxiv.org/pdf/2212.07784v2.pdf) +- [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/detection/solov2/paddlepaddle/README.md b/cv/detection/solov2/paddlepaddle/README.md index 1c3241de078a2eb77fee0cd3fcfab7416b3c085f..2cda7326e8b96100234809bdd3d605532a5ec23b 100644 --- a/cv/detection/solov2/paddlepaddle/README.md +++ b/cv/detection/solov2/paddlepaddle/README.md @@ -1,47 +1,49 @@ # SOLOv2 -## Model description -SOLOv2 (Segmenting Objects by Locations) is a fast instance segmentation framework with strong performance. +## Model Description -## 克隆代码 +SOLOv2 is an efficient instance segmentation framework that directly segments objects by predicting instance masks based +on their spatial locations. It eliminates the need for bounding box detection and mask refinement, offering a simpler +and faster approach compared to traditional methods. SOLOv2 introduces dynamic convolutions and matrix non-maximum +suppression to improve mask quality and processing speed. The model achieves strong performance on instance segmentation +tasks while maintaining real-time capabilities, making it suitable for various computer vision applications. -``` +## Model Preparation + +### Prepare Resources + +```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## 安装PaddleDetection +### Install Dependencies -``` -cd PaddleDetection +```bash pip install -r requirements.txt python3 setup.py install ``` -## 下载COCO数据集 +## Model Training -``` -python3 dataset/coco/download_coco.py -``` - -## 运行代码 - -``` -# GPU多卡训练 +```bash +# Multi GPU export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --eval -# GPU单卡训练 +# Single GPU export CUDA_VISIBLE_DEVICES=0 - python3 tools/train.py -c configs/solov2/solov2_r50_fpn_1x_coco.yml --eval -# 注:默认学习率是适配多GPU训练(8x GPU),若使用单GPU训练,须对应调整config中的学习率(例如,除以8) - +# Note: The default learning rate is optimized for multi-GPU training (8x GPU). 
If using single GPU training, +# you need to adjust the learning rate in the config accordingly (e.g., divide by 8). ``` -## Results on BI-V100 +## Model Results -| GPUs | FPS | Train Epochs | mAP | -|------|-----|--------------|------| -| 1x8 | 6.39 | 12 | 35.4 | \ No newline at end of file +| Model | GPU | FPS | Train Epochs | mAP | +|--------|------------|------|--------------|------| +| SOLOv2 | BI-V100 x8 | 6.39 | 12 | 35.4 | diff --git a/cv/detection/ssd/mindspore/README.md b/cv/detection/ssd/mindspore/README.md index 92eb19e788f86f05c9995914af1b092c01d335f4..f114071d85f0e3551b03c7167d81645821a6b3c8 100755 --- a/cv/detection/ssd/mindspore/README.md +++ b/cv/detection/ssd/mindspore/README.md @@ -1,75 +1,90 @@ # SSD -## Model description -SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape.Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. -[Paper](https://arxiv.org/abs/1512.02325): Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg.European Conference on Computer Vision (ECCV), 2016 (In press). -## Step 1: Installing -``` -pip3 install -r requirements.txt -pip3 install easydict -``` -## Step 2: Prepare Datasets -download dataset in /home/datasets/cv/coco2017 +## Model Description + +SSD (Single Shot MultiBox Detector) is a fast and efficient object detection model that predicts bounding boxes and +class scores in a single forward pass. It uses a set of default boxes at different scales and aspect ratios across +multiple feature maps to detect objects of various sizes. SSD combines predictions from different layers to handle +objects at different resolutions, offering a good balance between speed and accuracy for real-time detection tasks. + +## Model Preparation -Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related dataset below. +### Prepare Resources + +Download dataset in /home/datasets/cv/coco2017 + +Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant +domain/network architecture. In the following sections, we will introduce how to run the scripts using the related +dataset below. Dataset used: [COCO2017]() - Dataset size:19G - - Train:18G,118000 images - - Val:1G,5000 images - - Annotations:241M,instances,captions,person_keypoints etc + - Train:18G,118000 images + - Val:1G,5000 images + - Annotations:241M,instances,captions,person_keypoints etc - Data format:image and json files - - Note:Data will be processed in dataset.py - - Change the `coco_root` and other settings you need in `src/config.py`. The directory structure is as follows: - - ```shell - . - └─coco_dataset - ├─annotations - ├─instance_train2017.json - └─instance_val2017.json - ├─val2017 - └─train2017 - ``` - If your own dataset is used. 
**Select dataset to other when run script.** - Organize the dataset information into a TXT file, each row in the file is as follows: - - ```shell - train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2 - ``` - - Each row is an image annotation which split by space, the first column is a relative path of image, the others are box and class infomations of the format [xmin,ymin,xmax,ymax,class]. We read image from an image path joined by the `image_dir`(dataset directory) and the relative path in `anno_path`(the TXT file path), `image_dir` and `anno_path` are setting in `src/config.py`. -# [Pretrained models](#contents) -Please [resnet50](https://pan.baidu.com/s/1rrhsZqDVmNxR-bCnMPvFIw?pwd=8766) download resnet50.ckpt here + - Note:Data will be processed in dataset.py + +Change the `coco_root` and other settings you need in `src/config.py`. The directory structure is as follows: + +```bash +. +└─coco_dataset + ├─annotations + ├─instance_train2017.json + └─instance_val2017.json + ├─val2017 + └─train2017 +``` + +If your own dataset is used. **Select dataset to other when run script.** + Organize the dataset information into a TXT file, each row in the file is as follows: + +```bash +train2017/0000001.jpg 0,259,401,459,7 35,28,324,201,2 0,30,59,80,2 ``` + +Each row is an image annotation which split by space, the first column is a relative path of image, the others are box +and class infomations of the format [xmin,ymin,xmax,ymax,class]. We read image from an image path joined by the +`image_dir`(dataset directory) and the relative path in `anno_path`(the TXT file path), `image_dir` and `anno_path` are +setting in `src/config.py`. + +Download [resnet50.ckpt](https://pan.baidu.com/s/1rrhsZqDVmNxR-bCnMPvFIw?pwd=8766). + +```bash mv resnet50.ckpt ./ckpt ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +pip3 install easydict ``` + +## Model Training + +```bash mpirun -allow-run-as-root -n 8 --output-filename log_output --merge-stderr-to-stdout \ -python3 train.py \ ---run_distribute=True \ ---lr=0.05 \ ---dataset=coco \ ---device_num=8 \ ---loss_scale=1 \ ---device_target="GPU" \ ---epoch_size=60 \ ---config_path=./config/ssd_resnet50_fpn_config_gpu.yaml \ ---output_path './output' > log.txt 2>&1 & +python3 train.py --run_distribute=True \ + --lr=0.05 \ + --dataset=coco \ + --device_num=8 \ + --loss_scale=1 \ + --device_target="GPU" \ + --epoch_size=60 \ + --config_path=./config/ssd_resnet50_fpn_config_gpu.yaml \ + --output_path './output' > log.txt 2>&1 & ``` -### [Evaluation result] -## Results on BI-V100 -| GPUs | per step time | MAP | -|------|-------------- |-------| -| 1*8 | 0.814s | 0.374 | +## Model Results on BI-V100 + +| Model | GPU | per step time | MAP | +|-------|-------------|---------------|-------| +| SSD | BI-V100 x8 | 0.814s | 0.374 | +| SSD | NV-V100s x8 | 0.797s | 0.369 | -## Results on NV-V100s +## References -| GPUs | per step time | MAP | -|------|-------------- |-------| -| 1*8 | 0.797s | 0.369 | \ No newline at end of file +- [Paper](https://arxiv.org/abs/1512.02325) diff --git a/cv/detection/ssd/paddlepaddle/README.md b/cv/detection/ssd/paddlepaddle/README.md index 169991f7f7d708533b83aba2087287411789d57b..ce64580a5a1a794e50e04831b330707f4d1a8e16 100644 --- a/cv/detection/ssd/paddlepaddle/README.md +++ b/cv/detection/ssd/paddlepaddle/README.md @@ -1,28 +1,36 @@ # SSD -## Model description +## Model Description -We present a method for detecting objects in images using a single deep neural network. 
Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300x300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500x500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd . +SSD (Single Shot MultiBox Detector) is a fast and efficient object detection model that predicts bounding boxes and +class scores in a single forward pass. It uses a set of default boxes at different scales and aspect ratios across +multiple feature maps to detect objects of various sizes. SSD combines predictions from different layers to handle +objects at different resolutions, offering a good balance between speed and accuracy for real-time detection tasks. -## Step 1: Installing +## Model Preparation + +### Prepare Resources ```bash git clone https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## Step 2: Prepare Datasets +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py +pip install -r requirements.txt +python3 setup.py install ``` -## Step 3: Training +## Model Training Notice: modify configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml file, modify the datasets path as yours. -``` +```bash # Train export FLAGS_cudnn_exhaustive_search=True export FLAGS_cudnn_batchnorm_spatial_persistent=True @@ -30,17 +38,12 @@ export CUDA_VISIBLE_DEVICES=0,1 python3 -u -m paddle.distributed.launch --gpus 0,1 tools/train.py -c configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml --eval ``` -## Results on BI-V100 - -
-
-
-| GPU | FP32 |
-| --------- | ----------------------------------- |
-| 2 cards | bbox=73.62,FPS=45.49,BatchSize=32 |
+## Model Results
-
+| Model | GPU | FP32 | +|-------|------------|-----------------------------------| +| SSD | BI-V100 x2 | bbox=73.62,FPS=45.49,BatchSize=32 | -## Reference +## References - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/detection/ssd/pytorch/README.md b/cv/detection/ssd/pytorch/README.md index cd20364912eeed655a12d9c57e941a548429bc80..7535f4023424563748662115559465df3f797396 100644 --- a/cv/detection/ssd/pytorch/README.md +++ b/cv/detection/ssd/pytorch/README.md @@ -1,16 +1,21 @@ # SSD -## Model description +## Model Description -We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300x300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500x500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd . +SSD (Single Shot MultiBox Detector) is a fast and efficient object detection model that predicts bounding boxes and +class scores in a single forward pass. It uses a set of default boxes at different scales and aspect ratios across +multiple feature maps to detect objects of various sizes. SSD combines predictions from different layers to handle +objects at different resolutions, offering a good balance between speed and accuracy for real-time detection tasks. -## Prepare +## Model Preparation -### Download dataset +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -37,33 +42,29 @@ cd /home/data/perf/ssd ln -s /path/to/coco/ /home/data/perf/ssd ``` -### Download backbone +Download backbone. 
```bash cd /home/data/perf/ssd wget https://download.pytorch.org/models/resnet34-333f7ec4.pth ``` - -### Training - -### Multiple GPUs on one machine +## Model Training ```bash -## 'deepsparkhub_root_path' is the root path of deepsparkhub. +# Multiple GPUs on one machine cd {deepsparkhub_root_path}/cv/detection/ssd/pytorch/base source ../iluvatar/config/environment_variables.sh python3 prepare.py --name iluvatar --data_dir /home/data/perf/ssd bash run_training.sh --name iluvatar --config V100x1x8 --data_dir /home/data/perf/ssd --backbone_path /home/data/perf/ssd/resnet34-333f7ec4.pth ``` -## Results on BI-V100 - -| GPUs | Batch Size | FPS | Train Epochs | mAP | -|------|------------|-----|--------------|------| -| 1x8 | 192 | 2858 | 65 | 0.23 | +## Model Results +| Model | GPU | Batch Size | FPS | Train Epochs | mAP | +|-------|------------|------------|------|--------------|------| +| SSD | BI-V100 x8 | 192 | 2858 | 65 | 0.23 | +## References -## Reference -https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/ssd/implementations/pytorch +- [mlcommons](https://github.com/mlcommons/training_results_v0.7/tree/master/NVIDIA/benchmarks/ssd/implementations/pytorch) diff --git a/cv/detection/ssd/tensorflow/README.md b/cv/detection/ssd/tensorflow/README.md index a36dbc7ce8f540acabe86677661c229035c78364..fe9ff61da37bbd3cc7c5ce7fcecf49f3c7354089 100644 --- a/cv/detection/ssd/tensorflow/README.md +++ b/cv/detection/ssd/tensorflow/README.md @@ -1,50 +1,57 @@ # SSD -## Model description - -We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300x300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500x500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd . - -## Prepare - -### Download the VOC dataset -``` -cd dataset -``` -Download[ Pascal VOC Dataset](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) and reorganize the directory as follows: -``` -VOCROOT/ - |->VOC2007/ - | |->Annotations/ - | |->ImageSets/ - | |->... - |->VOC2012/ # use it - | |->Annotations/ - | |->ImageSets/ - | |->... - |->VOC2007TEST/ - | |->Annotations/ - | |->... 
+## Model Description + +SSD (Single Shot MultiBox Detector) is a fast and efficient object detection model that predicts bounding boxes and +class scores in a single forward pass. It uses a set of default boxes at different scales and aspect ratios across +multiple feature maps to detect objects of various sizes. SSD combines predictions from different layers to handle +objects at different resolutions, offering a good balance between speed and accuracy for real-time detection tasks. + +## Model Preparation + +### Prepare Resources + +Download [Pascal VOC Dataset](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) and reorganize the +directory as follows: + +```bash +dataset/VOCROOT/ + |->VOC2007/ + | |->Annotations/ + | |->ImageSets/ + | |->... + |->VOC2012/ # use it + | |->Annotations/ + | |->ImageSets/ + | |->... + |->VOC2007TEST/ + | |->Annotations/ + | |->... ``` + VOCROOT is your path of the Pascal VOC Dataset. -``` + +```bash mkdir tfrecords pip3 install tf_slim python3 convert_voc_sample_tfrecords.py --dataset_directory=./ --output_directory=tfrecords --train_splits VOC2012_sample --validation_splits VOC2012_sample -cd .. +cd ../ ``` -### Download the checkpoint -Download the pre-trained VGG-16 model (reduced-fc) from [here](https://drive.google.com/drive/folders/184srhbt8_uvLKeWW_Yo8Mc5wTyc0lJT7) and put them into one sub-directory named 'model' (we support SaverDef.V2 by default, the V1 version is also available for sake of compatibility). -### Train -#### multi gpus -``` -python3 train_ssd.py --batch_size 16 -```` +Download the pre-trained VGG-16 model (reduced-fc) from +[Google Drive](https://drive.google.com/drive/folders/184srhbt8_uvLKeWW_Yo8Mc5wTyc0lJT7) and put them into one sub-directory +named 'model' (we support SaverDef.V2 by default, the V1 version is also available for sake of compatibility). +## Model Training + +```bash +# multi gpus +python3 train_ssd.py --batch_size 16 +``` -## Result +## Model Results -| | acc | fps | -| --- | --- | --- | -| multi_card | 0.783513 | 3.177 | +| Model | GPU | acc | fps | +|-------|---------|----------|-------| +| SSD | BI-V100 | 0.783513 | 3.177 | diff --git a/cv/detection/ssd/tensorflow/readme_origin.md b/cv/detection/ssd/tensorflow/readme_origin.md index f2b3a20a623d1e3f80373651868edb350eadb4d4..265f025aa251f74dd4aafa6a403802e08e521032 100644 --- a/cv/detection/ssd/tensorflow/readme_origin.md +++ b/cv/detection/ssd/tensorflow/readme_origin.md @@ -65,7 +65,7 @@ All the codes was tested under TensorFlow 1.6, Python 3.5, Ubuntu 16.04 with CUD ***This repo is just created recently, any contribution will be welcomed.*** -## Results (VOC07 Metric) +## Model Results (VOC07 Metric) This implementation(SSD300-VGG16) yield **mAP 77.8%** on PASCAL VOC 2007 test dataset(the original performance described in the paper is 77.2%mAP), the details are as follows: diff --git a/cv/detection/yolof/pytorch/README.md b/cv/detection/yolof/pytorch/README.md index 83149b8c1fe033a3dd34a2784ac5a33abec79de9..f71b4de5d8eb095af2e25ded235134173306aebf 100755 --- a/cv/detection/yolof/pytorch/README.md +++ b/cv/detection/yolof/pytorch/README.md @@ -1,29 +1,22 @@ -# YOLOF(You Only Look One-level Feature) +# YOLOF -## Model description +## Model Description -This paper revisits feature pyramids networks (FPN) for one-stage detectors and points out that the success of FPN is due to its divide-and-conquer solution to the optimization problem in object detection rather than multi-scale feature fusion. 
From the perspective of optimization, we introduce an alternative way to address the problem instead of adopting the complex feature pyramids - {\em utilizing only one-level feature for detection}. Based on the simple and efficient solution, we present You Only Look One-level Feature (YOLOF). In our method, two key components, Dilated Encoder and Uniform Matching, are proposed and bring considerable improvements. Extensive experiments on the COCO benchmark prove the effectiveness of the proposed model. Our YOLOF achieves comparable results with its feature pyramids counterpart RetinaNet while being 2.5x faster. Without transformer layers, YOLOF can match the performance of DETR in a single-level feature manner with 7x less training epochs. With an image size of 608x608, YOLOF achieves 44.3 mAP running at 60 fps on 2080Ti, which is 13% faster than YOLOv4. Code is available at \url{https://github.com/megvii-model/YOLOF}. +YOLOF (You Only Look One-level Feature) is an efficient object detection model that challenges the necessity of feature +pyramids. It demonstrates that using a single-level feature with proper optimization can achieve comparable results to +multi-level approaches. YOLOF introduces two key components: Dilated Encoder for capturing multi-scale context and +Uniform Matching for balanced positive samples. The model achieves competitive accuracy with RetinaNet while being 2.5x +faster, making it suitable for real-time detection tasks. -## Step 1: Installing packages +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . -``` +### Prepare Resources -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -47,29 +40,41 @@ coco2017 ```bash mkdir -p data ln -s /path/to/coco2017 data/coco +``` -# Prepare resnet50_caffe-788b5fa3.pth, skip this if fast network +Prepare resnet50_caffe-788b5fa3.pth, skip this if fast network + +```bash mkdir -p /root/.cache/torch/hub/checkpoints/ wget -O /root/.cache/torch/hub/checkpoints/resnet50_caffe-788b5fa3.pth https://download.openmmlab.com/pretrain/third_party/resnet50_caffe-788b5fa3.pth ``` -## Step 3: Training - -#### Training on a single GPU +### Install Dependencies ```bash -python3 tools/train.py configs/yolof/yolof_r50-c5_8xb8-1x_coco.py +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . 
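# Optional sanity check (assumption: MMDetection is imported as "mmdet" and
# exposes __version__); confirms the editable install above actually resolves.
python3 -c "import mmdet; print(mmdet.__version__)"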
``` -#### Training on multiple GPUs +## Model Training ```bash -sed -i 's/python /python3 /g' tools/dist_train.sh +# Training on a single GPU +python3 tools/train.py configs/yolof/yolof_r50-c5_8xb8-1x_coco.py -# Multiple GPUs on one machine +# Training on multiple GPUs +sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/yolof/yolof_r50-c5_8xb8-1x_coco.py 8 ``` -## Reference +## References -- [mmdetection](https://github.com/open-mmlab/mmdetection) \ No newline at end of file +- [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/detection/yolov10/pytorch/README.md b/cv/detection/yolov10/pytorch/README.md index 81aa174bc8a250036c1497820e5a9b0ca3ffa6b2..4392fa5f32e19ee78ed667c625521e9950b97c01 100644 --- a/cv/detection/yolov10/pytorch/README.md +++ b/cv/detection/yolov10/pytorch/README.md @@ -1,23 +1,22 @@ # YOLOv10 -## Model description +## Model Description -YOLOv10, built on the Ultralytics Python package by researchers at Tsinghua University, introduces a new approach to real-time object detection, addressing both the post-processing and model architecture deficiencies found in previous YOLO versions. By eliminating non-maximum suppression (NMS) and optimizing various model components, YOLOv10 achieves state-of-the-art performance with significantly reduced computational overhead. Extensive experiments demonstrate its superior accuracy-latency trade-offs across multiple model scales. +YOLOv10 is a cutting-edge object detection model developed by Tsinghua University researchers. It eliminates the need +for non-maximum suppression (NMS) while optimizing model architecture for enhanced efficiency. YOLOv10 achieves +state-of-the-art performance with reduced computational overhead, offering superior accuracy-latency trade-offs across +various model scales. Built on the Ultralytics framework, it addresses limitations of previous YOLO versions, making it +ideal for real-time applications requiring fast and accurate object detection in diverse scenarios. -## Step 1: Installation +## Model Preparation -```bash -# CentOS -yum install -y mesa-libGL -# Ubuntu -apt install -y libgl1-mesa-glx -``` +### Prepare Resources -## Step 2: Preparing datasets +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. 
- -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -44,9 +43,14 @@ mkdir -p datasets/ ln -s /PATH/TO/COCO ./datasets/coco ``` -## Step 3: Training +### Install Dependencies ```bash +# CentOS +yum install -y mesa-libGL +# Ubuntu +apt install -y libgl1-mesa-glx + # get yolov10 code git clone https://github.com/THU-MIG/yolov10.git cd yolov10 @@ -54,12 +58,13 @@ sed -i 's/^torch/# torch/g' requirements.txt pip install -r requirements.txt ``` -### Multiple GPU training +## Model Training ```bash +# Multiple GPU training yolo detect train data=coco.yaml model=yolov10n.yaml epochs=500 batch=256 imgsz=640 device=0,1,2,3,4,5,6,7 ``` -## Reference +## References -[YOLOv10](https://github.com/THU-MIG/yolov10) +- [YOLOv10](https://github.com/THU-MIG/yolov10) diff --git a/cv/detection/yolov3/paddlepaddle/README.md b/cv/detection/yolov3/paddlepaddle/README.md index 669932c19cd059f0c26880b514659c27fd5b562b..5c70713a60e821cef3b19ca0474f1190062e6970 100644 --- a/cv/detection/yolov3/paddlepaddle/README.md +++ b/cv/detection/yolov3/paddlepaddle/README.md @@ -1,26 +1,33 @@ # YOLOv3 -## Model description +## Model Description -We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/. +YOLOv3 is a real-time object detection model that builds upon its predecessors with improved accuracy while maintaining +speed. It uses a deeper backbone network and multi-scale predictions to detect objects of various sizes. YOLOv3 achieves +competitive performance with faster inference times compared to other detectors. It processes images in a single forward +pass, making it efficient for real-time applications. The model balances speed and accuracy, making it popular for +practical detection tasks. -## Step 1: Installing +## Model Preparation + +### Prepare Resources ```bash -git clone --recursive https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt +git clone https://github.com/PaddlePaddle/PaddleDetection.git + +cd PaddleDetection/ +# Get COCO Dataset +python3 dataset/coco/download_coco.py ``` -## Step 2: Download data +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py - -ls dataset/coco/ +pip install -r requirements.txt +python3 setup.py install ``` -## Step 3: Run YOLOv3 +## Model Training ```bash # Make sure your dataset path is the same as above. diff --git a/cv/detection/yolov3/pytorch/README.md b/cv/detection/yolov3/pytorch/README.md index 9a7cf4e1876b54301c2145a82ef56e4eaae46d79..7cdc40ca400233a9232c3b8df9386ed24468772b 100755 --- a/cv/detection/yolov3/pytorch/README.md +++ b/cv/detection/yolov3/pytorch/README.md @@ -1,19 +1,16 @@ # YOLOv3 -## Model description +## Model Description -We present some updates to YOLO! 
We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at . +YOLOv3 is a real-time object detection model that builds upon its predecessors with improved accuracy while maintaining +speed. It uses a deeper backbone network and multi-scale predictions to detect objects of various sizes. YOLOv3 achieves +competitive performance with faster inference times compared to other detectors. It processes images in a single forward +pass, making it efficient for real-time applications. The model balances speed and accuracy, making it popular for +practical detection tasks. -## Step 1: Installing packages +## Model Preparation -```bash -## clone yolov3 and install -git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git -cd deepsparkhub-GPL/cv/detection/yolov3/pytorch/ -bash setup.sh -``` - -## Step 2: Preparing data +### Prepare Resources ```bash bash weights/download_weights.sh @@ -23,20 +20,25 @@ bash weights/download_weights.sh ./data/get_coco_dataset.sh ``` -## Step 3: Training - -### On single GPU +### Install Dependencies ```bash -bash run_training.sh +## clone yolov3 and install +git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git +cd deepsparkhub-GPL/cv/detection/yolov3/pytorch/ +bash setup.sh ``` -### Multiple GPUs on one machine +## Model Training ```bash +# On single GPU +bash run_training.sh + +# Multiple GPUs on one machine bash run_dist_training.sh ``` -## Reference +## References - [YOLOv3](https://github.com/eriklindernoren/PyTorch-YOLOv3) diff --git a/cv/detection/yolov3/tensorflow/README.md b/cv/detection/yolov3/tensorflow/README.md index 4f1618d0c2d991d991e506213bbd64528693c4b6..92d9619c354152cf111ba75fa8626796c560a1a5 100644 --- a/cv/detection/yolov3/tensorflow/README.md +++ b/cv/detection/yolov3/tensorflow/README.md @@ -1,27 +1,28 @@ # YOLOv3 -## Model description +## Model Description -We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/. +YOLOv3 is a real-time object detection model that builds upon its predecessors with improved accuracy while maintaining +speed. It uses a deeper backbone network and multi-scale predictions to detect objects of various sizes. YOLOv3 achieves +competitive performance with faster inference times compared to other detectors. It processes images in a single forward +pass, making it efficient for real-time applications. The model balances speed and accuracy, making it popular for +practical detection tasks. 
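The "Prepare Resources" section below downloads three Pascal VOC tarballs and asks you to extract and rename them into the `VOC/` layout it shows, but it does not spell out the commands. A minimal sketch, assuming each official archive unpacks a top-level `VOCdevkit/` directory (paths otherwise taken from the layout shown in that section):

```bash
# Sketch only: unpack the archives from "Prepare Resources" into the expected layout.
mkdir -p VOC/train VOC/test
tar -xf VOCtrainval_06-Nov-2007.tar -C VOC/train
tar -xf VOCtrainval_11-May-2012.tar -C VOC/train
tar -xf VOCtest_06-Nov-2007.tar -C VOC/test
# Result: VOC/train/VOCdevkit/{VOC2007,VOC2012} and VOC/test/VOCdevkit/VOC2007
```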
-## Prepare +## Model Preparation -``` -bash init_tf.sh -``` +### Prepare Resources -## Download dataset and checkpoint +Download VOC PASCAL trainval and test data. -### Download VOC PASCAL trainval and test data - -``` +```bash wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar ``` + Extract all of these tars into one directory and rename them, which should have the following basic structure. -``` +```bash VOC # path: /home/yang/dataset/VOC ├── test | └──VOCdevkit @@ -31,19 +32,25 @@ VOC # path: /home/yang/dataset/VOC └──VOC2007 (from VOCtrainval_06-Nov-2007.tar) └──VOC2012 (from VOCtrainval_11-May-2012.tar) ``` -### Download checkpoint -Exporting loaded COCO weights as TF checkpoint(yolov3_coco.ckpt)[BaiduCloud](https://pan.baidu.com/s/11mwiUy8KotjUVQXqkGGPFQ&shfl=sharepset#list/path=%2F) +Download checkpoint. +Exporting loaded COCO weights as TF checkpoint(yolov3_coco.ckpt)[BaiduCloud](https://pan.baidu.com/s/11mwiUy8KotjUVQXqkGGPFQ&shfl=sharepset#list/path=%2F) +### Install Dependencies -## Run training +```bash +bash init_tf.sh ``` + +## Model Training + +```bash bash ./run_training.sh ``` -## Result -| | mAP | fps | -| --- | --- | --- | -| multi_card | 33.67% | 4.34it/s | +## Model Results +| Model | GPU | mAP | fps | +|--------|---------|--------|----------| +| YOLOv3 | BI-V100 | 33.67% | 4.34it/s | diff --git a/cv/detection/yolov5/paddlepaddle/README.md b/cv/detection/yolov5/paddlepaddle/README.md index 7baea336cc0ea81c1df3759cf5499dc49f7c5f4b..4e2b338ffc23b77ab445da2348d6e574008c9f62 100644 --- a/cv/detection/yolov5/paddlepaddle/README.md +++ b/cv/detection/yolov5/paddlepaddle/README.md @@ -1,26 +1,32 @@ # YOLOv5 -## Model description +## Model Description -YOLOv5 is the world's most loved vision AI, representing Ultralytics open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development. +YOLOv5 is a state-of-the-art object detection model that builds upon the YOLO architecture, offering improved speed and +accuracy. It features a streamlined design with enhanced data augmentation and anchor box strategies. YOLOv5 supports +multiple model sizes (n/s/m/l/x) for different performance needs. The model is known for its ease of use, fast training, +and efficient inference, making it popular for real-time detection tasks across various applications. 
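The description above mentions the n/s/m/l/x model sizes, and the training step later in this README trains the small variant via `config_yaml=configs/yolov5/yolov5_s_300e_coco.yml`. Switching capacity is just a matter of pointing that variable at another config; a hedged sketch, assuming the other file names follow the same pattern in the PaddleYOLO release/2.5 branch:

```bash
# List the YOLOv5 configs bundled with PaddleYOLO and pick a size,
# e.g. the medium model (file name assumed from the s/m/l/x naming pattern).
ls configs/yolov5/
config_yaml=configs/yolov5/yolov5_m_300e_coco.yml
python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config_yaml} --amp --eval
```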
-## Step 1: Installation +## Model Preparation + +### Prepare Resources ```bash cd deepsparkhub/cv/detection/yolov5/paddlepaddle/ git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleYOLO.git cd PaddleYOLO/ -pip3 install -r requirements.txt -python3 setup.py develop + +python3 dataset/coco/download_coco.py ``` -## Step 2: Preparing datasets +### Install Dependencies ```bash -python3 dataset/coco/download_coco.py +pip3 install -r requirements.txt +python3 setup.py develop ``` -## Step 3: Training +## Model Training ```bash # Make sure your dataset path is the same as above @@ -30,12 +36,12 @@ config_yaml=configs/yolov5/yolov5_s_300e_coco.yml python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config_yaml} --amp --eval ``` -## Results +## Model Results -GPUs|Model|FPS|ACC -----|---|---|--- -BI-V100 x8|YOLOv5-n| 10.9788 images/s | bbox ap: 0.259 +| Model | GPU | FPS | ACC | +|----------|------------|------------------|----------------| +| YOLOv5-n | BI-V100 x8 | 10.9788 images/s | bbox ap: 0.259 | -## Reference +## References - [PaddleYOLO](https://github.com/PaddlePaddle/PaddleYOLO) diff --git a/cv/detection/yolov5/pytorch/README.md b/cv/detection/yolov5/pytorch/README.md index b202d415aac24061f952392a1ee3fa1fa1eb3d34..d445942503206fdf1587b163a4cbec7b945af25c 100644 --- a/cv/detection/yolov5/pytorch/README.md +++ b/cv/detection/yolov5/pytorch/README.md @@ -1,21 +1,21 @@ # YOLOv5 -YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development. +## Model Description -## Step 1: Installing packages +YOLOv5 is a state-of-the-art object detection model that builds upon the YOLO architecture, offering improved speed and +accuracy. It features a streamlined design with enhanced data augmentation and anchor box strategies. YOLOv5 supports +multiple model sizes (n/s/m/l/x) for different performance needs. The model is known for its ease of use, fast training, +and efficient inference, making it popular for real-time detection tasks across various applications. -```bash -## clone yolov5 and install -git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git -cd deepsparkhub-GPL/cv/detection/yolov5/pytorch/ -bash init.sh -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -36,6 +36,15 @@ coco2017 └── ... 
``` +### Install Dependencies + +```bash +## clone yolov5 and install +git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git +cd deepsparkhub-GPL/cv/detection/yolov5/pytorch/ +bash init.sh +``` + Modify the configuration file(data/coco.yaml) ```bash @@ -45,27 +54,19 @@ vim data/coco.yaml # val: the relative path of valid images ``` -## Training the detector +## Model Training Train the yolov5 model as follows, the train log is saved in ./runs/train/exp -### On single GPU - ```bash +# On single GPU python3 train.py --data ./data/coco.yaml --batch-size 32 --cfg ./models/yolov5s.yaml --weights '' -``` -### On single GPU (AMP) - -```bash +# On single GPU (AMP) python3 train.py --data ./data/coco.yaml --batch-size 32 --cfg ./models/yolov5s.yaml --weights '' --amp -``` -### Multiple GPUs on one machine - -```bash -# eight cards -# YOLOv5s +# Multiple GPUs on one machine +## YOLOv5s python3 -m torch.distributed.launch --nproc_per_node 8 \ train.py \ --data ./data/coco.yaml \ @@ -73,14 +74,11 @@ python3 -m torch.distributed.launch --nproc_per_node 8 \ --cfg ./models/yolov5s.yaml --weights '' \ --device 0,1,2,3,4,5,6,7 -# YOLOv5m +## YOLOv5m bash run.sh -``` -### Multiple GPUs on one machine (AMP) - -```bash -# eight cards +# Multiple GPUs on one machine (AMP) +## eight cards python3 -m torch.distributed.launch --nproc_per_node 8 \ train.py \ --data ./data/coco.yaml \ @@ -89,9 +87,7 @@ python3 -m torch.distributed.launch --nproc_per_node 8 \ --device 0,1,2,3,4,5,6,7 --amp ``` -## Test the detector - -Test the yolov5 model as follows, the result is saved in ./runs/detect: +Test the YOLOv5 model as follows, the results are saved in ./runs/detect. ```bash python3 detect.py --source ./data/images/bus.jpg --weights yolov5s.pt --img 640 @@ -99,18 +95,17 @@ python3 detect.py --source ./data/images/bus.jpg --weights yolov5s.pt --img 640 python3 detect.py --source ./data/images/zidane.jpg --weights yolov5s.pt --img 640 ``` -## Results on BI-V100 +## Model Results -| GPUs | FP16 | Batch size | FPS | E2E | mAP@.5 | -| ---- | ---- | ---------- | --- | --- | ------ | -| 1x1 | True | 64 | 81 | N/A | N/A | -| 1x8 | True | 64 | 598 | 24h | 0.632 | +| GPU | FP16 | Batch size | FPS | E2E | mAP@.5 | +|------------|------|------------|-----|-----|--------| +| BI-V100 x8 | True | 64 | 598 | 24h | 0.632 | | Convergence criteria | Configuration (x denotes number of GPUs) | Performance | Accuracy | Power(W) | Scalability | Memory utilization(G) | Stability | | -------------------- | ---------------------------------------- | ----------- | -------- | ---------- | ----------- | ----------------------- | --------- | | mAP:0.5 | SDK V2.2, bs:128, 8x, AMP | 1228 | 0.56 | 140\*8 | 0.92 | 27.3\*8 | 1 | -## Reference +## References - [YOLOv5](https://github.com/ultralytics/yolov5) diff --git a/cv/detection/yolov6/pytorch/README.md b/cv/detection/yolov6/pytorch/README.md index 6c6f4e951939c67a2c5a36ad64696184ae65a9b9..8199002884970fcb1ab413f76eacbe0fb29f90f9 100644 --- a/cv/detection/yolov6/pytorch/README.md +++ b/cv/detection/yolov6/pytorch/README.md @@ -1,37 +1,20 @@ # YOLOv6 -## Model description +## Model Description -For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. 
-Considering the diverse requirements for speed and accuracy in the real environment, we extensively examine the up-to-date object detection advancements either from industry or academia. Specifically, we heavily assimilate ideas from recent network design, training strategies, testing techniques, quantization, and optimization methods. On top of this, we integrate our thoughts and practice to build a suite of deployment-ready networks at various scales to accommodate diversified use cases. With the generous permission of YOLO authors, we name it YOLOv6. We also express our warm welcome to users and contributors for further enhancement. For a glimpse of performance, our YOLOv6-N hits 35.9% AP on the COCO dataset at a throughput of 1234 FPS on an NVIDIA Tesla T4 GPU. YOLOv6-S strikes 43.5% AP at 495 FPS, outperforming other mainstream detectors at the same scale~(YOLOv5-S, YOLOX-S, and PPYOLOE-S). Our quantized version of YOLOv6-S even brings a new state-of-the-art 43.3% AP at 869 FPS. Furthermore, YOLOv6-M/L also achieves better accuracy performance (i.e., 49.5%/52.3%) than other detectors with a similar inference speed. We carefully conducted experiments to validate the effectiveness of each component. -Implementation of paper: +YOLOv6 is an industrial-grade object detection model that pushes the boundaries of the YOLO series. It incorporates +advanced network design, training strategies, and optimization techniques to achieve state-of-the-art performance. +YOLOv6 offers multiple model sizes for various speed-accuracy trade-offs, excelling in both accuracy and inference +speed. It introduces innovative quantization methods for efficient deployment. The model demonstrates superior +performance compared to other YOLO variants, making it suitable for diverse real-world applications requiring fast and +accurate object detection. -- [YOLOv6 v3.0: A Full-Scale Reloading](https://arxiv.org/abs/2301.05586) 🔥 -- [YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications](https://arxiv.org/abs/2209.02976) +## Model Preparation -## Installing packages +### Prepare Resources -```bash -## install libGL -yum install mesa-libGL - -## install zlib -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install -cd .. -rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ - -## clone yolov6 -git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git -cd deepsparkhub-GPL/cv/detection/yolov6/pytorch/ -pip3 install -r requirements.txt -``` - -## Preparing datasets - -- data: prepare dataset and specify dataset paths in data.yaml ( [COCO](http://cocodataset.org), [YOLO format coco labels](https://github.com/meituan/YOLOv6/releases/download/0.1.0/coco2017labels.zip) ) +- data: prepare dataset and specify dataset paths in data.yaml ( [COCO](http://cocodataset.org), [YOLO format coco + labels](https://github.com/meituan/YOLOv6/releases/download/0.1.0/coco2017labels.zip) ) - make sure your dataset structure as follows: ```bash @@ -49,28 +32,45 @@ pip3 install -r requirements.txt │ ├── README.txt ``` -## Training +### Install Dependencies -> After training, reporting "AttributeError: 'NoneType' object has no attribute 'python_exit_status'" is a [known issue](https://github.com/meituan/YOLOv6/issues/506), add "--workers 0" if you want to avoid. 
+```bash +## install libGL +yum install mesa-libGL -Single gpu train +## install zlib +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +cd .. +rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ -```bash -python3 tools/train.py --batch 32 --conf configs/yolov6s.py --data data/coco.yaml --epoch 300 --name yolov6s_coco +## clone yolov6 +git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git +cd deepsparkhub-GPL/cv/detection/yolov6/pytorch/ +pip3 install -r requirements.txt ``` -Multiple gpu train +## Model Training + +> After training, reporting "AttributeError: 'NoneType' object has no attribute 'python_exit_status'" is a [known +> issue](https://github.com/meituan/YOLOv6/issues/506), add "--workers 0" if you want to avoid. ```bash +# Single gpu training +python3 tools/train.py --batch 32 --conf configs/yolov6s.py --data data/coco.yaml --epoch 300 --name yolov6s_coco + +# Multiple gpu training python3 -m torch.distributed.launch --nproc_per_node 8 tools/train.py --batch 256 --conf configs/yolov6s.py --data data/coco.yaml --epoch 300 --name yolov6s_coco --device 0,1,2,3,4,5,6,7 ``` -## Training Results +## Model Results | Model | Size | mAPval
0.5:0.95 | mAPval
0.5 | -| :------- | ---- | :----------------------- | ------------------- | +|----------|------|--------------------------|---------------------| | YOLOv6-S | 640 | 44.3 | 61.3 | -## Reference +## References - [YOLOv6](https://github.com/meituan/YOLOv6) diff --git a/cv/detection/yolov7/pytorch/README.md b/cv/detection/yolov7/pytorch/README.md index 0457b7fdc02cae1c91039915e6bdddfa98f24b97..4685705435b4adaed578ad98fddefd308dd87a7d 100644 --- a/cv/detection/yolov7/pytorch/README.md +++ b/cv/detection/yolov7/pytorch/README.md @@ -1,23 +1,28 @@ # YOLOv7 -## Model description +## Model Description -Implementation of paper - [YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/abs/2207.02696) +YOLOv7 is a state-of-the-art real-time object detection model that introduces innovative trainable bag-of-freebies +techniques. It achieves superior accuracy and speed compared to previous YOLO versions and other detectors. YOLOv7 +optimizes model architecture, training strategies, and inference efficiency without increasing computational costs. The +model supports various scales for different performance needs and demonstrates exceptional results on COCO benchmarks. +Its efficient design makes it suitable for real-world applications requiring fast and accurate object detection. -## Step 1: Installing packages +## Model Preparation + +### Prepare Resources ```bash -## clone yolov7 and install +# clone yolov7 git clone https://gitee.com/deep-spark/deepsparkhub-GPL.git cd deepsparkhub-GPL/cv/detection/yolov7/pytorch/ -pip3 install -r requirements.txt ``` -## Step 2: Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -68,50 +73,55 @@ The datasets format as follows: ``` -## Step 3: Training - -Train the yolov7 model as follows, the train log is saved in ./runs/train/exp - -### Single GPU training +### Install Dependencies ```bash -python3 train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml +pip3 install -r requirements.txt ``` -### Multiple GPU training +## Model Training + +Train the yolov7 model as follows, the train log is saved in ./runs/train/exp. ```bash +# Single GPU training +python3 train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml + +# Multiple GPU training python3 -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch-size 64 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml ``` -## Transfer learning +Transfer learning. 
-[`yolov7_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt) [`yolov7x_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x_training.pt) [`yolov7-w6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6_training.pt) [`yolov7-e6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6_training.pt) [`yolov7-d6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6_training.pt) [`yolov7-e6e_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e_training.pt) +- [`yolov7_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt) +- [`yolov7x_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7x_training.pt) +- [`yolov7-w6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-w6_training.pt) +- [`yolov7-e6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6_training.pt) +- [`yolov7-d6_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-d6_training.pt) +- [`yolov7-e6e_training.pt`](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e_training.pt) ```bash python3 train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml ``` -## Inference - -On video: +Inference on video: ```bash python3 detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source yourvideo.mp4 ``` -On image: +Inference on image: ```bash python3 detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg ``` -## Results +## Model Results -| Model | Test Size | APtest | AP50test | -| :-- | :-: | :-: | :-: | -| [**YOLOv7**](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt) | 640 | **49.4%** | **68.6%** | +| Model | Test Size | APtest | AP50test | +|:-------|:---------:|:-----------------:|:------------------------------:| +| YOLOv7 | 640 | 49.4% | 68.6% | -## Reference +## References -= [YOLOv7](https://github.com/WongKinYiu/yolov7) +- [YOLOv7](https://github.com/WongKinYiu/yolov7) diff --git a/cv/detection/yolov8/pytorch/README.md b/cv/detection/yolov8/pytorch/README.md index 66673f403cbfc67acf28a45324d293ebd59ba92c..0f209d86473ef05669ce699a9ae4c2888d673806 100644 --- a/cv/detection/yolov8/pytorch/README.md +++ b/cv/detection/yolov8/pytorch/README.md @@ -1,30 +1,22 @@ # YOLOv8 -## Model description +## Model Description -[Ultralytics](https://ultralytics.com) [YOLOv8](https://github.com/ultralytics/ultralytics) is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification and pose estimation tasks. +YOLOv8 is the latest iteration in the YOLO series, offering state-of-the-art performance in object detection and +tracking. It introduces enhanced architecture and training techniques for improved accuracy and speed. YOLOv8 supports +multiple tasks including instance segmentation, pose estimation, and image classification. 
The model is designed for +efficiency and ease of use, making it suitable for real-time applications. It maintains the YOLO tradition of fast +inference while delivering superior detection performance across various scenarios. -## Step 1: Installation +## Model Preparation -```bash -# Install zlib 1.2.9 -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install +### Prepare Resources -# Install libGL -yum install mesa-libGL +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -# Install ultralytics -pip3 install ultralytics -``` - -## Step 2: Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -50,17 +42,34 @@ mkdir -p /datasets/ ln -s /path/to/coco2017 /datasets/ ``` -## Step 3: Training +### Install Dependencies + +```bash +# Install zlib 1.2.9 +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install + +# Install libGL +yum install mesa-libGL + +# Install ultralytics +pip3 install ultralytics +``` + +## Model Training ```bash python3 test.py ``` -## Results +## Model Results + +| Model | GPU | FP32 | +|--------|------------|----------| +| YOLOv8 | BI-V100 x8 | MAP=36.3 | -| GPUs | FP32 | -| ---------- | ---------| -| BI-V100 x8 | MAP=36.3 | +## References -## Reference - [ultralytics](https://github.com/ultralytics/ultralytics) diff --git a/cv/detection/yolov9/pytorch/README.md b/cv/detection/yolov9/pytorch/README.md index a492f5691b1857ab67eac3216ec25b3c2ec2134e..2b7c01facb10472a20a38a25f20539ce0758e6f6 100644 --- a/cv/detection/yolov9/pytorch/README.md +++ b/cv/detection/yolov9/pytorch/README.md @@ -1,29 +1,35 @@ # YOLOv9 -## Model description +## Model Description -YOLOv9 is a state-of-the-art object detection algorithm that belongs to the YOLO (You Only Look Once) family of models. It is an improved version of the original YOLO algorithm with better accuracy and performance. +YOLOv9 is the latest advancement in the YOLO series, offering state-of-the-art object detection capabilities. It builds +upon previous versions with enhanced architecture and training techniques for improved accuracy and speed. YOLOv9 +introduces innovative features that optimize performance across various hardware platforms. The model maintains the YOLO +tradition of real-time detection while delivering superior results in complex scenarios. Its efficient design makes it +suitable for applications requiring fast and accurate object recognition in diverse environments. -## Step 1: Installation +## Model Preparation -```bash -yum install -y mesa-libGL -``` +### Prepare Resources -## Step 2: Preparing datasets +Download MS COCO dataset images ([train](http://images.cocodataset.org/zips/train2017.zip), +[val](http://images.cocodataset.org/zips/val2017.zip), [test](http://images.cocodataset.org/zips/test2017.zip)) and +[labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip). 
If you have +previously used a different version of YOLO, we strongly recommend that you delete train2017.cache and val2017.cache +files, and redownload [labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip) ```bash bash scripts/get_coco.sh -``` -Download MS COCO dataset images ([train](http://images.cocodataset.org/zips/train2017.zip), [val](http://images.cocodataset.org/zips/val2017.zip), [test](http://images.cocodataset.org/zips/test2017.zip)) and [labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip). If you have previously used a different version of YOLO, we strongly recommend that you delete train2017.cache and val2017.cache files, and redownload [labels](https://github.com/WongKinYiu/yolov7/releases/download/v0.1/coco2017labels-segments.zip) - -## Step 3: Training - -```bash # make soft link to coco dataset mkdir -p datasets/ ln -s /PATH/TO/COCO ./datasets/coco +``` + +### Install Dependencies + +```bash +yum install -y mesa-libGL # get yolov9 code git clone https://github.com/WongKinYiu/yolov9.git @@ -32,20 +38,16 @@ pip install -r requirements.txt pip install Pillow==9.5.0 ``` -### Training on a Single GPU +## Model Training ```bash +# Training on a Single GPU python3 train_dual.py --workers 8 --device 0 --batch 16 --data data/coco.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 300 --close-mosaic 15 -``` - -### Multiple GPU training -```bash +# Multiple GPU training torchrun --nproc_per_node 8 --master_port 9527 train_dual.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch 128 --data data/coco.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 300 --close-mosaic 15 ``` -## Results - -## Reference +## References -[YOLOv9](https://github.com/WongKinYiu/yolov9) +- [YOLOv9](https://github.com/WongKinYiu/yolov9) diff --git a/cv/distiller/cwd/pytorch/README.md b/cv/distiller/cwd/pytorch/README.md index bdfbfff4a3f5af0d92efe040e4ad7a3109ed8ddb..db68c39443dab2c1ab8a582392594fd542519b4d 100644 --- a/cv/distiller/cwd/pytorch/README.md +++ b/cv/distiller/cwd/pytorch/README.md @@ -1,14 +1,56 @@ # CWD -> [Channel-wise Knowledge Distillation for Dense Prediction](https://arxiv.org/abs/2011.13256) +## Model Description - +CWD (Channel-wise Knowledge Distillation) is a novel knowledge distillation method for dense prediction tasks like +semantic segmentation. Unlike traditional spatial distillation, CWD aligns feature maps channel-wise between teacher and +student networks by transforming each channel's feature map into a probability map and minimizing their KL divergence. +This approach focuses on the most salient regions of channel-wise maps, improving distillation efficiency and accuracy. +CWD outperforms spatial distillation methods while requiring less computational cost during training. -## Model description +## Model Preparation -Knowledge distillation (KD) has been proven to be a simple and effective tool for training compact models. Almost all KD variants for dense prediction tasks align the student and teacher networks' feature maps in the spatial domain, typically by minimizing point-wise and/or pair-wise discrepancy. 
Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks. To this end, we first transform the feature map of each channel into a probability map using softmax normalization, and then minimize the Kullback-Leibler (KL) divergence of the corresponding channels of the two networks. By doing so, our method focuses on mimicking the soft distributions of channels between networks. In particular, the KL divergence enables learning to pay more attention to the most salient regions of the channel-wise maps, presumably corresponding to the most useful signals for semantic segmentation. Experiments demonstrate that our channel-wise distillation outperforms almost all existing spatial distillation methods for semantic segmentation considerably, and requires less computational cost during training. We consistently achieve superior performance on three benchmarks with various network structures. +### Prepare Resources -## Step 1: Installation +Go to visit [Cityscapes official website](https://www.cityscapes-dataset.com/), then choose 'Download' to download the +Cityscapes dataset. + +Specify `/path/to/cityscapes` to your Cityscapes path in later training process, the unzipped dataset path structure +sholud look like: + +```bash +cityscapes/ +├── gtFine +│   ├── test +│   ├── train +│   │   ├── aachen +│   │   └── bochum +│   └── val +│   ├── frankfurt +│   ├── lindau +│   └── munster +└── leftImg8bit + ├── train + │   ├── aachen + │   └── bochum + └── val + ├── frankfurt + ├── lindau + └── munster +``` + +```shell +mkdir -p data/ +ln -s /path/to/cityscapes data/cityscapes +``` + +```bash +# Preprocess Data +cd ../ +python3 tools/dataset_converters/cityscapes.py data/cityscapes --nproc 8 +``` + +### Install Dependencies ```bash # install libGL @@ -41,34 +83,7 @@ pip3 install mmengine==0.7.3 python3 setup.py develop ``` -## Step 2: Preparing datasets - -Cityscapes 官方网站可以下载 [Cityscapes]() 数据集,按照官网要求注册并登陆后,数据可以在[这里]()找到。 - -```bash -mkdir data/ -cd data/ -``` - -按照惯例,**labelTrainIds.png 用于 cityscapes 训练。 我们提供了一个基于 cityscapesscripts 的脚本用于生成 **labelTrainIds.png。 - -```bash - ├── data - │ ├── cityscapes - │ │ ├── leftImg8bit - │ │ │ ├── train - │ │ │ ├── val - │ │ ├── gtFine - │ │ │ ├── train - │ │ │ ├── val -``` - -```bash -cd .. 
-# --nproc 表示 8 个转换进程,也可以省略。 -python3 tools/dataset_converters/cityscapes.py data/cityscapes --nproc 8 -``` -## Step 3: Training +## Model Training ```bash # On single GPU @@ -78,11 +93,12 @@ python3 tools/train.py configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspne bash tools/dist_train.sh configs/distill/mmseg/cwd/cwd_logits_pspnet_r101-d8_pspnet_r18-d8_4xb2-80k_cityscapes-512x1024.py 8 ``` -## Results +## Model Results + +| Model | GPU | FP32 | +|---------------------|------------|--------------| +| pspnet_r18(student) | BI-V100 x8 | Miou= 75.32 | -| model | GPU | FP32 | -|-------------------| ----------- | ------------------------------------ | -| pspnet_r18(student) | 8 cards | Miou= 75.32 | +## References -## Reference - [mmrazor](https://github.com/open-mmlab/mmrazor) diff --git a/cv/distiller/rkd/pytorch/README.md b/cv/distiller/rkd/pytorch/README.md index d58ed0e0ecc1df19a3f7d8df1f13dd9c8ac571cc..3bd0e331f7905056c8b40ecf7e0bb8baffec4a8f 100755 --- a/cv/distiller/rkd/pytorch/README.md +++ b/cv/distiller/rkd/pytorch/README.md @@ -1,11 +1,16 @@ -# RKD (Relational Knowledge Distillation) +# RKD -## Model description +## Model Description -Official implementation of [Relational Knowledge Distillation](https://arxiv.org/abs/1904.05068), CVPR 2019\ -This repository contains the source code of experiments for metric learning. +RKD (Relational Knowledge Distillation) is a knowledge distillation technique that transfers relational information +between data points from a teacher model to a student model. Instead of mimicking individual outputs, RKD focuses on +preserving the relationships (distance and angle) between embeddings. This approach is particularly effective for metric +learning tasks, where maintaining the relative structure of the embedding space is crucial. RKD enhances student model +performance by capturing higher-order relational knowledge from the teacher. -## Step 1: Installation +## Model Preparation + +### Install Dependencies ```bash # If 'ZLIB_1.2.9' is not found, you need to install it as below. @@ -17,7 +22,9 @@ cd .. rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ ``` -## Step 2: Distillation +## Model Training + +### Model Distillation ```bash # Train a teacher embedding network of resnet50 (d=512) sing triplet loss (margin=0.2) with distance-weighted sampling. @@ -57,13 +64,12 @@ python3 run.py --mode eval \ --load student/best.pth ``` -## Results -| model | acc | -|:----------:|:--------:| -| RKD| Best Train Recall: 0.7940, Best Eval Recall: 0.5763 | - -## Reference -- [paper](https://arxiv.org/abs/2302.05637) +## Model Results +| Model | ACC | +|-------|-----------------------------------------------------| +| RKD | Best Train Recall: 0.7940, Best Eval Recall: 0.5763 | +## References +- [Paper](https://arxiv.org/abs/2302.05637) diff --git a/cv/distiller/wsld/pytorch/README.md b/cv/distiller/wsld/pytorch/README.md index cdd5c757bc3e8a35e9e2ae2acc74a21a09a3bcd5..d3c2b3b26ca2283622aaa1ba84a89a1fe324ceef 100644 --- a/cv/distiller/wsld/pytorch/README.md +++ b/cv/distiller/wsld/pytorch/README.md @@ -1,28 +1,19 @@ -# WSLD: Weighted Soft Label Distillation +# WSLD -## Model description -Knowledge distillation is an effective approach to leverage a well-trained network or an ensemble of them, named as the teacher, to guide the training of a student network. The outputs from the teacher network are used as soft labels for supervising the training of a new network.we investigate the bias-variance tradeoff brought by distillation with soft labels. 
Specifically, we observe that during training the bias-variance tradeoff varies sample-wisely. Further, under the same distillation temperature setting, we observe that the distillation performance is negatively associated with the number of some specific samples, which are named as regularization samples since these samples lead to bias increasing and variance decreasing. Nevertheless, we empirically find that completely filtering out regularization samples also deteriorates distillation performance. +## Model Description -## Step 1: Installation -```bash -# Install libGL -yum install mesa-libGL - -# Install zlib -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install -cd .. -rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ +WSLD (Weighted Soft Label Distillation) is a knowledge distillation technique that focuses on transferring soft label +information from a teacher model to a student model. Unlike traditional distillation methods that use uniform weighting, +WSLD assigns different weights to each class based on their importance or difficulty, allowing the student to focus more +on challenging or critical classes. This approach improves the student model's performance, particularly in imbalanced +datasets or tasks where certain classes require more attention. -# Install requirements -pip3 install opencv-python lmdb msgpack -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. The ImageNet dataset path structure should look like: @@ -40,9 +31,28 @@ imagenet └── val_list.txt ``` -**ImageNet to lmdb file** +### Install Dependencies + +```bash +# Install libGL +yum install mesa-libGL + +# Install zlib +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +cd .. +rm -rf zlib-1.2.9.tar.gz zlib-1.2.9/ + +# Install requirements +pip3 install opencv-python lmdb msgpack +``` + +### Preprocess Data -The code is used for training Imagenet. Our pre-trained teacher models are Pytorch official models. By default, we pack the ImageNet data as the lmdb file for faster IO. The lmdb files can be made as follows. +The code is used for training Imagenet. Our pre-trained teacher models are Pytorch official models. By default, we pack +the ImageNet data as the lmdb file for faster IO. The lmdb files can be made as follows. ```bash # 1. Generate the list of the image data. @@ -53,7 +63,7 @@ python3 dataset/img2lmdb.py --image_path /path/to/imagenet --list_path /path/to/ python3 dataset/img2lmdb.py --image_path /path/to/imagenet --list_path /path/to/imagenet --output_path '/path/to/imagenet' --split 'val' ``` -## Step 3: Training +## Model Training - train_with_distillation.py: train the model with our distillation method. - imagenet_train_cfg.py: all dataset and hyperparameter settings. 
@@ -66,16 +76,15 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_with_distillation.py ``` +## Model Results -## Results - -|GPUs| Network | Method | acc | -|:----:|:----------:|:--------:|:----------:| -| | ResNet 34 | Teacher | 73.19 | -| | ResNet 18 | Original | 69.75 | -| BI-V100 x 8 | ResNet 18 | Proposed | __71.6__ | +| GPU | Network | Method | acc | +|------------|-----------|----------|-------| +| BI-V100 x8 | ResNet 34 | Teacher | 73.19 | +| BI-V100 x8 | ResNet 18 | Original | 69.75 | +| BI-V100 x8 | ResNet 18 | Proposed | 71.6 | -## Reference +## References -- [Rethinking soft labels for knowledge distillation: A Bias-Variance Tradeoff Perspective](https://arxiv.org/abs/2102.00650) +- [Paper](https://arxiv.org/abs/2102.00650) - [Weighted Soft Label Distillation](https://github.com/bellymonster/Weighted-Soft-Label-Distillation) diff --git a/cv/face_detection/retinaface/pytorch/README.md b/cv/face_detection/retinaface/pytorch/README.md index 36267d7929ddd3f41a867e3fc1a6cd72b0d9756e..e220ac6f4aea6935aee6e2df3094806c5a0a3658 100644 --- a/cv/face_detection/retinaface/pytorch/README.md +++ b/cv/face_detection/retinaface/pytorch/README.md @@ -1,6 +1,6 @@ -# RetinaFace: Single-stage Dense Face Localisation in the Wild +# RetinaFace -## Model description +## Model Description Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which @@ -14,11 +14,9 @@ On the IJB-C test set, RetinaFace enables state of the art methods (ArcFace) to verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run real-time on a single CPU core for a VGA-resolution image. -## Prepare +## Model Preparation -### Install packages - -### Download dataset +### Prepare Resources 1. Download the [WIDER FACE](http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html) dataset. @@ -53,48 +51,29 @@ a single CPU core for a VGA-resolution image. ``` -## Training - -### Single GPU with mobilenet backbone +## Model Training ```shell - +# Single GPU with mobilenet backbone python3 train.py --network mobile0.25 -``` - -### Multi GPU with resnet50 backbone - -```shell - +# Multi GPU with resnet50 backbone python3 train.py --network resnet50 -``` - -## Evaluate - -1. Generate txt file - -```Shell - +# Evaluate +## 1. Generate txt file python3 test_widerface.py --trained_model ${weight_file} --network mobile0.25 or resnet50 -``` - -2. Evaluate txt results. Demo come from [Here](https://github.com/wondervictor/WiderFace-Evaluation) - -```Shell - +## 2. Evaluate txt results. Demo come from [Here](https://github.com/wondervictor/WiderFace-Evaluation) cd ./widerface_evaluate python3 setup.py build_ext --inplace python3 evaluation.py - ``` -3. 
You can also use widerface official Matlab evaluate demo in +You can also use widerface official Matlab evaluate demo in [Here](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) -## Results +## Model Results | Model | GPU | Type | AP | |------------|---------|------------|--------------------| diff --git a/cv/face_recognition/arcface/pytorch/README.md b/cv/face_recognition/arcface/pytorch/README.md index f478e2e6087818a86b4462f2b8c909f3f8574f27..d11fd055c50594ecfe1b75b578ba1e8ce53f51c6 100644 --- a/cv/face_recognition/arcface/pytorch/README.md +++ b/cv/face_recognition/arcface/pytorch/README.md @@ -1,23 +1,15 @@ # ArcFace -## Model description +## Model Description This repo is a pytorch implement of ArcFace, which propose an Additive Angular Margin Loss to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere. ArcFace consistently outperforms the state-of-the-art and can be easily implemented with negligible computational overhead -## Step 1: Installation +## Model Preparation -```bash -pip3 install -r requirements.txt -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install -``` - -## Step 2: Preparing datasets +### Prepare Resources You can download datasets from [BaiduPan](https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw) with password 'bcrq'. @@ -32,18 +24,28 @@ ln -s ${your_path_to_face} lfw_pair.txt python3 txt_annotation.py ``` -## Step 3: Training +### Install Dependencies + +```bash +pip3 install -r requirements.txt +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +``` + +## Model Training ```bash bash run.sh $GPUS ``` -## Results +## Model Results -| Model | FPS | LFW_Accuracy | -|---------|--------| -----------------| +| Model | FPS | LFW_Accuracy | +|---------|--------|------------------| | ArcFace | 38.272 | 0.99000+-0.00615 | -## Reference +## References - [arcface-pytorch](https://github.com/bubbliiiing/arcface-pytorch) diff --git a/cv/face_recognition/blazeface/paddlepaddle/README.md b/cv/face_recognition/blazeface/paddlepaddle/README.md index b5b510c840f3eb418185708392c4df7ddbd3c3d7..60772cbb5ae19e7fac62af45904ddce47d8c45b2 100644 --- a/cv/face_recognition/blazeface/paddlepaddle/README.md +++ b/cv/face_recognition/blazeface/paddlepaddle/README.md @@ -1,28 +1,13 @@ # BlazeFace -## Model description +## Model Description BlazeFace is Google Research published face detection model. It's lightweight but good performance, and tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. -## Step 1: Installing +## Model Preparation -```bash -git clone https://github.com/PaddlePaddle/PaddleDetection.git -``` - -```bash -cd PaddleDetection -yum install mesa-libGL -y - -pip3 install -r requirements.txt -pip3 install protobuf==3.20.1 -pip3 install urllib3==1.26.6 -pip3 install IPython -pip3 install install numba==0.56.4 -``` - -## Step 2: Prepare Datasets +### Prepare Resources We use WIDER-FACE dataset for training and model tests, the official web site provides detailed data is introduced. 
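Before moving on to training, it is worth confirming that the WIDER-FACE download below actually completed. A minimal check, assuming the standard WIDER FACE directory names and split sizes (12880 training / 3226 validation images), which this README does not state explicitly:

```bash
# Hedged sanity check after running dataset/wider_face/download_wider_face.sh
# (the WIDER_train/WIDER_val layout and the image counts are assumptions taken
# from the public WIDER FACE release, not from this repository).
find dataset/wider_face/WIDER_train/images -name "*.jpg" | wc -l   # expect 12880
find dataset/wider_face/WIDER_val/images -name "*.jpg" | wc -l     # expect 3226
```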
@@ -59,19 +44,32 @@ dataset/wider_face/ cd dataset/wider_face && ./download_wider_face.sh ``` -## Step 3: Training +### Install Dependencies ```bash -cd PaddleDetection +git clone https://github.com/PaddlePaddle/PaddleDetection.git +``` -export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml +```bash +cd PaddleDetection +yum install mesa-libGL -y +pip3 install -r requirements.txt +pip3 install protobuf==3.20.1 +pip3 install urllib3==1.26.6 +pip3 install IPython +pip3 install install numba==0.56.4 ``` -## Step 4: Evaluation +## Model Training ```bash +cd PaddleDetection + +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python3 -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml + +# Evaluation python3 -u tools/eval.py -c configs/face_detection/blazeface_fpn_ssh_1000e.yml \ -o weights=output/blazeface_fpn_ssh_1000e/model_final.pdopt \ multi_scale=True @@ -85,12 +83,12 @@ python3 setup.py build_ext --inplace python3 evaluation.py -p ../output/pred/ -g ../eval_tools/ground_truth ``` -## Step 5: result +## Model Results -| GPUs | Network structure | Easy/Medium/Hard Set | Ips -|-------------|-----------|-----------|---—------| -| BI-V100 x 8 | BlazeFace-FPN-SSH | 0.886/0.860/0.753 | 8.6813 | + | Model | GPUs | Easy/Medium/Hard Set | Ips | + |-------------------|------------|----------------------|--------| + | BlazeFace-FPN-SSH | BI-V100 x8 | 0.886/0.860/0.753 | 8.6813 | -## Reference +## References - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/face_recognition/cosface/pytorch/README.md b/cv/face_recognition/cosface/pytorch/README.md index e2076e8e8f35a58d18c4b408356bcb2061d867d6..3ac73556bbf35ffbb94aff8d30667643d426ff48 100644 --- a/cv/face_recognition/cosface/pytorch/README.md +++ b/cv/face_recognition/cosface/pytorch/README.md @@ -1,22 +1,14 @@ # CosFace -## Model description +## Model Description CosFace is a face recognition model that achieves state-of-the-art results by introducing a cosine margin penalty in the loss function when training the neural network, which learns highly discriminative facial embeddings by maximizing inter-class differences and minimizing intra-class variations. -## Step 1: Installation +## Model Preparation -```bash -# install zlib -wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz -tar xvf zlib-1.2.9.tar.gz -cd zlib-1.2.9/ -./configure && make install -``` - -## Step 2: Preparing datasets +### Prepare Resources You can download datasets from [BaiduPan](https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw) with password 'bcrq'. @@ -31,7 +23,17 @@ ln -s ${your_path_to_face} lfw_pair.txt . 
python3 txt_annotation.py ``` -## Step 3: Training +### Install Dependencies + +```bash +# install zlib +wget http://www.zlib.net/fossils/zlib-1.2.9.tar.gz +tar xvf zlib-1.2.9.tar.gz +cd zlib-1.2.9/ +./configure && make install +``` + +## Model Training ```bash # 1 GPU @@ -41,12 +43,12 @@ bash train.sh 0 bash train.sh 0,1,2,3,4,5,6,7 ``` -## Results +## Model Results | Model | FPS | LFW_Accuracy | |---------|---------|--------------| | Cosface | 5492.08 | 0.9865 | -## Reference +## References - [CosFace_pytorch](https://github.com/MuggleWang/CosFace_pytorch) diff --git a/cv/face_recognition/facenet/pytorch/README.md b/cv/face_recognition/facenet/pytorch/README.md index 3cd9d1f502136f50e141b274811e8226e584a4d7..9d0d66112c05d4cf78409dbc8f1df202dea6e00b 100644 --- a/cv/face_recognition/facenet/pytorch/README.md +++ b/cv/face_recognition/facenet/pytorch/README.md @@ -2,15 +2,29 @@ ## Model Description -This is a facenet-pytorch library that can be used to train your own face recognition model. -> Despite significant recent advances in the field of face recognition, implementing face verification and recognition -efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called -FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly -correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, -verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature -vectors. +Facenet is a deep learning model for face recognition that directly maps face images to a compact Euclidean space, where +distances correspond to face similarity. It uses a triplet loss function to ensure that faces of the same person are +closer together than those of different individuals. Facenet excels in tasks like face verification, recognition, and +clustering, offering high accuracy and efficiency. Its compact embeddings make it scalable for large-scale applications +in security and identity verification. -## Step 1: Installation +## Model Preparation + +### Prepare Resources + +You can download datasets from [BaiduPan](https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw) with password 'bcrq'. + +```bash +# download +ln -s ${your_path_to_face} datasets/ +ln -s ${your_path_to_face} lfw/ +ln -s ${your_path_to_face} lfw_pair.txt + +# preprocess +python3 txt_annotation.py +``` + +### Install Dependencies ```bash pip3 install -r requirements.txt @@ -26,33 +40,19 @@ pip3 install matplotlib pip3 install scikit-learn ``` -## Step 2: Preparing datasets - -You can download datasets from [BaiduPan](https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw) with password 'bcrq'. 
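The triplet loss mentioned in the FaceNet description above fits in a few lines. The following is a hedged, illustrative PyTorch sketch with made-up tensor names, not the training code of this repository.

```python
# Minimal sketch of the FaceNet-style triplet loss on L2-normalized embeddings.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull same-identity pairs together, push different identities apart by a margin."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # squared distance anchor <-> positive
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # squared distance anchor <-> negative
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random 128-d embeddings.
a = F.normalize(torch.randn(16, 128))
p = F.normalize(torch.randn(16, 128))
n = F.normalize(torch.randn(16, 128))
print(triplet_loss(a, p, n))
```

PyTorch's built-in `torch.nn.TripletMarginLoss` implements the same idea with unsquared distances.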
- -```bash -# download -ln -s ${your_path_to_face} datasets/ -ln -s ${your_path_to_face} lfw/ -ln -s ${your_path_to_face} lfw_pair.txt - -# preprocess -python3 txt_annotation.py -``` - -## Step 3: Training +### Model Training ```bash python3 train.py ``` -## Results +## Model Results -| Model | FPS | LFW_Accuracy | -|---------|--------| -----------------| -| Facenet | 1256.96| 0.97933+-0.00624 | +| Model | FPS | LFW_Accuracy | +|---------|---------|------------------| +| Facenet | 1256.96 | 0.97933+-0.00624 | -## Reference +## References -- [paper](https://arxiv.org/abs/1503.03832) +- [Paper](https://arxiv.org/abs/1503.03832) - [facenet-pytorch](https://github.com/bubbliiiing/facenet-pytorch) diff --git a/cv/face_recognition/facenet/tensorflow/README.md b/cv/face_recognition/facenet/tensorflow/README.md index 547d4e76d582d25446174439b89b271e6d12ed51..16dee24005fafb68527bd524c245f789158a103b 100644 --- a/cv/face_recognition/facenet/tensorflow/README.md +++ b/cv/face_recognition/facenet/tensorflow/README.md @@ -1,18 +1,16 @@ # Facenet -## Model description +## Model Description -This is a facenet-tensorflow library that can be used to train your own face recognition model. +Facenet is a deep learning model for face recognition that directly maps face images to a compact Euclidean space, where +distances correspond to face similarity. It uses a triplet loss function to ensure that faces of the same person are +closer together than those of different individuals. Facenet excels in tasks like face verification, recognition, and +clustering, offering high accuracy and efficiency. Its compact embeddings make it scalable for large-scale applications +in security and identity verification. -## Step 1: Installation +## Model Preparation -```bash -# Install requirements. -bash init.sh -pip3 install numpy==1.23.5 -``` - -## Step 2: Preparing datasets +### Prepare Resources The [CASIA-WebFace](http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html) dataset has been used for training. This training set consists of total of 453 453 images over 10 575 identities after face detection. Some performance @@ -20,26 +18,22 @@ improvement has been seen if the dataset has been filtered before training. Some done will come later. The best performing model has been trained on the [VGGFace2](https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/) dataset consisting of ~3.3M faces and ~9000 classes. -## download +Download from [Baidu YunPan](https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw) with password 'bcrq'. -```bash -cd data -download dataset in this way: -download link: https://pan.baidu.com/s/1qMxFR8H_ih0xmY-rKgRejw password: bcrq -The [CASIA-WebFace](http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html) dataset has been used for training +The [CASIA-WebFace](http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html) dataset has been used for training. -# Data tree: +```bash $ ls data/webface_182_44 0000045 -...... +... $ ls data/lfw_data lfw lfw_160 lfw.tgz ``` -## Pre-processing +Pre-processing. -### Face alignment using MTCNN +Face alignment using MTCNN. One problem with the above approach seems to be that the Dlib face detector misses some of the hard examples (partial occlusion, silhouettes, etc). This makes the training set too "easy" which causes the model to perform worse on other @@ -51,31 +45,35 @@ good results. A Python/Tensorflow implementation of MTCNN can be found [here](https://github.com/davidsandberg/facenet/tree/master/src/align). 
This implementation does not give identical results to the Matlab/Caffe implementation but the performance is very similar. -## Step 3: Training +### Install Dependencies + +```bash +# Install requirements. +bash init.sh +pip3 install numpy==1.23.5 +``` + +## Model Training Currently, the best results are achieved by training the model using softmax loss. Details on how to train a model using softmax loss on the CASIA-WebFace dataset can be found on the page [Classifier training of Inception-ResNet-v1](https://github.com/davidsandberg/facenet/wiki/Classifier-training-of-inception-resnet-v1) and . -### One Card - ```bash +# One Card nohup bash train_facenet.sh 1> train_facenet.log 2> train_facenet_error.log & tail -f train_facenet.log -``` -### Multiple cards (DDP) - -```bash -# 8 Cards(DDP) +# Multiple cards (DDP) +## 8 Cards(DDP) bash train_facenet_ddp.sh ``` -## Results +## Model Results | Model | FPS | LFW_Accuracy | |---------|--------|------------------| | Facenet | 216.96 | 0.98900+-0.00642 | -## Reference +## References - [facenet](https://github.com/davidsandberg/facenet) diff --git a/cv/gnn/gat/paddlepaddle/README.md b/cv/gnn/gat/paddlepaddle/README.md index 6caf3844a95b29b5c11ecfe40f44363a01e306dc..6429faf2ad6a2060bdda3a882acde097949d860f 100644 --- a/cv/gnn/gat/paddlepaddle/README.md +++ b/cv/gnn/gat/paddlepaddle/README.md @@ -1,34 +1,40 @@ -# GAT (Graph Attention Networks) +# GAT -[Graph Attention Networks \(GAT\)](https://arxiv.org/abs/1710.10903) is a novel architectures that operate on -graph-structured data, which leverages masked self-attentional layers to address the shortcomings of prior methods based -on graph convolutions or their approximations. Based on PGL, we reproduce GAT algorithms and reach the same level of -indicators as the paper in citation network benchmarks. +## Model Description -## Step 1: Installation +GAT (Graph Attention Network) is a novel neural network architecture for graph-structured data that uses self-attention +mechanisms to process node features. Unlike traditional graph convolutional networks, GAT assigns different importance +to neighboring nodes through attention coefficients, allowing for more flexible and expressive feature aggregation. This +approach enables the model to handle varying neighborhood sizes and capture complex relationships in graph data, making +it particularly effective for tasks like node classification and graph-based prediction problems. + +## Model Preparation + +### Prepare Resources + +There's no need to prepare dastasets. The details for datasets can be found in the +[paper](https://arxiv.org/abs/1609.02907). + +### Install Dependencies ```bash git clone -b 2.2.5 https://github.com/PaddlePaddle/PGL pip3 install pgl==2.2.5 ``` -## Step 2: Preparing datasets - -There's no need to prepare dastasets. The details for datasets can be found in the [paper](https://arxiv.org/abs/1609.02907). 
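To make the attention mechanism described for GAT above concrete, here is a rough single-head graph attention layer written directly in PyTorch. It is a simplified illustration, not the PGL implementation invoked by the training commands; self-loops and multi-head concatenation are omitted and all parameter names are assumptions.

```python
# Single-head graph attention layer in the spirit of GAT (illustrative only).
import torch
import torch.nn.functional as F

def gat_layer(x, edge_index, W, a_src, a_dst, leaky_slope=0.2):
    """x: (N, F_in) node features; edge_index: (2, E) with rows (src, dst)."""
    h = x @ W                                  # (N, F_out) projected features
    src, dst = edge_index                      # edges point src -> dst
    # Unnormalized attention score per edge: LeakyReLU(a^T [h_src || h_dst]).
    e = F.leaky_relu(h[src] @ a_src + h[dst] @ a_dst, negative_slope=leaky_slope)
    # Softmax over the incoming edges of each destination node.
    num = torch.exp(e - e.max())
    denom = torch.zeros(x.size(0)).index_add_(0, dst, num) + 1e-16
    alpha = num / denom[dst]                   # (E,) normalized attention coefficients
    # Attention-weighted aggregation of neighbor features.
    out = torch.zeros(x.size(0), h.size(1)).index_add_(0, dst, alpha.unsqueeze(1) * h[src])
    return F.elu(out)

# Tiny toy graph: 3 nodes, edges 0->1, 2->1, 1->0.
x = torch.randn(3, 4)
edges = torch.tensor([[0, 2, 1], [1, 1, 0]])
print(gat_layer(x, edges, torch.randn(4, 8), torch.randn(8), torch.randn(8)).shape)
```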
- -## Step 3: Training +## Model Training ```bash cd PGL/examples/gat/ CUDA_VISIBLE_DEVICES=0 python3 train.py --dataset cora ``` -## Results +## Model Results -| GPUs | Accuracy | FPS | -| --- | --- | --- | -| BI-V100 | 83.16% | 65.56 it/s | +| Model | GPU | Accuracy | FPS | +|-------|---------|----------|------------| +| GAT | BI-V100 | 83.16% | 65.56 it/s | -## Reference +## References - [PGL](https://github.com/PaddlePaddle/PGL) diff --git a/cv/gnn/gcn/mindspore/README.md b/cv/gnn/gcn/mindspore/README.md index a15ece1f980be9b7b3caf91572b8048ea98adb99..8e48864cde91b3278b96cf4f970f4fea767e9d7f 100755 --- a/cv/gnn/gcn/mindspore/README.md +++ b/cv/gnn/gcn/mindspore/README.md @@ -1,23 +1,15 @@ # GCN -## Model description +## Model Description -GCN(Graph Convolutional Networks) was proposed in 2016 and designed to do semi-supervised learning on graph-structured +GCN (Graph Convolutional Networks) was proposed in 2016 and designed to do semi-supervised learning on graph-structured data. A scalable approach based on an efficient variant of convolutional neural networks which operate directly on graphs was presented. The model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. -[Paper](https://arxiv.org/abs/1609.02907): Thomas N. Kipf, Max Welling. 2016. Semi-Supervised Classification with Graph -Convolutional Networks. In ICLR 2016. +## Model Preparation -## Step 1: Installing - -```sh -pip3 install -r requirements.txt -pip3 install easydict -``` - -## Step 2: Prepare Datasets +### Prepare Resources Note that you can run the scripts based on the dataset mentioned in original paper or widely used in relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related @@ -28,31 +20,32 @@ dataset below. | Cora | Citation network | 2708 | 5429 | 7 | 1433 | 0.052 | | Citeseer | Citation network | 3327 | 4732 | 6 | 3703 | 0.036 | -## Step 3: Training +### Install Dependencies ```sh -cd scripts -bash train_gcn_1p.sh +pip3 install -r requirements.txt +pip3 install easydict ``` -## Evaluation +## Model Training ```sh -cd .. +cd scripts/ +bash train_gcn_1p.sh + +# Evaluation +cd ../ python3 eval.py --data_dir=scripts/data_mr/cora --device_target="GPU" \ --model_ckpt scripts/train/ckpt/ckpt_gcn-200_1.ckpt &> eval.log & ``` -## Evaluation result - -### Results on BI-V100 +## Model Results -| GPUs | per step time | Acc | -|------|-------------- |-------| -| 1 | 4.454 | 0.8711| +| Model | GPU | per step time | Acc | +|-------|-------------|---------------|--------| +| GCN | BI-V100 x1 | 4.454 | 0.8711 | +| GCN | NV-V100s x1 | 5.278 | 0.8659 | -### Results on NV-V100s +## References -| GPUs | per step time | Acc | -|------|-------------- |-------| -| 1 | 5.278 | 0.8659| +- [Paper](https://arxiv.org/abs/1609.02907) diff --git a/cv/gnn/gcn/paddlepaddle/README.md b/cv/gnn/gcn/paddlepaddle/README.md index a36c0901cb95ccc1948e5482aa16f159c6c695cb..63cb50c181eb3fe23a001f761e32a4b288c91cd5 100644 --- a/cv/gnn/gcn/paddlepaddle/README.md +++ b/cv/gnn/gcn/paddlepaddle/README.md @@ -1,16 +1,21 @@ # GCN -## Model description +## Model Description -GCN(Graph Convolutional Networks) was proposed in 2016 and designed to do semi-supervised learning on graph-structured +GCN (Graph Convolutional Networks) was proposed in 2016 and designed to do semi-supervised learning on graph-structured data. 
A scalable approach based on an efficient variant of convolutional neural networks which operate directly on graphs was presented. The model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. -[Paper](https://gitee.com/link?target=https%3A%2F%2Farxiv.org%2Fabs%2F1609.02907): Thomas N. Kipf, Max Welling. 2016. -Semi-Supervised Classification with Graph Convolutional Networks. In ICLR 2016. +## Model Preparation -## Step 1:Installation +### Prepare Resources + +Datasets are called in the code. + +The datasets contain three citation networks: CORA, PUBMED, CITESEER. + +### Install Dependencies ```sh # Clone PGL repository @@ -24,32 +29,24 @@ pip3 install urllib3==1.23 pip3 install networkx ``` -## Step 2:Preparing datasets - -Datasets are called in the code. - -The datasets contain three citation networks: CORA, PUBMED, CITESEER. - -## Step 3:Training +## Model Training ```sh cd PGL/examples/gcn/ -# Run on CPU -python3 train.py --dataset cora - # Run on GPU CUDA_VISIBLE_DEVICES=0 python3 train.py --dataset cora ``` -## Results +## Model Results -| GPUS | Datasets | speed | Accurary | -|-----------|----------|--------|----------| -| BI V100×1 | CORA | 0.0064 | 80.3% | -| BI V100×1 | PUBMED | 0.0076 | 79.0% | -| BI V100×1 | CITESEER | 0.0085 | 70.6% | + | Model | GPU | Datasets | speed | Accurary | + |-------|------------|----------|--------|----------| + | GCN | BI-V100 ×1 | CORA | 0.0064 | 80.3% | + | GCN | BI-V100 ×1 | PUBMED | 0.0076 | 79.0% | + | GCN | BI-V100 ×1 | CITESEER | 0.0085 | 70.6% | -## Reference +## References -[PGL/GCN](https://github.com/PaddlePaddle/PGL/tree/main/examples/gcn) +- [PGL](https://github.com/PaddlePaddle/PGL/tree/main/examples/gcn) +- [Paper](https://arxiv.org/abs/1609.02907) diff --git a/cv/gnn/graphsage/paddlepaddle/README.md b/cv/gnn/graphsage/paddlepaddle/README.md index ff1851654ef4f79da7d49ee84b54d652438f6e5a..fe7a86ba67f99563d2c0fe77346bfa810a34635d 100644 --- a/cv/gnn/graphsage/paddlepaddle/README.md +++ b/cv/gnn/graphsage/paddlepaddle/README.md @@ -1,21 +1,16 @@ -# GraphSAGE (Inductive Representation Learning on Large Graphs) +# GraphSAGE -[GraphSAGE](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf) is a general inductive framework that -leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen -data. Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by -sampling and aggregating features from a node’s local neighborhood. Based on PGL, we reproduce GraphSAGE algorithm and -reach the same level of indicators as the paper in Reddit Dataset. Besides, this is an example of subgraph sampling and -training in PGL. +## Model Description -## Step 1: Installation +GraphSAGE (Graph Sample and Aggregated) is an inductive graph neural network model designed for large-scale graph data. +Unlike traditional methods that learn fixed node embeddings, GraphSAGE learns a function to generate embeddings by +sampling and aggregating features from a node's local neighborhood. This approach enables the model to generalize to +unseen nodes and graphs, making it particularly effective for dynamic graphs and large-scale applications like social +network analysis and recommendation systems. 
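The sample-and-aggregate step that gives GraphSAGE its name can be sketched as follows. This is a simplified, assumed PyTorch illustration of one mean-aggregator layer, not the PGL sub-graph sampling pipeline used in this example.

```python
# Sketch of one GraphSAGE layer: sample neighbors, mean-aggregate, combine with self.
import random
import torch
import torch.nn.functional as F

def sage_layer(x, neighbors, W_self, W_neigh, num_samples=5):
    """x: (N, F_in) node features; neighbors: list of neighbor-id lists, one per node."""
    agg = torch.zeros_like(x)
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            # Sample a fixed-size neighborhood, then mean-aggregate its features.
            sampled = random.sample(nbrs, min(num_samples, len(nbrs)))
            agg[v] = x[sampled].mean(dim=0)
    # Combine self and aggregated neighbor representations, then normalize.
    out = F.relu(x @ W_self + agg @ W_neigh)
    return F.normalize(out, dim=1)

# Toy usage: 4 nodes with 8-dim features.
x = torch.randn(4, 8)
neigh = [[1, 2], [0], [0, 3], [2]]
print(sage_layer(x, neigh, torch.randn(8, 16), torch.randn(8, 16)).shape)  # (4, 16)
```

Because the learned weights operate on sampled neighborhoods rather than fixed node embeddings, the same layer can be applied to nodes unseen during training, which is the inductive property highlighted above.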
-```sh -git clone -b 2.2.5 https://github.com/PaddlePaddle/PGL -pip3 install scikit-learn -pip3 install pgl==2.2.5 -``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources The reddit dataset should be downloaded from the following links and placed in the directory ```pgl.data```. The details for Reddit Dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/graphsage-nips17.pdf). @@ -28,7 +23,15 @@ for Reddit Dataset can be found [here](https://cs.stanford.edu/people/jure/pubs/ ln -s /path/to/reddit/ /usr/local/lib/python3.7/site-packages/pgl/data/ ``` -## Step 3: Training +### Install Dependencies + +```sh +git clone -b 2.2.5 https://github.com/PaddlePaddle/PGL +pip3 install scikit-learn +pip3 install pgl==2.2.5 +``` + +## Model Training To train a GraphSAGE model on Reddit Dataset, you can just run: @@ -38,12 +41,12 @@ cd PGL/examples/graphsage/cpu_sample_version CUDA_VISIBLE_DEVICES=0 python3 train.py --epoch 10 --normalize --symmetry ``` -## Results +## Model Results | GPUs | Accuracy | FPS | |------------|----------|------------| | BI-V100 x1 | 0.9072 | 47.54 s/it | -## Reference +## References - [PGL](https://github.com/PaddlePaddle/pgl) diff --git a/cv/image_generation/README.md b/cv/image_generation/README.md deleted file mode 100644 index dd3dc40b4fc30f7f555bb329a7b89c8dbeeeaea6..0000000000000000000000000000000000000000 --- a/cv/image_generation/README.md +++ /dev/null @@ -1 +0,0 @@ -# Image Generation diff --git a/cv/image_generation/dcgan/mindspore/README.md b/cv/image_generation/dcgan/mindspore/README.md index 776865ad3742bcbb28a5954fd8b67a82d4aed8a6..e5c2825da1a5dea12627b89ed4d6e516dd5c58dc 100644 --- a/cv/image_generation/dcgan/mindspore/README.md +++ b/cv/image_generation/dcgan/mindspore/README.md @@ -1,25 +1,24 @@ # DCGAN -## Model description +## Model Description -The deep convolutional generative adversarial networks (DCGANs) first introduced CNN into the GAN structure, and the strong feature extraction ability of convolution layer was used to improve the generation effect of GAN. +The deep convolutional generative adversarial networks (DCGANs) first introduced CNN into the GAN structure, and the +strong feature extraction ability of convolution layer was used to improve the generation effect of GAN. -[Paper](https://arxiv.org/pdf/1511.06434.pdf): Radford A, Metz L, Chintala S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks[J]. Computer ence, 2015. -## Step 1: Installing -``` -pip3 install -r requirements.txt -``` -## Step 2: Prepare Datasets +## Model Preparation + +### Prepare Resources Train DCGAN Dataset used: [Imagenet-1k](http://www.image-net.org/index) - Dataset size: ~125G, 224*224 colorful images in 1000 classes - - Train: 120G, 1281167 images - - Test: 5G, 50000 images + - Train: 120G, 1281167 images + - Test: 5G, 50000 images - Data format: RGB images. - - Note: Data will be processed in src/dataset.py + - Note: Data will be processed in src/dataset.py -Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. +Sign up and login in [ImageNet official website](https://www.image-net.org/index.php), then choose 'Download' to +download the whole ImageNet dataset. Specify `/path/to/imagenet` to your ImageNet path in later training process. 
The ImageNet dataset path structure should look like: @@ -37,28 +36,34 @@ imagenet └── val_list.txt ``` -## Step 3: Training -### On single GPU +### Install Dependencies + ```bash -python3 train.py --device_id=2 --data_url=/path/to/imagenet/train --train_url=./ --device_target=GPU +pip3 install -r requirements.txt ``` -### [Evaluation] + +## Model Training ```bash +# On single GPU +python3 train.py --device_id=2 --data_url=/path/to/imagenet/train --train_url=./ --device_target=GPU + +# Evaluation python3 -u eval.py --device_id=$DEVICE_ID --img_url=$PATH1 --ckpt_url=$PATH2 --device_target=GPU ``` -### [Evaluation result] -### 单卡性能数据:BI-V100 -![image](image2022-9-14_10-39-29.png) -![image](image2022-9-14_10-41-12.png) -### 单卡性能数据:NV-V100S -![image](image2022-9-13_13-5-52.png) -![image](image2022-9-13_13-12-42.png) - +## Model Results +One BI-V100 GPU +![image](image2022-9-14_10-39-29.png) +![image](image2022-9-14_10-41-12.png) +One NV-V100S GPU +![image](image2022-9-13_13-5-52.png) +![image](image2022-9-13_13-12-42.png) +## References +- [Paper](https://arxiv.org/pdf/1511.06434.pdf) diff --git a/cv/image_generation/pix2pix/paddlepaddle/README.md b/cv/image_generation/pix2pix/paddlepaddle/README.md index 7d337174e00d8d858e8cf379854b868a34e17ed0..684ed993aab710258dc8538d9e735a14d3cadfa3 100755 --- a/cv/image_generation/pix2pix/paddlepaddle/README.md +++ b/cv/image_generation/pix2pix/paddlepaddle/README.md @@ -1,20 +1,16 @@ # Pix2Pix -## Model description -Pix2Pix uses paired images for image translation, which has two different styles of the same image as input, can be used for style transfer. Pix2pix is encouraged by cGAN, cGAN inputs a noisy image and a condition as the supervision information to the generation network, Pix2pix uses another style of image as the supervision information input into the generation network, so the fake image is related to another style of image which is input as supervision information, thus realizing the process of image translation. -## Step 1: Installation -```bash -git clone https://github.com/PaddlePaddle/PaddleGAN.git -``` +## Model Description -```bash -cd PaddleGAN -pip3 install -r requirements.txt -pip3 install urllib3==1.26.6 -yum install mesa-libGL -y -``` +Pix2Pix uses paired images for image translation, which has two different styles of the same image as input, can be used +for style transfer. Pix2pix is encouraged by cGAN, cGAN inputs a noisy image and a condition as the supervision +information to the generation network, Pix2pix uses another style of image as the supervision information input into the +generation network, so the fake image is related to another style of image which is input as supervision information, +thus realizing the process of image translation. -## Step 2: Preparing datasets +## Model Preparation + +### Prepare Resources Datasets used by Pix2Pix can be downloaded from [here](http://efrosgans.eecs.berkeley.edu/pix2pix/datasets/). 
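Before preparing the data, it may help to see how the Pix2Pix objective described above combines an adversarial term with an L1 reconstruction term. The sketch below is a hedged PyTorch-style illustration with hypothetical names; it does not reproduce the PaddleGAN implementation.

```python
# Illustrative Pix2Pix generator objective: conditional GAN loss + lambda * L1.
import torch
import torch.nn.functional as F

def pix2pix_generator_loss(disc_fake_logits, fake_image, target_image, lambda_l1=100.0):
    """disc_fake_logits: D(input, G(input)); fake/target: generated and paired real images."""
    # Adversarial term: the generator wants D to label its output as real (1).
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # L1 term keeps the translated image close to the paired ground truth.
    l1 = F.l1_loss(fake_image, target_image)
    return adv + lambda_l1 * l1

# Toy usage with tensors standing in for a PatchGAN output and 256x256 images.
d_out = torch.randn(2, 1, 30, 30)
fake = torch.rand(2, 3, 256, 256)
real = torch.rand(2, 3, 256, 256)
print(pix2pix_generator_loss(d_out, fake, real))
```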
@@ -31,7 +27,20 @@ facades └── val ``` -## Step 3: Training +### Install Dependencies + +```bash +git clone https://github.com/PaddlePaddle/PaddleGAN.git +``` + +```bash +cd PaddleGAN +pip3 install -r requirements.txt +pip3 install urllib3==1.26.6 +yum install mesa-libGL -y +``` + +## Model Training ```bash # move facades dataset to data/ @@ -41,21 +50,21 @@ mv facades/ data/ python3 -u tools/main.py --config-file configs/pix2pix_facades.yaml ``` -## Step 4: Evaluation - ```bash +# Evaluation python3 tools/main.py --config-file configs/pix2pix_facades.yaml --evaluate-only --load ${PATH_OF_WEIGHT} ``` -## Results -|GPUs|Metric FID|FPS| -|:---:|:---:|:---:| -|BI-V100|120.5818|16.12240| +## Model Results + +| Model | GPU | Metric FID | FPS | +|---------|---------|------------|----------| +| Pix2Pix | BI-V100 | 120.5818 | 16.12240 | The generated images at epoch 200 is shown below: - +![results](results.png) +## References -## Reference -- [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN) +- [PaddleGAN](https://github.com/PaddlePaddle/PaddleGAN) diff --git a/cv/instance_segmentation/README.md b/cv/instance_segmentation/README.md deleted file mode 100644 index 8f98e41f56f89eb1acacd37210049a629d577884..0000000000000000000000000000000000000000 --- a/cv/instance_segmentation/README.md +++ /dev/null @@ -1 +0,0 @@ -# Instance Segmentation diff --git a/cv/instance_segmentation/solo/pytorch/README.md b/cv/instance_segmentation/solo/pytorch/README.md index e28f4b8517c157b25db3c4932de25365dd1ee32f..237fd3b69fdc1eb070aaf31b4b85100fab825fc2 100644 --- a/cv/instance_segmentation/solo/pytorch/README.md +++ b/cv/instance_segmentation/solo/pytorch/README.md @@ -1,37 +1,26 @@ # SOLO -## Model description +## Model Description -We present a new, embarrassingly simple approach to instance segmentation in images. Compared to many other dense prediction tasks, e.g., semantic segmentation, it is the arbitrary number of instances that have made instance segmentation much more challenging. In order to predict a mask for each instance, mainstream approaches either follow the 'detect-thensegment' strategy as used by Mask R-CNN, or predict category masks first then use clustering techniques to group pixels into individual instances. We view the task of instance segmentation from a completely new perspective by introducing the notion of "instance categories", which assigns categories to each pixel within an instance according to the instance's location and size, thus nicely converting instance mask segmentation into a classification-solvable problem. Now instance segmentation is decomposed into two classification tasks. We demonstrate a much simpler and flexible instance segmentation framework with strong performance, achieving on par accuracy with Mask R-CNN and outperforming recent singleshot instance segmenters in accuracy. We hope that this very simple and strong framework can serve as a baseline for many instance-level recognition tasks besides instance segmentation. +SOLO (Segmenting Objects by Locations) is an innovative instance segmentation model that simplifies the task by +introducing "instance categories". It converts instance segmentation into a classification problem by assigning +categories to each pixel based on an object's location and size. Unlike traditional methods, SOLO directly predicts +instance masks without complex post-processing or region proposals. 
This approach achieves competitive accuracy with +Mask R-CNN while offering a simpler and more flexible framework for instance-level recognition tasks. -## Step 1: Installing packages +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx - -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . - -# Prepare resnet50-0676ba61.pth, skip this if fast network -mkdir -p /root/.cache/torch/hub/checkpoints/ -wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth -``` - -## Step 2: Preparing datasets +### Prepare Resources ```bash -$ mkdir -p data/coco -$ cd data/coco +mkdir -p data/coco +cd data/coco ``` + Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -52,23 +41,42 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies -### One single GPU ```bash -python3 tools/train.py configs/solo/solo_r50_fpn_1x_coco.py +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . + +# Prepare resnet50-0676ba61.pth, skip this if fast network +mkdir -p /root/.cache/torch/hub/checkpoints/ +wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth ``` -### Multiple GPUs on one machine +## Model Training ```bash +# One single GPU +python3 tools/train.py configs/solo/solo_r50_fpn_1x_coco.py + +# Multiple GPUs on one machine sed -i 's/python /python3 /g' tools/dist_train.sh bash tools/dist_train.sh configs/solo/solo_r50_fpn_1x_coco.py 8 ``` -## Results on BI-V100 +## Model Results + +| Model | GPU | mAP(0.5:0.95) | +|-------|------------|---------------| +| SOLO | BI-V100 x8 | 0.361 | -Average Precision (AP) @[ loU=0.50:0.95 | area= all | maxDets=1001 ] = 0.361 +## References -## Reference -[mmdetection](https://github.com/open-mmlab/mmdetection) +- [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/instance_segmentation/solov2/pytorch/README.md b/cv/instance_segmentation/solov2/pytorch/README.md index ccc8e3a32968f626fdce2c43a4a5a8529091e9de..01939217a7c64fa62e20b94ce011b25b351d7cea 100644 --- a/cv/instance_segmentation/solov2/pytorch/README.md +++ b/cv/instance_segmentation/solov2/pytorch/README.md @@ -1,36 +1,22 @@ # SOLOV2 -## Model description +## Model Description -In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. 
Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. +SOLOv2 is an enhanced instance segmentation model that builds upon SOLO's approach by dynamically learning mask heads +conditioned on object locations. It decouples mask prediction into kernel and feature branches for improved efficiency. +SOLOv2 introduces Matrix NMS, a faster non-maximum suppression technique that processes masks in parallel. This +architecture achieves state-of-the-art performance in both speed and accuracy, with a lightweight version running at +31.3 FPS. It serves as a strong baseline for various instance-level recognition tasks beyond segmentation. -## Step 1: Installation +## Model Preparation -```bash -# Install libGL -## CentOS -yum install -y mesa-libGL -## Ubuntu -apt install -y libgl1-mesa-glx +### Prepare Resources -# install MMDetection -git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 -cd mmdetection -pip install -v -e . +Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to +download. -# Prepare resnet50-0676ba61.pth, skip this if fast network -mkdir -p /root/.cache/torch/hub/checkpoints/ -wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth - -# Install others -pip3 install yapf==0.31.0 urllib3==1.26.18 -``` - -## Step 2: Preparing datasets - -Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. - -Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the unzipped dataset path structure sholud look like: +Take coco2017 dataset as an example, specify `/path/to/coco2017` to your COCO path in later training process, the +unzipped dataset path structure sholud look like: ```bash coco2017 @@ -51,7 +37,29 @@ coco2017 └── ... ``` -## Step 3: Training +### Install Dependencies + +```bash +# Install libGL +## CentOS +yum install -y mesa-libGL +## Ubuntu +apt install -y libgl1-mesa-glx + +# install MMDetection +git clone https://github.com/open-mmlab/mmdetection.git -b v3.3.0 --depth=1 +cd mmdetection +pip install -v -e . 
+ +# Prepare resnet50-0676ba61.pth, skip this if fast network +mkdir -p /root/.cache/torch/hub/checkpoints/ +wget https://download.pytorch.org/models/resnet50-0676ba61.pth -O /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth + +# Install others +pip3 install yapf==0.31.0 urllib3==1.26.18 +``` + +## Model Training ```bash # Make soft link to dataset @@ -68,11 +76,12 @@ python3 tools/train.py configs/solov2/solov2_r50_fpn_1x_coco.py bash tools/dist_train.sh configs/solov2/solov2_r50_fpn_1x_coco.py 8 ``` -## Results +## Model Results + +| Model | GPUs | FPS | +|--------|------------|----------------| +| SOLOV2 | BI-V100 x8 | 21.26 images/s | -| GPUs | FPS | -| ---------- | --------- | -| BI-V100 x8 | 21.26 images/s | +## References -## Reference -[mmdetection](https://github.com/open-mmlab/mmdetection) +- [mmdetection](https://github.com/open-mmlab/mmdetection) diff --git a/cv/instance_segmentation/yolact/pytorch/README.md b/cv/instance_segmentation/yolact/pytorch/README.md index 0506ddaf62fe59d98b5e7dec07fb8dab7d6cc555..c06cee1a593e21cd715d7cc7478a37701e4336d0 100644 --- a/cv/instance_segmentation/yolact/pytorch/README.md +++ b/cv/instance_segmentation/yolact/pytorch/README.md @@ -1,24 +1,16 @@ # YOLACT -## Model description -A simple, fully convolutional model for real-time instance segmentation. This is the code for papers: - - [YOLACT: Real-time Instance Segmentation](https://arxiv.org/abs/1904.02689) - - [YOLACT++: Better Real-time Instance Segmentation](https://arxiv.org/abs/1912.06218) +## Model Description -## Step 1: Installing packages -``` -# Cython needs to be installed before pycocotools -pip3 install cython -pip3 install opencv-python pillow pycocotools matplotlib -``` +YOLACT (You Only Look At Coefficients) is a real-time instance segmentation model that separates mask prediction from +object detection. It generates prototype masks and prediction coefficients independently, then combines them to produce +instance masks. This approach enables fast processing while maintaining competitive accuracy. YOLACT++ further enhances +performance with deformable convolutions and optimized prediction heads. The model achieves real-time speeds on single +GPUs, making it suitable for applications requiring quick instance segmentation in video streams or interactive systems. -If you want to use YOLACT++, compile deformable convolutional layers (from [DCNv2](https://github.com/CharlesShang/DCNv2/tree/pytorch_1.0)). - ```Shell - cd external/DCNv2 - python3 setup.py build develop - ``` +## Model Preparation -## Step 2: Preparing datasets +### Prepare Resources Go to visit [COCO official website](https://cocodataset.org/#download), then select the COCO dataset you want to download. @@ -43,25 +35,45 @@ coco2017 └── ... ``` -Modify the configuration file(data/config.py) +### Install Dependencies + +```shell +# Cython needs to be installed before pycocotools +pip3 install cython +pip3 install opencv-python pillow pycocotools matplotlib ``` -$ vim data/config.py -$ # 'train_images': the path of train images -$ # 'train_info': the path of train_info -$ # 'valid_images': the path of valid images -$ # 'valid_info': the path of valid_info + +If you want to use YOLACT++, compile deformable convolutional layers (from +[DCNv2](https://github.com/CharlesShang/DCNv2/tree/pytorch_1.0)). + +```shell +cd external/DCNv2 +python3 setup.py build develop ``` -# Training +Modify the configuration file(data/config.py). 
+ +```shell +vim data/config.py +# 'train_images': the path of train images +# 'train_info': the path of train_info +# 'valid_images': the path of valid images +# 'valid_info': the path of valid_info +``` + +## Model Training + By default, we train on COCO. Make sure to download the entire dataset using the commands above. - - To train, grab an imagenet-pretrained model and put it in `./weights`. - - For Resnet101, download `resnet101_reducedfc.pth` from [here](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing). - - For Resnet50, download `resnet50-19c8e357.pth` from [here](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing). - - For Darknet53, download `darknet53.pth` from [here](https://drive.google.com/file/d/17Y431j4sagFpSReuPNoFcj9h7azDTZFf/view?usp=sharing). - - Run one of the training commands below. - - Note that you can press ctrl+c while training and it will save an `*_interrupt.pth` file at the current iteration. - - All weights are saved in the `./weights` directory by default with the file name `__.pth`. -```Shell + +- To train, grab an imagenet-pretrained model and put it in `./weights`. + - For Resnet101, download `resnet101_reducedfc.pth` from [here](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing). + - For Resnet50, download `resnet50-19c8e357.pth` from [here](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing). + - For Darknet53, download `darknet53.pth` from [here](https://drive.google.com/file/d/17Y431j4sagFpSReuPNoFcj9h7azDTZFf/view?usp=sharing). +- Run one of the training commands below. + - Note that you can press ctrl+c while training and it will save an `*_interrupt.pth` file at the current iteration. + - All weights are saved in the `./weights` directory by default with the file name `__.pth`. + +```shell # Trains using the base config with a batch size of 8 (the default). python3 train.py --config=yolact_base_config @@ -75,29 +87,33 @@ python3 train.py --config=yolact_base_config --resume=weights/yolact_base_10_321 python3 train.py --help ``` -## Multi-GPU Support YOLACT now supports multiple GPUs seamlessly during training: - - Before running any of the scripts, run: `export CUDA_VISIBLE_DEVICES=[gpus]` - - Where you should replace [gpus] with a comma separated list of the index of each GPU you want to use (e.g., 0,1,2,3). - - You should still do this if only using 1 GPU. - - You can check the indices of your GPUs with `nvidia-smi`. - - Then, simply set the batch size to `8*num_gpus` with the training commands above. The training script will automatically scale the hyperparameters to the right values. - - If you have memory to spare you can increase the batch size further, but keep it a multiple of the number of GPUs you're using. - - If you want to allocate the images per GPU specific for different GPUs, you can use `--batch_alloc=[alloc]` where [alloc] is a comma seprated list containing the number of images on each GPU. This must sum to `batch_size`. - - The learning rate should divide the number of gpus. +- Before running any of the scripts, run: `export CUDA_VISIBLE_DEVICES=[gpus]` + - Where you should replace [gpus] with a comma separated list of the index of each GPU you want to use (e.g., 0,1,2,3). + - You should still do this if only using 1 GPU. + - You can check the indices of your GPUs with `nvidia-smi`. +- Then, simply set the batch size to `8*num_gpus` with the training commands above. 
The training script will automatically scale the hyperparameters to the right values. + - If you have memory to spare you can increase the batch size further, but keep it a multiple of the number of GPUs you're using. + - If you want to allocate the images per GPU specific for different GPUs, you can use `--batch_alloc=[alloc]` where + [alloc] is a comma seprated list containing the number of images on each GPU. This must sum to `batch_size`. +- The learning rate should divide the number of gpus. For example: use 8 GPUs to train. -``` + +```shell +# Multi-GPU Support export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train.py --config=yolact_base_config --batch_size 64 --lr 0.000125 ``` -## Results -| | all | .50 | .55 | .60 | .65 | .70 | .75 | .80 | .85 | .90 | .95 | -|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------| -| box | 11.64 | 23.13 | 21.51 | 19.36 | 16.98 | 14.19 | 10.93 | 6.83 | 2.83 | 0.61 | 0.05 | -| mask | 11.20 | 20.51 | 19.07 | 17.51 | 15.52 | 13.23 | 10.74 | 8.10 | 5.13 | 2.06 | 0.13 | +## Model Results + + | Model | Result | all | .50 | .55 | .60 | .65 | .70 | .75 | .80 | .85 | .90 | .95 | + |--------|--------|-------|-------|-------|-------|-------|-------|-------|------|------|------|------| + | YOLACT | box | 11.64 | 23.13 | 21.51 | 19.36 | 16.98 | 14.19 | 10.93 | 6.83 | 2.83 | 0.61 | 0.05 | + | mask | 11.20 | 20.51 | 19.07 | 17.51 | 15.52 | 13.23 | 10.74 | 8.10 | 5.13 | 2.06 | 0.13 | 0.13 | + +## References -## Reference -https://github.com/dbolya/yolact +- [yolact](https://github.com/dbolya/yolact) diff --git a/cv/multi_object_tracking/README.md b/cv/multi_object_tracking/README.md deleted file mode 100644 index 36654584554ee480f8667d9f30a300db482421ae..0000000000000000000000000000000000000000 --- a/cv/multi_object_tracking/README.md +++ /dev/null @@ -1 +0,0 @@ -# Object Tracking diff --git a/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md b/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md index 66132c1aa71836bdf627dc5c2d601427b060f9d3..4e7ef4662f09c3366029f99f09574c806ea66637 100644 --- a/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md +++ b/cv/multi_object_tracking/bytetrack/paddlepaddle/README.md @@ -1,30 +1,25 @@ # ByteTrack -## Model description +## Model Description -ByteTrack is a simple, fast and strong multi-object tracker. +ByteTrack is an efficient multi-object tracking (MOT) model that improves tracking accuracy by associating every +detection box, including low-score ones, rather than discarding them. It addresses challenges like occluded objects and +fragmented trajectories by leveraging similarities between detections and tracklets. ByteTrack achieves state-of-the-art +performance on benchmarks like MOT17, with high MOTA, IDF1, and HOTA scores while maintaining real-time processing +speeds. Its simple yet effective design makes it a robust solution for various object tracking applications in video +analysis. -Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. The objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible true object missing and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating every detection box instead of only the high score ones. 
For the low score detection boxes, we utilize their similarities with tracklets to recover true objects and filter out the background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvement on IDF1 scores ranging from 1 to 10 points. To put forwards the state-of-the-art performance of MOT, we design a simple and strong tracker, named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the test set of MOT17 with 30 FPS running speed on a single V100 GPU. +## Model Preparation -## Step 1: Installation +### Prepare Resources -```bash -git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleDetection.git -cd PaddleDetection -pip3 install -r requirements.txt -pip3 install protobuf==3.20.3 -pip3 install urllib3==1.26.6 -yum install mesa-libGL -python3 setup.py develop -``` - -## Step 2: Preparing datasets - -Go to visit [MOT17 official website](https://motchallenge.net/), then download the MOT17 dataset, or you can download via [paddledet data](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip),then extract and place it in the dataset/mot/folder. +Go to visit [MOT17 official website](https://motchallenge.net/), then download the MOT17 dataset, or you can download +via [paddledet data](https://bj.bcebos.com/v1/paddledet/data/mot/MOT17.zip),then extract and place it in the +dataset/mot/folder. The dataset path structure sholud look like: -``` +```bash datasets/mot/MOT17/ ├── annotations │   ├── train_half.json @@ -39,10 +34,21 @@ datasets/mot/MOT17/ ``` -## Step 3: Training +### Install Dependencies ```bash +git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleDetection.git +cd PaddleDetection +pip3 install -r requirements.txt +pip3 install protobuf==3.20.3 +pip3 install urllib3==1.26.6 +yum install mesa-libGL +python3 setup.py develop +``` +## Model Training + +```bash # One GPU CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp @@ -50,15 +56,12 @@ CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/mot/bytetrack/detector/ python3 -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp ``` -## Results +## Model Results -| MODEL | mAP(0.5:0.95)| -| ---------- | ------------ | -| ByteTrack | 0.538 | +| Model | GPU | FPS | mAP(0.5:0.95) | +|-----------|------------|--------|---------------| +| ByteTrack | BI-V100 x8 | 4.6504 | 0.538 | -| GPUS | FPS | -| ---------- | ------ | -| BI-V100x 8 | 4.6504 | +## References -## Reference -- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) \ No newline at end of file +- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) diff --git a/cv/multi_object_tracking/deep_sort/pytorch/README.md b/cv/multi_object_tracking/deep_sort/pytorch/README.md index 0a8517920cdbbeff156043ab3ff22b88e119cd85..bf4debfe1be16a18ff1db923b97a9b4c335c8a38 100644 --- a/cv/multi_object_tracking/deep_sort/pytorch/README.md +++ b/cv/multi_object_tracking/deep_sort/pytorch/README.md @@ -1,18 +1,16 @@ # DeepSORT -## Model description +## Model Description -This is an implement of MOT tracking algorithm deep sort. Deep sort is basicly the same with sort -but added a CNN model to extract features in image of human part bounded by a detector. 
This CNN -model is indeed a RE-ID model and the detector used in [PAPER](https://arxiv.org/abs/1703.07402) is -FasterRCNN , and the original source code is [HERE](https://github.com/nwojke/deep_sort). However in -original code, the CNN model is implemented with tensorflow, which I'm not familier with. SO I -re-implemented the CNN feature extraction model with PyTorch, and changed the CNN model a little -bit. Also, I use **YOLOv3** to generate bboxes instead of FasterRCNN. +DeepSORT is an advanced multi-object tracking algorithm that extends SORT by incorporating deep learning-based +appearance features. It combines motion information with a CNN-based RE-ID model to track objects more accurately, +especially in complex scenarios with occlusions. DeepSORT uses a Kalman filter for motion prediction and associates +detections using both motion and appearance cues. This approach improves tracking consistency and reduces identity +switches, making it particularly effective for person tracking in crowded scenes and video surveillance applications. -We just need to train the RE-ID model! +## Model Preparation -## Preparing datasets +### Prepare Resources Download the [Market-1501](https://zheng-lab.cecs.anu.edu.au/Project/project_reid.html) @@ -50,25 +48,26 @@ data ├── gallery ``` -## Training +## Model Training -The original model used in paper is in original_model.py, and its parameter here [original_ckpt.t7](https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6). +The original model used in paper is in original_model.py, and its parameter here +[original_ckpt.t7](https://drive.google.com/drive/folders/1xhG0kRH1EX5B9_Iz8gQJb7UNnn_riXi6). ```sh +# Train python3 train.py --data-dir your path -``` - -## Evaluate your model -```sh +# Evaluate your model. python3 test.py --data-dir your path python3 evaluate.py ``` -## Results +## Model Results -Acc top1:0.980 +| Model | GPU | Top1 ACC | +|----------|------------|----------| +| DeepSORT | BI-V100 x8 | 0.980 | -## Reference +## References -Please refer to +- [deep_sort_pytorch](https://github.com/ZQPei/deep_sort_pytorch) diff --git a/cv/multi_object_tracking/fairmot/pytorch/README.md b/cv/multi_object_tracking/fairmot/pytorch/README.md index f087b9587357cb842bb28844df6fc2ea5ddb691f..e4167dd92b813a464d11f7d0f41e8bb1a79fcaa6 100644 --- a/cv/multi_object_tracking/fairmot/pytorch/README.md +++ b/cv/multi_object_tracking/fairmot/pytorch/README.md @@ -1,19 +1,18 @@ # FairMOT -## Model description +## Model Description -FairMOT is a model for multi-object tracking which consists of two homogeneous branches to predict pixel-wise objectness scores and re-ID features. The achieved fairness between the tasks is used to achieve high levels of detection and tracking accuracy. The detection branch is implemented in an anchor-free style which estimates object centers and sizes represented as position-aware measurement maps. Similarly, the re-ID branch estimates a re-ID feature for each pixel to characterize the object centered at the pixel. Note that the two branches are completely homogeneous which essentially differs from the previous methods which perform detection and re-ID in a cascaded style. It is also worth noting that FairMOT operates on high-resolution feature maps of strides four while the previous anchor-based methods operate on feature maps of stride 32. The elimination of anchors as well as the use of high-resolution feature maps better aligns re-ID features to object centers which significantly improves the tracking accuracy. 
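The association step shared by the trackers described here (matching new detections to existing tracklets) can be illustrated with a deliberately simplified two-round IoU matcher in the spirit of ByteTrack's use of low-score boxes. Real implementations such as the ones above additionally rely on Kalman-filter motion prediction, appearance (re-ID) features, and Hungarian assignment, so treat this only as a sketch with invented names.

```python
# Simplified two-round greedy IoU association (illustrative, not repository code).
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, scores, high_thr=0.6, iou_thr=0.3):
    """Match high-score boxes first, then try low-score boxes on the leftover tracks."""
    matches, unmatched_tracks = [], list(range(len(tracks)))
    for round_dets in (
        [i for i, s in enumerate(scores) if s >= high_thr],   # round 1: confident boxes
        [i for i, s in enumerate(scores) if s < high_thr],    # round 2: low-score boxes
    ):
        for d in round_dets:
            if not unmatched_tracks:
                break
            best = max(unmatched_tracks, key=lambda t: iou(tracks[t], detections[d]))
            if iou(tracks[best], detections[d]) >= iou_thr:
                matches.append((best, d))
                unmatched_tracks.remove(best)
    return matches, unmatched_tracks

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[1, 1, 11, 11], [21, 19, 31, 29]]
print(associate(tracks, dets, scores=[0.9, 0.4]))  # both tracks matched
```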
+FairMOT is an innovative multi-object tracking model that unifies detection and re-identification in a single framework. +It features two homogeneous branches: one for anchor-free object detection and another for re-ID feature extraction. +Operating on high-resolution feature maps, FairMOT achieves fairness between detection and re-ID tasks, resulting in +improved tracking accuracy. Its joint learning approach eliminates the need for cascaded processing, making it more +efficient and effective for complex tracking scenarios in crowded environments. -## Step 1: Installing packages +## Model Preparation -```shell -$ pip3 install -r requirements.txt -$ pip3 install pandas progress -``` +### Prepare Resources -## Step 2: Preparing data - -### Download MOT17 dataset +Download MOT17 dataset - [Baidu NetDisk](https://pan.baidu.com/s/1lHa6UagcosRBz-_Y308GvQ) - [Google Drive](https://drive.google.com/file/d/1ET-6w12yHNo8DKevOVgK1dBlYs739e_3/view?usp=sharing) @@ -21,18 +20,18 @@ $ pip3 install pandas progress ```shell # Download MOT17 -$ mkdir -p data/MOT -$ cd data/MOT +mkdir -p data/MOT +cd data/MOT ``` ```shell -$ unzip -q MOT17.zip -$ mkdir MOT17/images && mkdir MOT17/labels_with_ids -$ mv ./MOT17/train ./MOT17/images/ && mv ./MOT17/test ./MOT17/images/ +unzip -q MOT17.zip +mkdir MOT17/images && mkdir MOT17/labels_with_ids +mv ./MOT17/train ./MOT17/images/ && mv ./MOT17/test ./MOT17/images/ -$ cd ../../ -$ python3 src/gen_labels_17.py +cd ../../ +python3 src/gen_labels_17.py ## The dataset path looks like below data/ @@ -45,7 +44,7 @@ data/    └── train ``` -### Download Pretrained models +Download Pretrained models - DLA-34 COCO pretrained model: [DLA-34 official](https://drive.google.com/file/d/18Q3fzzAsha_3Qid6mn4jcIFPeOGUaj1d) - HRNetV2-W18 ImageNet pretrained model: [BaiduYun(Access Code: r5xn)](https://pan.baidu.com/s/1Px_g1E2BLVRkKC5t-b-R5Q) @@ -53,91 +52,55 @@ data/ ```shell # Download ctdet_coco_dla_2x -$ mkdir -p models -$ cd models +mkdir -p models +cd models ``` -## Step 3: Training - -**The available train scripts are as follows:** +### Install Dependencies ```shell -train_dla34_mot17.sh -train_hrnet18_mot17.sh -train_hrnet32_mot17.sh - +pip3 install -r requirements.txt +pip3 install pandas progress ``` +## Model Training -### On single GPU +The available train scripts are as follows: ```shell -$ GPU_NUMS=1 bash